Text File | 1990-11-13 | 51.1 KB | 1,201 lines |
- ============================================================================
- || ||
- || MAT -- p.n.; poss. abbrev. of "Match"; also "Matte" [Motion ||
- || Picture Arts]: means of cutting, inserting and superposing ||
- || disparate items. ||
- || ||
- || ------------------- ||
- || ||
- || This program provides a flexible string-searching, pattern-matching ||
- || and substitution mechanism for both text and filenames. Searching ||
- || for a string in a file is fast (it can be three times as fast as the ||
- || AmigaDOS 'SEARCH'). The (much slower) matching scheme is an extended ||
- || version of the standard AmigaDOS pattern-matching convention, with ||
- || the added features of negation and "slicing" of matched strings. It ||
- || will probably be most useful within command (or ARexx) scripts to ||
- || extend the operations possible with AmigaDOS. ||
- || ||
- || ||
- || ||
- || * Searches for strings or patterns within text files. ||
- || ||
- || * Rearranges text within matched lines to ||
- || create new files. ||
- || ||
- || * Tags and labels can be added to the output stream ||
- || at desired points ||
- || ||
- || * Searches directories for matching file names. ||
- || ||
- || * Creates Command Script Files by inserting the whole or ||
- || parts of matched filenames into text templates. ||
- || ||
- || * Control commands for the program may be read from a ||
- || script file as well as the command line. ||
- || ||
- || ||
- || -- Copyright 1990 Peter J. Goodeve -- ||
- ============================================================================
-
-
- -- by Pete Goodeve --
-
- July 1990
-
- Overview
- ________
-
-
- This program is intended to fill a number of pattern matching needs
- not covered by other facilities. It is very flexible, and consequently
- has a number of features you may never use, but it is well suited to those
- everyday jobs, too.
-
- It handles both string searches (in which the string will be found anywhere
- it occurs on the lines being scanned), and pattern matches that compare
- each whole line against a pattern. String searches can be at least three
- times as fast as if you used the AmigaDOS 'SEARCH' (disk access time may
- well limit the improvement, though). Pattern matches are much slower of
- course, but provide much more precise control of the operation. You can
- also often get the best of both by "filtering" the text first with a string
- search and applying the pattern only to those lines that contain the filter
- string.
-
- Patterns are in the usual AmigaDOS format (as used by LIST and so on), with
- a number of extensions. You can specify "negative match" segments, whose
- appearance will cause a match to fail, and you can -- most importantly --
- indicate "slice points" that mark pieces of the matched string to be
- rearranged in the output.
-
-
- To illustrate:
-
- MAT S #include mysrc.c
-
- would print out all lines that contain the string "#include" in the
- file "mysrc.c". Note that a) the keyword argument "S" signals that
- this is a string search (you can also use the full keyword "SEARCH"
- -- upper or lower case); b) the string can be anywhere in the line
- (though this particular one will usually be at the beginning);
- and c) the character '#' has no special meaning in a string search
- (as opposed to being a special character in a pattern).
-
- MAT S #include #?.c
-
- would do the same thing for all files in the current directory ending
- in ".c". You can have several file specifications in one command
- if you like:
-
- MAT S #include test/#?.c WORK:#?/#?.c
-
- would search all such files in the subdirectory "test" and all
- (immediate) subdirectories of assigned device "WORK:".
-
-
- The preceding examples simply display all lines that fit the criterion,
- but you can add a "template" to specify exactly what should be output.
-
- MAT S #include T "^F line ^N: ^O" #?.c
-
- Here, the addition of the template, with its preceding keyword "T"
- (or full word "TEMPLATE"), specifies that each matching line should
- be displayed in this fashion:
-
- "mysrc.c line 5: #include <stdio.h>"
-
- The marker "^F" (two characters -- the caret '^' followed by upper
- case 'F' -- NOT a "control-F") indicates that the current file name
- should appear here; "^N" represents the current line number, and
- "^O" ("Oh", not "zero") indicates the original current line.
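To make the marker substitution concrete, here is a minimal Python sketch of how a template containing ^F, ^N and ^O might be expanded for one matched line. It is not Mat's own code. The function name expand_template is hypothetical, and only those three markers are handled; any other caret pair is copied through unchanged.

```python
def expand_template(template, filename, line_no, line):
    """Expand the ^F, ^N and ^O markers of a template for one matched line.

    Hypothetical helper; other caret pairs are copied through unchanged.
    """
    values = {"F": filename, "N": str(line_no), "O": line}
    out = []
    i = 0
    while i < len(template):
        # A marker is a caret followed by one of the known letters.
        if template[i] == "^" and i + 1 < len(template) and template[i + 1] in values:
            out.append(values[template[i + 1]])
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


print(expand_template("^F line ^N: ^O", "mysrc.c", 5, "#include <stdio.h>"))
# prints: mysrc.c line 5: #include <stdio.h>
```

Scanning the template once, instead of calling str.replace for each marker in turn, means that text inserted for one marker (a filename that happens to contain "^N", say) is never expanded a second time.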
-
- There are a number of other template markers you can use, especially
- when you are matching against a "slicing pattern" rather than doing a
- string search. You can also specify a template for all the lines that
- FAIL to match.
-
- In addition to templates, which apply to every line, you can supply
- a "Tag", which applies once per file -- immediately before the first
- matched line if there is one, otherwise when the whole file has been
- read (giving an alternative message). The format is the same as a
- template.
-
- MAT S #include TAG "^F:^| No matches found in ^F" #?.c
-
- will display the filename (once) before all lines matched in that
- file, or, if there were no matches, report that fact. The "^|"
- pair separates the 'success' and 'failure' portions of the tag
- (or template).
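The once-per-file behaviour of a tag can be modelled in a few lines. This Python sketch assumes a plain string search, splits the tag at "^|", and substitutes only ^F. The name search_with_tag is hypothetical, and Mat's real output may differ in detail (for instance, in whether an empty success part produces a blank line; here it produces nothing).

```python
def search_with_tag(filename, lines, search, tag):
    """String search over one file, emitting the tag at most once.

    The success part goes just before the first matching line; the
    failure part is emitted only if the file had no match at all.
    """
    success, _, failure = tag.partition("^|")
    out = []
    tagged = False
    for line in lines:
        if search in line:              # plain substring test, no wild-cards
            if not tagged:
                if success:
                    out.append(success.replace("^F", filename))
                tagged = True
            out.append(line)
    if not tagged and failure:
        out.append(failure.replace("^F", filename))
    return out


print(search_with_tag("a.c", ["#include <x.h>", "int x;"], "#include",
                      "^F:^| No matches found in ^F"))
```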
-
-
- Pattern matching can be done on the lines of a file in much the same way.
-
- MAT "#?printf#?,# X,#?" mysrc.c
-
- looks for any lines that do a "printf" on variable "X" (with
- arbitrary characters intervening). (Notice that no keyword is
- needed here, as this is the default case, but if you prefer you
- can use the key "P" or "PAT"; for variant forms, or where there
- is ambiguity, a keyword may be required.)
-
- You can speed things up by screening the lines first:
-
- MAT S printf P "#?printf#?,# X,#?" mysrc.c
-
- only does the full match on lines that contain printf. The keyword
- "P" is needed here to make your intention clear.
-
- You can add "slice-marks" to a pattern, and rearrange the resulting pieces
- with a template. The slice-mark is a single caret character placed at the
- desired point. (When you use slice marks, you must also use a template.)
-
- MAT #?^(word|another)^#? "^1: ^0----^2" myfile
-
- Here slice marks are placed either side of the bracketed alternation
- in the pattern. In the template, "^0" is the segment of the line matched
- by the part of the pattern before the first slice mark, "^1" is the
- segment between the two marks (i.e. "word" or "another", depending on
- which was matched), and "^2" is the rest of the line. Thus, if the
- input line was
-
- "this line contains the word we are looking for"
-
- the output would be
-
- "word: this line contains the ---- we are looking for"
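Python's regular expressions can imitate slicing if each slice mark is treated as a boundary between capture groups. The sketch below translates #?^(word|another)^#? by hand into a regex with three groups (slices ^0, ^1 and ^2) and then fills in the template from the example. Both the translation and the helper apply_slices are assumptions for illustration, not Mat's mechanism.

```python
import re

# #?^(word|another)^#? translated by hand: each "^" becomes a group
# boundary, so slice ^0 is group 1, ^1 is group 2 and ^2 is group 3.
SLICED = re.compile(r"(.*)(word|another)(.*)")


def apply_slices(template, slices):
    """Replace ^0, ^1, ... in the template with the corresponding slice."""
    out = []
    i = 0
    while i < len(template):
        if (template[i] == "^" and i + 1 < len(template)
                and template[i + 1] in "0123456789"):
            out.append(slices[int(template[i + 1])])
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


line = "this line contains the word we are looking for"
m = SLICED.fullmatch(line)
if m:
    print(apply_slices("^1: ^0----^2", m.groups()))
```

One caveat: the greedy (.*) makes ^0 as long as possible, so if "word" occurred twice the last occurrence would be sliced out. Mat may well choose differently, so this sketch says nothing about such cases.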
-
-
- Up to this point, the examples have focussed on scanning text files,
- but the same mechanisms can equally well examine directories. For the
- simple jobs, of course, one would just do a DIR or LIST, or perhaps a
- LIST...LFORMAT, but Mat can do a few things that the others can't.
- For instance, the ability to exclude files by means of a "negative
- match" can be very useful. Also the template is rather more flexible
- than that of "LIST...LFORMAT" (and pre-dates it by about three years!),
- among other things providing separate specifiers for the directory
- path and the filename.
-
- For example, for taking quick looks at a bunch of files, I have a
- script that does something like this (although in fact it also
- pops up a new full-screen window to do it in, and uses a PIPE:):
-
- .K filepat/A
- MAT >T:_mm T "more ^P" F <filepat>
- execute T:_mm
-
- "^P" represents a complete pathname, and the keyword "F" (or "FILES")
- says that the file names (specified in the filepat argument) are to be
- used themselves, rather than be scanned.
-
- The preceding example is similar to the "DPAT" script that comes with
- AmigaDOS 1.3, but you can do other things, too. Suppose (and I have actually
- been in this situation) that I have a bunch of files that I have
- previously renamed to "myfile.c_0", "myfile.o_0", "myfile_0" and so
- on, which I now want to restore to their former names by chopping off
- the "_0". I have a script that calls Mat in a similar way to the
- above, and does just what I want:
-
- REN myfile#?^_0 AS ^0
-
- The first argument to the script is a pattern to select the files
- to be renamed, and the second is the template specification of the
- new names. Notice the "slice-mark" '^' in the pattern, and the
- marker "^0" (meaning the part to the left of the slice) in the
- template. Without bothering to detail the script, the core is a
- Mat command line that creates appropriate "RENAME" commands to be
- executed.
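As a rough illustration of what the core of such a script produces, this Python sketch turns a list of names into RENAME command lines using the same slice idea: the pattern myfile#?^_0 is translated by hand into a regex whose first group plays the part of slice ^0. The function rename_commands and the sample names are hypothetical.

```python
import re

# Hand translation of myfile#?^_0: everything before the slice mark is
# group 1 (slice ^0), and the name must end in "_0".
PATTERN = re.compile(r"(myfile.*)_0")


def rename_commands(names):
    """Build one RENAME command per matching name; others are skipped."""
    cmds = []
    for name in names:
        m = PATTERN.fullmatch(name)
        if m:
            cmds.append(f"RENAME {name} AS {m.group(1)}")
    return cmds


for cmd in rename_commands(["myfile.c_0", "myfile.o_0", "myfile_0", "other.c_0"]):
    print(cmd)
```

Names containing spaces would need quoting in the generated commands; the sketch ignores that.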
-
-
- In most cases you will want to have Mat embedded in script files, as this
- can drastically simplify the otherwise sometimes complex syntax. Mat, by
- the way, returns a "WARN" value (5) to AmigaDOS if no match was found --
- useful for conditional execution in scripts.
-
- This has only been a brief sketch of the various modes and so on that the
- program has. It is worth bearing in mind though that it has a number of
- features -- including the possibility of accepting commands from a script
- file or pipe -- that add to its usefulness in things like installation
- scripts. These features also let it work well with ARexx (even though it
- doesn't itself have an ARexx port).
-
-
- The following sections go into more detail about Mat's capabilities.
- First the command line formats and operating modes are described, then
- the keywords are listed in detail. Following this is a full description
- of AmigaDOS-type patterns in general, and the extensions used in Mat.
- Finally comes a discussion of Templates and File Specifiers.
-
- + + + + +
-
- Installation and Operation
- __________________________
-
- Mat is like any other CLI/Shell command, and is invoked by a command line
- with suitable arguments. For normal use it should be available in the
- C: directory (or at least on the path of the current shell). For added
- speed (under AmigaDOS 1.3 or later) it may be made Resident.
-
-
- Command Line Arguments
- ______________________
-
- There are a number of basic command forms corresponding to the various
- operations Mat can perform. Each of these in turn can have variants,
- and they can often be combined to achieve a particular result. There
- are several common basic components, however, which first need clear
- understanding.
-
- In the order they normally appear in the command line, from left to
- right, they are: <Search-String>, <Pattern>, <Template>, and <Name-Spec>.
- Only the last of these is invariably present; the other three MAY all
- appear, but quite usually don't. Keywords precede each where necessary
- to identify them, and other keywords may be interspersed to give
- further control.
-
- <Search-String>: A literal string of characters that will be searched
- for in a text file. Any line that contains the string
- as a substring is considered to match. No characters in the string
- will get special treatment -- there are no "wild-cards" or such.
- Matches must be exact, with upper and lower case distinct, unless the
- NOCASE keyword is used. Search-Strings are only used in text searches,
- not in filename or key matching. A Search-String must be identified by
- the required keyword "SEARCH" or its abbreviation "S".
-
- <Pattern>: An AmigaDOS-type pattern specifier string, usually
- containing "pattern-structuring" characters with
- special meaning. (See the section on "Pattern Matching" later.) It is
- always matched against the whole target of concern (a line of text,
- filename, or argument). Again, case is relevant unless NOCASE is used.
- If a pattern is the first argument on the command line it does not need
- a keyword (as it is the default) but if it is elsewhere -- following a
- Search-String for example -- it must be preceded by "PAT" or "P". (You
- would also have to use the keyword in the remotely possible case that the
- pattern was itself identical to another keyword.)
-
- <Template>: A string, usually containing "template markers",
- that directs the program as to the output it is to
- generate for each matching (and possibly each non-matching) item. (See the
- section on "Templates".) If it is absent, the output depends on the
- mode: during a text search each matching target is simply output in its
- entirety; when matching filenames or keys there is no output at all
- without a template. If a Template string immediately follows an
- initial Pattern argument that REQUIRES a template (i.e. it has
- slice-marks), no keyword is needed, but it is probably best to use one
- anyway; it is of course "TEMPLATE" or just "T".
-
- <Name-Spec>: Either a simple string or a pattern that defines
- the entities to be examined. There may be several on
- one command line. In the Text-Searching modes, it indicates a file or
- group of files to be scanned (referred to as a <File-Spec> below). For
- Directory Searching, it will normally be the actual search pattern to
- match. It may also represent a "key" (such as a command script
- argument) that is to be tested directly against a Pattern. There is no
- specific associated keyword, but preceding mode-selection arguments may
- determine the meaning. For searching text files, no keyword is usually
- needed, as this is the default, but you may occasionally have to supply
- one to prevent ambiguity (a file name the same as a keyword, for
- example); if you do, the word is "FROM". To select Directory
- Searching, you use one of: "FILES" (or "F"), to search only for files
- (not directories); "DIRS" ("D"), to retrieve only matching directory
- names; or "NAMES" ("N"), to match both. To match the arguments
- themselves, use "KEY" ("K").
-
- Not all keywords have abbreviations by the way -- only some of the
- frequent, non-ambiguous ones. All may be given in upper or lower case,
- however. If any of the above items have embedded spaces, they must of
- course be enclosed in quotes (which will be stripped off before they
- are used).
-
-
-
- Operating Modes
- _______________
-
- Text Search:
-
- The default function of Mat is to search text files for lines that
- match a pattern:
-
- MAT <Pattern> <Name-Spec>...
-
- All lines that match are written to standard output (the screen,
- normally). For details on patterns, including the Mat extensions and
- the possibilities of "slicing" the matched string for use by a
- template, see the section on "Pattern Matching".
-
-
- To perform a string search rather than a pattern match, the form is:
-
- MAT SEARCH <Search-String> <Name-Spec>...
- or
- MAT S <Search-String> <Name-Spec>...
-
-
- You can also combine string search and pattern match:
-
- MAT S <Search-String> P <Pattern> <Name-Spec>...
-
- If both a Search-String and a Pattern are specified in the command
- line, each line of the text file will be tested for the presence of the
- string first, and only if that succeeds will the pattern be applied.
- When the string rejects most lines, the combined search is hardly slower
- than the string search alone.
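The two-stage test is easy to sketch. In this Python illustration (scan is a hypothetical name), the cheap substring check runs first, and the whole-line pattern, translated by hand from "#?printf#?,# X,#?" into a regex, runs only on the lines that survive it.

```python
import re


def scan(lines, search=None, pattern=None):
    """Yield (line_number, line) for lines that pass the optional string
    filter and then the optional whole-line pattern, in that order."""
    for n, line in enumerate(lines, start=1):
        if search is not None and search not in line:
            continue                    # cheap test rejects most lines
        if pattern is not None and not pattern.fullmatch(line):
            continue                    # costly match only on survivors
        yield n, line


# "#?printf#?,# X,#?" translated by hand ("# " means any number of spaces)
PRINTF_X = re.compile(r".*printf.*, *X,.*")

src = ['printf("%d, %d\\n", X, Y);', 'puts("X");', 'printf("%d\\n", Y);']
for n, line in scan(src, search="printf", pattern=PRINTF_X):
    print(n, line)
```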
-
-
- With the above forms, you can control the text output with a template
- (see the section on "Templates" for full details):
-
- MAT <Pattern> TEMPLATE <Template> <Name-Spec>...
- MAT <Pattern> T <Template> <Name-Spec>...
- MAT S <Search-String> T <Template> <Name-Spec>...
-
- In the particular case where a pattern has slice marks (see "Pattern
- Matching"), and therefore requires a template, the keyword is optional
- -- provided that the template argument is in the correct place:
-
- MAT <Slice-Pattern> <Template> <Name-Spec>...
-
-
- Directory Search:
-
- When searching directories for matching names, a template is normally
- expected:
-
- MAT NAMES <Template> <Name-Spec>...
- MAT NAMES T <Template> <Name-Spec>...
- MAT N T <Template> <Name-Spec>...
-
- The "TEMPLATE" keyword is optional with this precise form, but required
- for others. For example, a separate pattern (in addition to the one in
- the Name-Spec) is sometimes useful (to further subdivide the set of
- found files for template formatting):
-
- MAT <Pattern> T <Template> NAMES <Name-Spec>...
-
- The template itself, however, is not optional. If you really want no
- template (so that nothing is sent to the output for matches), you must
- specify it as a null string; for example:
-
- MAT NAMES "" <Name-Spec>...
-
- The NAMES form locates all matching entries, both file and directory.
- The equivalent forms to find just one or the other are:
-
- MAT FILES <Template> <Name-Spec>...
- MAT F <Template> <Name-Spec>...
- MAT DIRS <Template> <Name-Spec>...
- MAT D <Template> <Name-Spec>...
-
-
- Other Modes:
-
- You can check the arguments in the command line -- probably themselves
- supplied as arguments to an execute script -- against a pattern:
-
- MAT <Pattern> KEY <Name-Spec>...
- MAT <Pattern> K <Name-Spec>...
-
- Without a template, the KEY form simply checks for a match, and returns
- WARN to AmigaDOS if it doesn't find one. No output is generated. Add
- a template to get the desired output:
-
- MAT <Pattern> K T <Template> <Name-Spec>...
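In KEY mode the arguments themselves are the targets. Here is a minimal Python model, with the pattern translated by hand into a regular expression; the hypothetical function key_mode returns the matching keys together with the code Mat would hand back to AmigaDOS: 0 if anything matched, WARN (5) if nothing did, and 20 for a malformed pattern.

```python
import re

OK, WARN, ERROR = 0, 5, 20   # return codes described in these notes


def key_mode(regex, keys):
    """Test each key as a whole string; return (matched keys, return code)."""
    try:
        compiled = re.compile(regex)
    except re.error:
        return [], ERROR        # a badly formed pattern stops at once
    matched = [k for k in keys if compiled.fullmatch(k)]
    return matched, (OK if matched else WARN)


# The AmigaDOS pattern -v|verbose needs no translation beyond this.
print(key_mode(r"-v|verbose", ["verbose"]))
```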
-
- A final mode simply writes all files that match the Name-Spec to
- standard output in sequence. No processing of the files is done --
- they needn't even be text:
-
- MAT JOIN <Name-Spec>...
-
- You can however supply a Tag-template (see sections on "Keywords" and
- "Templates") that will be output for each file. The 'success' portion
- will be written BEFORE the file, the 'fail' part AFTER it; either part
- may be omitted. Labels are also valid in a JOIN command line.
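The JOIN behaviour with a tag amounts to plain concatenation. In this Python illustration the files are given as (name, contents) pairs rather than read from disk; ^F is assumed to stand for the file name as in other tags, and each tag part is assumed to be followed by a newline. The name join is hypothetical.

```python
def join(files, tag=""):
    """Concatenate (name, contents) pairs in order. The tag's success part
    is written before each file, its failure part after it; either part
    may be empty."""
    before, _, after = tag.partition("^|")
    out = []
    for name, data in files:
        if before:
            out.append(before.replace("^F", name) + "\n")
        out.append(data)                # copied unchanged, no line processing
        if after:
            out.append(after.replace("^F", name) + "\n")
    return "".join(out)


print(join([("a", "one\n"), ("b", "two\n")], "--- ^F ---^|=== end ^F ==="), end="")
```

Real files would be copied as raw bytes, since JOIN does not require text; strings just keep the sketch short.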
-
-
-
- Value returned to the CLI:
- _________________________
-
- When Mat returns to the CLI or Shell, it passes back a value of zero if
- it has found at least one match. If it has found no matches at all it
- returns a "WARN" value of 5. This happens in all modes, and can be
- tested within a command script to see if the intended operation has
- been successful. If you just want to know whether a match exists,
- without needing to see any output, you can simply redirect the output
- to NIL:.
-
- If Mat encounters an error which prevents it from continuing, like an
- incorrectly formed pattern, it will return at once with an error code
- of 20.
-
-
- Keywords:
- ________
-
- Aside from the mode selecting keywords discussed above, there are
- a number used to control other features. Some of these take arguments
- ("SEARCH", "PAT", and "TEMPLATE" being examples we've already met).
-
- In general, keywords can be placed either at the beginning of the line,
- or at any appropriate later point, as long as they don't separate a
- keyword and its argument. The exact effect may depend on where on the
- command line they are placed; in many situations you could have several
- interspersed along the line. Mat always processes the command arguments
- in sequence, from left to right (unlike the position-independent
- keywords of AmigaDOS commands). All keywords may be in upper or lower
- case. When so specified, they may be abbreviated to their first letter.
-
- To summarize the mode selection keywords already mentioned:
-
- FILES -- searches for matching filenames
- DIRS -- searches for matching directory names
- NAMES -- searches for both files and directories
- KEY -- checks the command line for matches
- JOIN -- sequentially copies all matching files to the output
-
- All the above may be abbreviated to their first letter.
- You may change modes within a command line if for some reason you need
- to. The change takes place at the point the keyword is encountered.
-
- These component specifiers have also been mentioned:
-
- FROM <Name-Spec> -- sets text search mode and specifies
- that the following argument is a Name-Spec
- (No abbreviation). Rarely needed.
- SEARCH <Search-String>
- -- declares its argument to be a string
- for quick scanning of following text files
- (abbreviation "S").
- PAT <Pattern> -- specifies that its following argument
- is a pattern (abbreviation "P").
- TEMPLATE <Template> -- specifies that its following argument
- is a template (abbreviation "T"). The
- template string may be null ("") to cancel
- any previous template, or satisfy the
- Directory Search's requirement for one.
-
- Only one Search-String, one Pattern and one Template may be active
- at any time. You can change any of them at any point in the command
- line, though, replacing the old ones.
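Because arguments are read strictly left to right, the settings that apply to a file specifier are simply those in force when it is reached. This Python sketch is a simplified model, not Mat's parser. It handles only S/SEARCH, P/PAT, T/TEMPLATE, FROM, NOCASE and CASE, and treats any other word as a file specifier, so it does not model the rule that an unkeyworded first argument is a pattern. The name plan is hypothetical.

```python
def plan(args):
    """Return (file_spec, settings) pairs showing which search string,
    pattern, template and NOCASE setting apply to each file specifier."""
    takes_arg = {"S": "search", "SEARCH": "search", "P": "pat", "PAT": "pat",
                 "T": "template", "TEMPLATE": "template"}
    state = {"search": None, "pat": None, "template": None, "nocase": False}
    jobs = []
    i = 0
    while i < len(args):
        word = args[i].upper()                  # keywords ignore case
        if word in takes_arg or word == "FROM":
            if i + 1 >= len(args):
                raise ValueError(f"{args[i]} needs an argument")
            if word == "FROM":                  # next word is a file, even "case"
                jobs.append((args[i + 1], dict(state)))
            else:
                state[takes_arg[word]] = args[i + 1]   # replaces old value
            i += 2
            continue
        if word == "NOCASE":
            state["nocase"] = True
        elif word == "CASE":
            state["nocase"] = False
        else:
            jobs.append((args[i], dict(state)))  # anything else: a file spec
        i += 1
    return jobs


for spec, settings in plan(["S", "printf", "a.c", "nocase", "P", "#?x#?", "b.c"]):
    print(spec, settings)
```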
-
- There are two other keywords that also take a template form argument:
-
- TAG <Template> -- defines a "tag" (for text file modes only)
- that will be output at most once for each file.
- The 'success' portion of the template (see the "Templates" section)
- will be output immediately before the first match found in the file;
- the 'failure' part will be output if no match is found by the end
- of the file. Note that this applies even if non-matching lines
- are being output (by a 'fail' part of the line template)!
- Non-matching lines before the first match will also appear BEFORE
- the tag. Some of the selectors that are available for a line
- template (such as slices) are not so appropriate within a tag
- template; they will be ignored if used. Declaring a new tag
- replaces any previous one; only one may be active at a time.
-
- LABEL <Template> -- outputs a "label" at the point it is
- encountered in the command line. There
- are the same restrictions on available template selectors as for
- tags. If no "fail" part is supplied, the label will be output
- unconditionally; if you do supply a fail part (after "^|"), the
- "success" part preceding the divider will only be output if there
- has been at least one match to that point, otherwise the fail
- part will be output.
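A label's choice between its two parts depends only on whether anything has matched so far. A tiny Python sketch of that decision (label is a hypothetical name, and the running match count is assumed to be kept elsewhere):

```python
def label(template, matches_so_far):
    """Choose what a LABEL outputs at the point it is reached."""
    success, divider, failure = template.partition("^|")
    if not divider:
        return success                  # no fail part: always output
    return success if matches_so_far else failure


print(label("Found some^|Found none", 0))
```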
-
- No abbreviations are provided for either of the previous or any of the
- remaining keywords.
-
- The following subsidiary mode controls can be put anywhere appropriate
- on the command line:
-
- NOCASE -- causes all subsequent searches to ignore the case of
- pattern and text characters. It can be put anywhere in
- the command line subject to the above restrictions; file specifiers
- appearing before it will not be affected.
-
- CASE -- cancels the effect of a previous NOCASE.
-
- FIRST -- is only appropriate in text matching modes. It
- causes the search of each file after it on the command
- line to terminate when the first match is found. It is useful when
- you just want to determine which files contain a pattern, rather
- than listing every occurrence. It is compatible with templates and
- other options.
-
- ALL -- reverses the effect of FIRST if you should need to
- do so within a command line.
-
- NOLINES -- prevents the usual newline character being output
- after each match. All subsequent matches will be shown
- on the same line unless the template dictates otherwise. Don't
- forget that you will usually want some sort of separator in the
- template, such as a space. It can be used in any mode.
-
- LINE -- reverses the effect of NOLINES if this has been
- given previously. (Apologies for the plural/singular
- disparity, but it isn't quite the inverse.) It also inserts a
- newline into the output at that point; you can use it just for this
- if you want an extra blank line between file specifiers.
-
- ZERO -- resets the item and match counters (available for
- templates) to zero. Has no other effect on the state.
-
- RESET -- sets the system back to the initialized state:
- pattern, template, and tag are cleared, and the
- counters are zeroed. Only the internal success flag (that
- controls the value returned to AmigaDOS) is left unchanged.
- This option is intended for command script use (see below);
- you are not likely to need it on a command line.
-
-
- Two keywords can be used to control input and output channels. Each
- takes a single Filename argument (NOT a pattern!):
-
- OUT <File> -- diverts future output to the named file (or
- device). Any existing file of that name is deleted.
- Any previously selected destination will be closed (except
- standard output of course).
-
- OUT - -- (a single dash as the argument) closes any
- currently selected output file and restores standard
- output (the screen, unless AmigaDOS redirection has also been
- used).
-
- WITH <File> -- reads control commands from <File> instead of
- the command line. All keywords and argument types
- are valid in a script, but NOTHING is reset at the end of a line.
- (Don't break a line between a keyword and its argument, though!)
- The RESET keyword must be included when you need to clear the
- decks. When the end of the script is reached, control returns
- to any further arguments on the command line. As with a CLI
- command line, anything after a semicolon on a line is ignored:
- you may comment your scripts.
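Reading a WITH script amounts to splitting it into the same argument words a command line would give. The Python sketch below uses the standard shlex module configured under assumed rules: whitespace (newlines included) separates words, double quotes group them, and ';' starts a comment. Mat's exact quoting rules may differ, and read_script is a hypothetical name.

```python
import shlex


def read_script(text):
    """Split a script into argument words under the assumed rules above."""
    lex = shlex.shlex(text, posix=True)
    lex.whitespace_split = True   # '#', '?', '|' and so on stay inside words
    lex.commenters = ";"          # replaces shlex's default '#' comment char
    lex.quotes = '"'              # "'" is the pattern quote, not a delimiter
    lex.escape = ""               # no backslash escapes
    return list(lex)


script = 'S printf  ; only lines with printf\nP "#?printf#?,# X,#?"\nRESET\n'
print(read_script(script))
```

The single quote is left alone because it is the pattern quoting character, and '#' is taken out of the comment set for the same reason.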
-
-
- + + + + +
-
-
- Pattern Matching
- ________________
-
-
- The pattern matching algorithm used by Mat is an extension of the
- standard file pattern matching scheme used by AmigaDOS. Many people may
- not appreciate how general and flexible the method is. It is many times
- more capable than the simple "wild-card" matching available on most
- personal computers. There are some things that the standard algorithm
- doesn't have which would often be useful, and I have done my best to supply
- some of these in this extended version.
-
- The discussion that follows may be a fuller exposition of how to use
- pattern matching than is available from other sources. If you leave out
- references to the "universal-match" character "*", "negation matches", and
- "slicing", everything discussed applies just as well to standard AmigaDOS
- patterns, which can be used in commands like LIST, DELETE, and COPY.
-
- A pattern is a text string constructed from "plain characters" and
- "special characters". It represents a set (possibly a large set) of text
- strings that will match it. Remember that it always matches complete
- strings; this is not the same as a simple text search, where a match is
- signalled if the search string is found anywhere within the source text.
- The string being matched by the pattern is always "bounded" in some way,
- either because it stands alone -- like a file name -- or because, say, it
- is a complete line of text. The newline character at the end is not
- usually available to the matching process.
-
- If a pattern argument in a command line contains spaces, it must of
- course be enclosed in quotes. There is no way of including quotes in a
- pattern which is itself enclosed in quotes, unfortunately (because of the
- way C handles argument strings).
-
- The syntax of the pattern structure is such that complex patterns can
- be built from simple ones. Broadly speaking, patterns may be chained end
- to end so that successive segments of a complete target string may be
- matched by successive segments of the pattern. In addition, each pattern
- segment can specify "alternatives": if any of these match, the whole
- segment matches.
-
- Plain Characters:
-
- The simplest pattern is a string of plain characters. This will only
- match a target string consisting of exactly the same characters in the
- same order, which is obviously of limited usefulness. The only case
- where you are likely to want this is when getting a particular file
- name, and the program is smart enough to go directly to the file in
- this case rather than doing a search.
-
-
- Special Characters:
-
- To build more general patterns we need the special characters. These do
- not represent themselves (unless special action is taken): they are
- instead structural elements that shape the patterns we
- desire. Using them we can build patterns -- or subpatterns -- that will
- match, say, any single character, any five characters, any arbitrary
- string, or a string that is one of several possible specific
- alternatives. We can then put such subpatterns together to end up with
- a complete pattern that will match all the various possibilities we are
- looking for and no others. The possibilities should become clearer as
- we get to specific examples.
-
-
- The seven special characters used in AmigaDOS file matching are:
-
- ' ? | ( ) # and %
-
- To these Mat adds two more:
-
- ~ and ^
-
- We'll look at them briefly in order, before we get into a fuller
- exploration:
-
- " ' " makes the character following it into a plain character.
- " ? " matches ANY single character.
- " | " separates alternative patterns.
- " ( " and " ) " enclose patterns used in building larger ones.
- " # " causes a match to any number of repetitions of the pattern
- it precedes.
- " % " matches the null string when syntactically necessary.
-
- " ~ " is one way (of two) of specifying negation.
- " ^ " slices a matched string into segments.
-
-
- Quoting Characters:
-
- The single quote (" ' ") is used to turn any special character
- immediately following it into a plain character. Thus to match against
- an actual question mark in a target text you would include the pair
- " '? " in the pattern. And of course it can quote itself.
-
-
- Matching Any Character:
-
- The question mark matches ANY single character. Thus:
-
- ???
-
- matches "abc", "xyz", and so on, but not "ab" or "abcd".
-
-
- Matching Alternatives:
-
- The vertical bar (" | ") separates "alternatives". If any of a set of
- patterns separated by bars matches the target, the match is successful.
- For example:
-
- abc|def|qwertyuiop
-
- would match any of those three strings, but no others.
- The pattern
-
- abc|x?z
-
- would match "abc", or any three-character string consisting of "x",
- then any single character, then "z".
-
-
- Building Patterns from Others:
-
- The left and right parentheses can be used to enclose a pattern that
- you want to match as a unit when it is part of a larger pattern. As one
- example we could look for any two characters followed by "abc" or "def"
- with the pattern:
-
- ??(abc|def)
-
- Combine two or more patterns in sequence this way:
-
- (abc|def)(xxx|yyy)
-
- This will match "abcxxx", "abcyyy", "defxxx", and "defyyy".
-
- Patterns can be nested as far as you like with parentheses:
-
- a(bc|??(xx|yy))d
-
- will match "abcd", or any six-letter group beginning with "a" and
- ending in "xxd" or "yyd".
-
- Redundant parentheses do no harm. They may be useful to distinguish
- patterns from other constructs.
-
-
- Pattern Repetition:
-
- The " # " character is always followed by a (sub)pattern. It will match
- ANY number of (exact) repetitions of that pattern (INCLUDING zero). The
- pattern may be a single character, but if it isn't it must be enclosed
- in parentheses. Thus:
-
- #(ab)
-
- matches "ab", "abababab", or simply an empty string. It does not match
- "ababa".
-
- The pattern to be repeated may be any legal pattern, including more
- repetition constructs if you want:
-
- #(ab|?x|#(xy)z)
-
- will match such strings as "abab", "zxab", "qxxyxyxyxyzxyzab", and so
- on. It will NOT match "abxy".
-
-
- Matching the Empty String:
-
- The " % " character is used where you have to specify an empty ("null")
- string -- normally as one of a number of alternatives. The
- construction
-
- (|abc)
-
- is not legal; instead you must use:
-
- (%|abc)
-
- which will match either "abc" or the null string.
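Because each of the standard constructs has a counterpart in ordinary regular expressions, the following Python sketch translates a pattern built from ' ? | ( ) # % into an equivalent regex and tests a whole string against it. It illustrates the semantics described above; it is not Mat's matcher (which also handles ~, ^ and NOCASE), and amiga_to_regex and amiga_match are hypothetical names.

```python
import re


def amiga_to_regex(pat):
    """Translate a standard AmigaDOS pattern (' ? | ( ) # %) into an
    equivalent Python regex. Raises ValueError if the pattern is malformed."""
    pos = 0

    def alternatives():
        nonlocal pos
        branches = [sequence()]
        while pos < len(pat) and pat[pos] == "|":
            pos += 1
            branches.append(sequence())
        return "|".join(branches)

    def sequence():
        nonlocal pos
        items = []
        while pos < len(pat) and pat[pos] not in "|)":
            items.append(item())
        if not items:
            raise ValueError("empty alternative (use %)")
        return "".join(items)

    def item():
        nonlocal pos
        c = pat[pos]
        pos += 1
        if c == "?":
            return "."                      # any single character
        if c == "%":
            return ""                       # explicit null string
        if c == "'":
            if pos >= len(pat):
                raise ValueError("nothing to quote after '")
            pos += 1
            return re.escape(pat[pos - 1])  # quoted character is plain
        if c == "#":
            if pos >= len(pat) or pat[pos] in "|)":
                raise ValueError("'#' must be followed by a pattern")
            return "(?:" + item() + ")*"    # zero or more repetitions
        if c == "(":
            inner = alternatives()
            if pos >= len(pat) or pat[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return "(?:" + inner + ")"
        return re.escape(c)                 # plain character

    result = alternatives()
    if pos != len(pat):
        raise ValueError("unmatched ')'")
    return result


def amiga_match(pat, target):
    """True if the WHOLE target string matches the pattern."""
    rx = "(?:" + amiga_to_regex(pat) + ")"
    return re.fullmatch(rx, target, re.DOTALL) is not None


print(amiga_match("??(abc|def)", "xyabc"))   # True
print(amiga_match("#(ab)", "ababa"))         # False
```

The check for empty alternatives makes "(|abc)" an error, as described above, while "(%|abc)" is accepted.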
-
-
-
- Negated Matching:
-
- Mat extends the basic pattern matching syntax by allowing you to
- specify patterns that if matched will cause the overall match to fail.
- If a negated segment is included in a pattern, and the target string
- has ANY POSSIBLE match of the whole pattern that includes that segment,
- the match cannot succeed. There are restrictions on negation patterns
- not shared by the structures we've talked about up to now; in
- particular they can't be nested -- you can't negate a negation --
- although they can be inserted at any level in the pattern.
-
- There are two ways of specifying negated patterns. The first will
- match ANY string UNLESS it exactly matches the pattern; it is
- constructed by prefixing the pattern by the tilde (" ~ "):
-
- ab~(cd)e
-
- will not match "abcde", but will match any other string that begins
- with "ab" and ends with "e", such as "abxxxe", "abe", "abce", etc.
-
- The second form is a "negated alternative", indicated by two adjacent
- vertical bars (" || "). This is used when, rather than matching ANY
- string that is not the negated one, you have a set of patterns you want
- to match UNLESS the negated part is also matched. Thus:
-
- a(b?d|?c?||bcd)
-
- will match four character strings such as "abxd", "accc", "abcx", as
- long as the whole string is not "abcd".
-
- You can have more than one negated segment, as long as one does not
- appear inside another. Thus the following sort of thing is possible
- (whether it's also useful though...?):
-
- a~bc~(de)(???||fgh||xyz)
-
- Remember that this will be forced to fail if there is any possible
- match that includes a negated section. Thus these will succeed:
-
- acxxx
- abbbcddeabc
- acdexy
-
- and these will fail:
-
- abcxxx
- abbcdexxx
- aczxsdefrgthcjxsxcxyz
-
- To stress it once again, a negated match is "aggressive": if there
- is ANY possible match that includes a negated section, the whole
- match will fail.
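-
- One way to picture the "aggressive" rule is as a brute-force search:
- enumerate every way the whole pattern can match, and reject the string
- if any of those ways has a negated section matching what it forbids. The
- Python sketch below models that rule. It is hypothetical and deliberately
- naive (exponential in the worst case), not the program's actual
- algorithm; it assumes that an alternative preceded by "||" is negated
- while one preceded by a single "|" is ordinary, and it compares
- characters case-sensitively.
-
```python
from typing import Iterator, List, Tuple

# Hypothetical tree nodes, as plain tuples: ("lit", c), ("any",), ("empty",),
# ("seq", items), ("alt", ordinary, negated), ("rep", node), ("not", node).


def parse(pattern: str) -> tuple:
    pos = 0

    def item() -> tuple:
        nonlocal pos
        if pos >= len(pattern):
            raise ValueError("pattern ends where an item was expected")
        ch = pattern[pos]
        pos += 1
        if ch == "?":
            return ("any",)
        if ch == "%":
            return ("empty",)
        if ch in "#~":  # both apply to the single item that follows
            return ("rep" if ch == "#" else "not", item())
        if ch == "(":
            node = alternation()
            if pos >= len(pattern) or pattern[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        return ("lit", ch)  # compared case-sensitively (an assumption)

    def sequence() -> tuple:
        items = []
        while pos < len(pattern) and pattern[pos] not in "|)":
            items.append(item())
        return ("seq", items)

    def alternation() -> tuple:
        nonlocal pos
        ordinary: List[tuple] = [sequence()]
        negated: List[tuple] = []
        while pos < len(pattern) and pattern[pos] == "|":
            # Assumption: a branch after "||" is negated, after "|" ordinary.
            is_negated = pattern.startswith("||", pos)
            pos += 2 if is_negated else 1
            (negated if is_negated else ordinary).append(sequence())
        return ("alt", ordinary, negated)

    tree = alternation()
    if pos != len(pattern):
        raise ValueError("unbalanced ')'")
    return tree


def ends(node: tuple, s: str, i: int) -> Iterator[Tuple[int, bool]]:
    """Yield (end, hit) for each way node matches s[i:end]; hit = a negation fired."""
    kind = node[0]
    if kind == "lit":
        if i < len(s) and s[i] == node[1]:
            yield i + 1, False
    elif kind == "any":
        if i < len(s):
            yield i + 1, False
    elif kind == "empty":
        yield i, False
    elif kind == "seq":
        yield from seq_ends(node[1], s, i)
    elif kind == "alt":
        for alt in node[1]:
            yield from ends(alt, s, i)
        for alt in node[2]:  # reaching the end through a "||" branch is a hit
            for end, _ in ends(alt, s, i):
                yield end, True
    elif kind == "rep":
        yield i, False  # zero repetitions
        for mid, hit in ends(node[1], s, i):
            if mid > i:  # ignore empty passes so the recursion terminates
                for end, hit2 in ends(node, s, mid):
                    yield end, hit or hit2
    elif kind == "not":
        forbidden = {end for end, _ in ends(node[1], s, i)}
        for end in range(i, len(s) + 1):  # ~x may absorb any stretch of text,
            yield end, end in forbidden    # but it is a hit if x matches it exactly


def seq_ends(items: list, s: str, i: int) -> Iterator[Tuple[int, bool]]:
    if not items:
        yield i, False
        return
    for mid, hit in ends(items[0], s, i):
        for end, hit2 in seq_ends(items[1:], s, mid):
            yield end, hit or hit2


def mat_match(pattern: str, s: str) -> bool:
    """Succeed only if the whole string matches and no complete match is a hit."""
    hits = [hit for end, hit in ends(parse(pattern), s, 0) if end == len(s)]
    return bool(hits) and not any(hits)
```
-
- With the examples above, mat_match("a~bc~(de)(???||fgh||xyz)", "acdexy")
- is True, while the same pattern against "abbcdexxx" is False, because one
- complete match lets "~(de)" cover exactly "de".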
-
-
- Slicing the Matched String:
-
- You can include "slice marks" (the caret -- " ^ ") in your pattern to
- select out pieces of the matched string that can be treated individually.
- Mat will arrange these "slices" in a manner specified by a template
- (next section), to generate desired output.
-
- Once again there is a restriction on the use of this character that
- does not apply to the others: only the first four of these marks
- encountered during a match will be recorded; any after this will be
- ignored. Note that this doesn't mean you can only include a maximum of
- four marks; if they are inside alternatives that don't match any part
- of the target string, the scan will never encounter them. You should
- be sure of what you are doing, though, if you don't want to be
- surprised by the program's choices. We'll return to this, and some
- other points you should note about the behaviour of slice marks, later.
-
- If there is more than one possible match of the pattern to the target,
- the slice will be made at the earliest possible point. Remember this
- especially when you have repetitions in your pattern.
-
- Examples:
-
- The pattern #?^x#?
- will cut abcdxyz
- into abcd xyz
-
- It will also cut abcxxxx
- into abc xxxx
-
- The pattern #?^x#?y^#?
- will cut abcxxxxyz
- into abc xxxxy z
-
- The pattern #?^#x^#?
- won't cut much of anything! (because "#x" also matches the null
- string.) The first two slices will simply always be empty, and
- slice three will contain the whole string.
-
- The pattern #?^(word|another)^#?
- will cut "here is another word for you"
- into "here is" "another" "word for you"
- (using quotes in this case to mark off the slices). Notice that
- the cuts are made around "another" rather than "word" because the
- earliest match is found.
-
- Slice marks within alternatives can be used, as noted above, but are
- tricky. Because of the way the marks are recorded internally, if two
- different alternatives containing them match, both marks will be
- reported but the position of one of them will be wrong (probably at the
- beginning of the string). So it is best to keep the slice marks
- outside of any alternation constructions (as shown in the last example
- above).
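-
- The slicing rules can be approximated with Python's regular expressions:
- each "^" becomes an empty capture group, whose start position records
- where the mark fell, and "#" becomes a lazy repeat so the marks tend to
- land at the earliest possible point. This is a hypothetical model, not
- Mat's implementation: it ignores negation, does not support top-level
- alternation, and lazy quantifiers only approximate the earliest-slice
- rule when alternatives of different lengths compete. It also divides the
- result into the five slices "^0" to "^4" as described under Slice
- Selectors in the Templates section.
-
```python
import re
from typing import List


def to_slicing_regex(pattern: str) -> str:
    """Translate a pattern (without negation) into a regex for slicing.

    Each ^ becomes an empty capture group, so the group's start position
    records where the mark fell; # becomes a lazy repeat (*?) so that
    earlier marks are preferred. Hypothetical helper.
    """
    pos = 0

    def item() -> str:
        nonlocal pos
        ch = pattern[pos]
        pos += 1
        if ch == "?":
            return "."
        if ch == "%":
            return ""
        if ch == "^":
            return "()"
        if ch == "#":
            return "(?:" + item() + ")*?"
        if ch == "(":
            alternatives = [sequence()]
            while pos < len(pattern) and pattern[pos] == "|":
                pos += 1
                alternatives.append(sequence())
            if pos >= len(pattern) or pattern[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return "(?:" + "|".join(alternatives) + ")"
        return re.escape(ch)

    def sequence() -> str:
        parts = []
        while pos < len(pattern) and pattern[pos] not in "|)":
            parts.append(item())
        return "".join(parts)

    regex = sequence()
    if pos != len(pattern):
        raise ValueError("top-level '|' or stray ')' is not supported here")
    return regex


def five_slices(pattern: str, text: str) -> List[str]:
    """Return slices ^0..^4 of text, or an empty list if it does not match."""
    m = re.fullmatch(to_slicing_regex(pattern), text, re.DOTALL)
    if m is None:
        return []
    # Marks inside alternatives that took no part in the match report -1; skip
    # them, and keep only the first four that were reached.
    marks = [m.start(g) for g in range(1, m.re.groups + 1) if m.start(g) >= 0][:4]
    # Missing marks are placed at the end of the text, so the slice after the
    # last real mark runs to the end and every higher-numbered slice is empty.
    bounds = [0] + marks + [len(text)] * (5 - len(marks))
    return [text[bounds[k]:bounds[k + 1]] for k in range(5)]
```
-
- For the "another word" example, five_slices("#?^(word|another)^#?",
- "here is another word for you") returns "here is ", "another" and
- " word for you", followed by two empty slices.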
-
- Be careful using negation with slice marks. As noted above, any
- match with the negated section causes the whole match to fail: it
- does not try to find another match. Therefore you can't use a
- negation to force a slice mark to be in a different place. In
- general there are some limitations such as these which may prevent
- you from cutting up strings exactly the way you want (chopping out
- variable spaces can be very awkward for example).
-
-
- Templates
- _________
-
- The Templates Mat uses to generate output lines are basically simple
- text strings with "splice-markers" and other "selectors" that indicate
- where the pieces of the matched string and other items are to be inserted.
- The text segments of a template can be anything you want (except a newline
- -- there is a selector for this). A special marker can be used to divide
- the template string into "success" and "fail" halves; the "success" part
- controls the format of output lines for matches, while the "fail" part will
- be output for each input string that doesn't match. Output strings are
- always terminated with a newline, unless the NOLINES option is in effect.
-
- You can cancel an existing template by supplying a null string ( "" ). Any
- current Tag may be cancelled in the same way. The effect is slightly
- different in Directory Search mode: this normally EXPECTS a template to
- control the output (and you can't simply omit it), but you can defeat the
- requirement by supplying a null template, in which case matches will
- produce no output.
-
- Each marker (or selector -- the terms will be used interchangeably) is
- a character pair: the caret (" ^ ") followed by a selector character.
- Slices from the matched string are numbered -- "^0" to "^4" . Other items
- have identifying letters, such as "^N" for line number; the case of these
- letters is important (all are currently upper case because you are already
- holding down the shift key for the caret). The success/fail divider uses
- the vertical bar: "^|".
-
- Not all selectors are valid, or at least have exactly the same meaning,
- under all conditions. For example you can't use slices from a matched line
- in the "fail" section of a template because -- obviously -- there aren't
- any. Then, "line numbers" naturally only apply in text matching, but in
- file name matching mode the same selector (^N) keeps track of the number of
- files encountered. If you use a selector that is not valid it is simply
- skipped over. Of course you can use any selector more than once within a
- template.
-
- If a template argument in a command line contains spaces, it must of
- course be enclosed in quotes. As with a pattern, you can't include quotes
- in a template which is itself enclosed in quotes: use the "^Q" selector
- instead. You can include a caret character in the output string (if, for
- example you are generating a new Mat command in a script) by the pair "^^".
-
-
- Slice Selectors:
-
- As four slice marks are allowed in a pattern, there can be a maximum of
- five slices of the matched string. These are selected by "^0" for the
- piece from the beginning of the string to the first mark, "^1" for the
- piece between the first and second, up to "^4" for the remainder of the
- string beyond the fourth mark. If there are fewer than four slice
- marks, the slice associated with the final existing mark extends to the
- end of the string, and all higher-number pieces are empty. Thus if
- there are only two marks, "^2" covers the remainder of the string, and
- "^3" and "^4" are empty.
-
- For instance, if we use this pattern, with two slice marks:
-
- #?^word ^#?
-
- and this template -- which will omit slice 1:
-
- ^0^2
-
- to match and rearrange the string:
-
- "this word will be missing"
-
- we will end up with:
-
- "this will be missing"
-
- Slice marks are only appropriate to match-templates. They are
- ignored by 'fail' templates (see below) and in Tags and Labels.
-
-
- Line Number Selector:
-
- The marker "^N" placed in a template string will insert the current
- line Number within the file being scanned at that point into the output
- string. It can be used in both "success" and "fail" portions of a
- template. In tags and labels it will represent the lines read to that
- point. If used in Directory Search or Argument Scanning modes, it
- represents the total number of items scanned to that point; it is not
- automatically reset in these modes (use ZERO to do so).
-
- So the pattern #?^(word|another)^#?
- and template ^N: ^1
- would generate something like 234: another
-
-
- Index Number Selector:
-
- The pair "^I" inserts an Index number representing a count of matches
- so far. The count is kept from the beginning of the program, and is not
- reset with a new file (use ZERO to do so). You may use it in the "fail"
- section of a template, also in tags and labels, but remember it will
- indicate the number of matches, not lines output.
-
- "^I" also works in Directory or Argument Search mode. If no pattern
- (aside from the file specifiers themselves) is supplied in these modes,
- it will have the same value as "^N", but you can also match the total
- set of files found against a Pattern argument, in which case "^I" will
- reflect these matches rather than total files.
-
-
- Original String Selector:
-
- The pair "^O" ("Oh", not "zero" -- I probably should have chosen a
- better one...) represents the unsliced Original string. It can be used
- in both the "success" and "fail" parts of a template. Thus, to simply
- put a line number in front of each matched line, you could use the
- template:
-
- ^N: ^O
-
- In File Matching mode, this selector is the same as "^F" (below); in
- Argument Matching it represents the current argument. It is a null
- string for tags and labels in all cases.
-
-
- Line Break Selector:
-
- The pair "^B" Breaks the output line at that point with a newline
- character. For instance, to output line number and slice-1 on one
- line, followed by the original string on a new line, you would use:
-
- ^N: ^1^B^O
-
-
- Quote Mark Selector:
-
- It is not usually possible to embed quote marks in template strings
- directly, so you can use the selector "^Q" to make them appear at that
- point in the output line. For example, this template puts quote marks
- around slice 1:
-
- ^0 ^Q^1^Q ^2
-
-
- File Name Selector:
-
- "^F" selects the local name of the current File (i.e. without any
- directory prefix), in both text and file name matching modes. (Only
- in Argument Match is it a null string.)
-
- For example, if you have a filename specifier argument (see later)
-
- :work#?/#?.txt
-
- which has found the file
-
- Work Disk:work_1/sample.txt
-
- the "^F" selector will insert
-
- sample.txt
-
- If the item referenced by the specifier is a Device (such as a PIPE:)
- rather than a file, "^F" will return the FULL name as specified
- (pattern specifiers will never find devices, anyway). Thus the
- specifier:
-
- PIPE:xyz
-
- simply returns:
-
- PIPE:xyz
-
- Warning: if the found object is a directory AND is a root device (e.g.
- "DF0:" or "DF1:") the Filename is the full name of that disk but WITHOUT
- the terminating colon! (but the Full Pathname (below) is correct).
-
-
-
- Full Pathname Selector:
-
- "^P" represents the FULL pathname (from the parent device) of the
- file. Thus for the immediately preceding example, it would supply:
-
- Work Disk:work_1/sample.txt
-
-
- Directory Path Selector:
-
- "^D" in a template will insert the full path to the Directory of the
- current file. Thus for the above file, ^D would insert:
-
- Work Disk:work_1
-
- If the object found is a Device rather than a file, this is a null
- string (but the Full Pathname (^P) will be the same as the Filename
- (^F) -- see above). It will also be a null string if the found object
- is a directory and the specifier was in the form of a "Device" or
- "Parent" reference -- in other words like "xyz:" or "/"; in this case,
- the Filename ^F is NOT the full pathname -- just the usual simple name
- string.
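-
- For an ordinary file, "^F", "^P" and "^D" amount to splitting the full
- pathname at its last "/" (or ":"). The hypothetical Python helper below
- models that, plus the Device and root-directory cases described above.
- The caller must say whether the object is a device, since that cannot be
- told from the name alone; keeping the colon in "^D" for a file that sits
- directly in a root directory is an assumption the text does not cover,
- and the "Parent" reference case is not modelled.
-
```python
from typing import Tuple


def path_selectors(full_path: str, is_device: bool = False) -> Tuple[str, str, str]:
    """Return the (^F, ^P, ^D) strings for an object found at full_path."""
    if is_device:
        # Non-filesystem device such as PIPE:xyz: the name as given, no directory.
        return full_path, full_path, ""
    if full_path.endswith(":"):
        # Root directory such as DF0: ^F drops the colon, ^P keeps it.
        return full_path[:-1], full_path, ""
    cut = max(full_path.rfind("/"), full_path.rfind(":"))
    if cut < 0:
        return full_path, full_path, ""  # no directory part at all
    name = full_path[cut + 1:]
    # Assumption: a file directly in a root keeps the colon in ^D ("Work:" not "Work").
    directory = full_path[:cut] if full_path[cut] == "/" else full_path[:cut + 1]
    return name, full_path, directory
```
-
- For the example file above, path_selectors("Work Disk:work_1/sample.txt")
- gives "sample.txt", "Work Disk:work_1/sample.txt" and "Work Disk:work_1".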
-
-
- Literal Caret mark Selector:
-
- "^^" included in a template will generate a single caret mark in the
- output.
-
-
- Failure Template Marker:
-
- A simple template is only applied to strings which have been matched,
- and nothing is output when there isn't a match. You can split the
- template, however, into two subsections with the special success/fail
- division marker "^|". The section preceding this mark is applied for a
- successful match just like a simple template; the section following it
- is used if the match fails. In the "fail" section, any selectors
- desired can be used, except the five slices "^0" - "^4".
-
- A simple use would be to output all lines, whether or not they matched,
- but mark or rearrange the matched lines in some way. For example the
- following would output them all but put a flag and index number on
- each matched line (and corresponding blanks before an unmatched one):
-
- MATCH[^I]> ^O^| ^O
-
-
- A tag may be split with the same "^|" divider, in which case the
- success part will be output immediately before the first match as
- usual, but the fail part will only appear if there is no match in
- the file. (In JOIN mode the behaviour is a little different: both
- parts will be output if present, 'success' before, and 'fail' after,
- each file.) In addition, the success part may be empty (the tag begins
- with the marker); in this particular case, ONLY the fail part will ever
- appear -- nothing, not even a newline, is output before the first
- match.
-
- This last does not apply to match-templates, where a newline will
- be written even when the success part is empty. If you really want no
- output on matches, add the NOLINES keyword and place "^B" markers
- suitably where you DO want newlines.
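-
- The selector rules above can be condensed into a short expander. In this
- hypothetical Python sketch the caller supplies the five slices and a
- dictionary of the other selector values (N, I, O, F, P, D); unknown
- selectors are skipped, slices are ignored in the fail section, and each
- output string gets a terminating newline unless NOLINES is in effect.
- Two details are assumptions rather than documented behaviour: a template
- with a "^|" divider but an empty fail part is taken to give an empty line
- for a failed match, and a second "^|" is treated as an invalid selector.
- Tags, with their special empty-success-part rule, are not modelled.
-
```python
from typing import Dict, Optional, Sequence


def expand(template: str, matched: bool, info: Dict[str, object],
           slices: Sequence[str] = (), nolines: bool = False) -> Optional[str]:
    """Expand a match-template for one input string; None means no output.

    info maps selector letters to values, e.g. {"N": 234, "I": 5, "O": line}.
    slices holds ^0..^4 for a successful match. Hypothetical helper.
    """
    success: list = []
    fail: list = []
    current = success
    has_divider = False
    i = 0
    while i < len(template):
        ch = template[i]
        if ch != "^" or i + 1 == len(template):
            current.append(ch)  # ordinary text
            i += 1
            continue
        sel = template[i + 1]
        i += 2
        if sel == "|" and not has_divider:
            current, has_divider = fail, True  # switch to the "fail" half
        elif sel == "^":
            current.append("^")
        elif sel == "Q":
            current.append('"')
        elif sel == "B":
            current.append("\n")
        elif sel in "01234":
            if current is success and len(slices) == 5:
                current.append(slices[int(sel)])  # never used in the fail half
        elif sel in info:
            current.append(str(info[sel]))
        # any other selector is simply skipped
    if not matched and not has_divider:
        return None  # a simple template outputs nothing for a non-match
    body = "".join(success if matched else fail)
    return body if nolines else body + "\n"
```
-
- With the Line Number Selector example, expand("^N: ^1", True, {"N": 234},
- slices) produces "234: another" followed by a newline, where slices holds
- "here is ", "another", " word for you" and two empty strings.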
-
-
-
- File Specifiers
- _______________
-
- The arguments in the command line you supply to specify the files that
- Mat will examine are really just like those you might give to any AmigaDOS
- command, but there are one or two extra features.
-
- For text file searches you will probably most often want to specify a
- single file. You do this in the usual way with either the local name of a
- file in the same directory, or a path name that includes the chain of
- directories needed to reach that file in another. Unlike many other
- programs that allow pattern matching in filenames, by the way, Mat is
- perfectly happy with specific Device names (non-filesystem), such as PIPEs
- or the Serial Device.
-
- In place of the simple file name, you can use a pattern to match a group of
- files in the same directory. Unlike other AmigaDOS commands this pattern
- can employ the extended matching features described above ("*", "~", and
- "||"). Slice marks can also be used where they are appropriate (see below).
- As is usual in AmigaDOS, case is always ignored when searching for matching
- filenames.
-
- You can also use patterns in the directory part of the specification,
- in just the same way as in the filename part. (Did you know that you can
- also do this in most AmigaDOS commands supporting patterns, such as
- DELETE?) All the directories matching that specification will be searched
- in turn. However, you cannot split a pattern across directories -- in other
- words, a pattern must not include a device or directory separator (":" or
- "/"). This means that a given pattern can only match directory names at a
- certain "level" in the file hierarchy of the disk. Also you cannot use a
- pattern in a device specifier -- these must be simple names. To search
- more than one level, or more than one device, you must have more specifier
- arguments.
-
- In File Name Search mode, if you don't supply any other pattern, you may
- put slice marks in the file name portion of the specifier. You cannot
- place them in the directory part. Except in this particular situation
- (no main pattern present), the program will ignore slices in the filename.
-
-
- Examples:
-
- These are valid file specifiers:
-
- myfile.txt
- my#?file(.txt|_bak)
- df1:work/myfile
- :work/myfile
- /work/myfile
- :(work|old)/my^#?
-
- These are not:
-
- df(0|1):#?/#? -- pattern in device part
- df1:/#(work/)myfile -- pattern includes directory separator
- :w^#?/my^#? -- slice mark in directory part
-
-
-
- + + + + +
-
- ========
-
-
-
- Distribution and Copyrights
- ___________________________
-
-
- Mat itself and this manual are copyright, but may be freely distributed
- without charge. Commercial use is prohibited without the express written
- permission of the author.
-
- No fee is asked for the non-commercial use of this program, but if one
- day you're feeling generous...
-
-
- Remarks and Suggestions to:
- Peter Goodeve
- 3012 Deakin Street #D
- Berkeley, Calif. 94705
-
- %%%%%%%%%%%%
-
-