-
-
-
- AGREP(l) AGREP(l)
- June 11, 1991
-
-
-
- NAME
- agrep - search a file for a string or regular expression, with
- approximate matching capabilities
-
- SYNOPSIS
- agrep [ -#cdehilnpsvwxDIS ] pattern [ filename... ]
-
- DESCRIPTION
- agrep searches the input filenames (standard input is the default) for
- records containing strings which either exactly or approximately match
- a pattern. A record is by default a line, but it can be defined
- differently using the -d option (see below). Normally, each record
- found is copied to the standard output. Approximate matching allows
- finding records that contain the pattern with several errors including
- substitutions, insertions, and deletions. For example, Massechusets
- matches Massachusetts with two errors (one substitution and one
- insertion). Running agrep -2 Massechusets foo outputs all lines in
- foo containing any string with distance at most 2 from Massechusets.
- agrep supports many kinds of queries including arbitrary wild cards,
- sets of patterns, and in general, arbitrary regular expressions. See
- PATTERNS below. It supports most of the options supported by the grep
- family plus several more (but it is not 100% compatible with grep).
- For more information on the algorithm used by agrep see Wu and Manber,
- "Fast Text Searching With Errors," Technical report #91-11, Department
- of Computer Science, University of Arizona, June 1991 (available by
- anonymous ftp from cs.arizona.edu inside agrep/agrep.tar as agrep.ps).
-
- As with the rest of the grep family, the characters `$', `^', `*',
- `[', `|', `(', `)', `!', `;', and `\' can cause unexpected results
- when included in the pattern, as these characters are also meaningful
- to the shell. To avoid these problems, always enclose the entire
- pattern argument in single quotes, i.e., 'pattern'. Do not use double
- quotes (").
-
- agrep works only on text (ASCII) files. If an input file is binary,
- agrep generates an error message. Only one error message is generated
- even if the file list contains many binary files.
-
- When agrep is applied to more than one input file, the name of the
- file is displayed preceding each line which matches the pattern. The
- filename is not displayed when processing a single file, so if you
- want the filename to appear, use /dev/null as a second file in the
- list.
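The matching rule described above can be sketched in Python as a small dynamic program. This is only an illustration of the semantics (the fewest insertions, deletions, and substitutions needed to turn the pattern into some substring of a line). It is not the algorithm of the Wu and Manber report, and the names min_errors and agrep_like are invented for this sketch.

```python
def min_errors(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # prev[j]: fewest errors aligning the pattern prefix processed so far
    # with some substring of text that ends at position j.
    prev = [0] * (len(text) + 1)       # the empty pattern matches anywhere
    for i, p in enumerate(pattern, start=1):
        cur = [i]                      # i pattern characters against no text
        for j, t in enumerate(text, start=1):
            cur.append(min(
                prev[j - 1] + (p != t),  # match or substitution
                prev[j] + 1,             # pattern character missing from text
                cur[j - 1] + 1,          # extra character in the text
            ))
        prev = cur
    return min(prev)


def agrep_like(pattern, lines, k):
    """Keep the lines that contain the pattern within k errors."""
    return [line for line in lines if min_errors(pattern, line) <= k]


print(min_errors("Massechusets", "Boston, Massachusetts"))
```

Under these assumptions the call at the end reports the two errors mentioned in the text (one substitution and one insertion).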
-
- OPTIONS
- -# # is a non-negative integer (at most 8) specifying the maximum
- number of errors permitted in finding the approximate matches
- (defaults to zero). Generally, each insertion, deletion, or
- substitution counts as one error. It is possible to adjust the
- relative cost of insertions, deletions and substitutions (see -I
- -D and -S options).
-
- -c Display only the count of matching lines.
-
- -d 'delim'
- Define delim to be the separator between two records. The
- default value is '$', namely a record is by default a line.
- delim can be a string of size at most 8 (with possible use of ^
- and $), but not a regular expression. Text between two delim's
- is considered as one record. For example, -d '$$' defines
- paragraphs as records and -d '^From ' defines mail messages as
- records. agrep matches each record separately. This option does
- not currently work with regular expressions. delim cannot
- currently contain special control characters.
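As a rough Python sketch of how the three delimiters mentioned above carve a text into records: the function name split_records is hypothetical, and whether agrep keeps the delimiter inside the record is an assumption here, not something this page states.

```python
import re


def split_records(text, delim="$"):
    """Split text into records for the delimiter forms shown in this page."""
    if delim == "$":
        # Default: every line is a record.
        return text.splitlines()
    if delim == "$$":
        # Two consecutive line ends, i.e. an empty line between paragraphs.
        return [r for r in text.split("\n\n") if r.strip()]
    if delim.startswith("^"):
        # Delimiter anchored at the start of a line, e.g. '^From '.
        # Assumption: the delimiter text stays at the head of its record.
        marker = re.escape(delim[1:])
        parts = re.split(rf"(?m)^(?={marker})", text)
        return [p for p in parts if p]
    # Plain string delimiter.
    return text.split(delim)


mbox = "From a@x\nhello\nFrom b@y\nbye\n"
print(split_records(mbox, "^From "))
```

Each record would then be searched separately, which is why a pattern can span several lines only when the delimiter does.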
-
- -e pattern
- Same as a simple pattern argument, but useful when the pattern
- begins with a `-'.
-
- -h Do not display filenames.
-
- -i Case-insensitive search - e.g., "A" and "a" are considered
- equivalent.
-
- -l List only the files that contain a match.
-
- -n Each line that is printed is prefixed by its line number in the
- file.
-
- -p Find lines in the text that contain a supersequence of the
- pattern. For example,
- agrep -p DCS foo will match "Department of Computer Science."
- This option has the same function as -I0, which sets the cost of
- insertion to zero.
-
- -s Work silently, that is, display nothing except error messages.
- This is useful for checking the error status.
-
- -v Inverse mode - display only those lines that do not contain the
- pattern.
-
- -w Search for the pattern as a word - i.e., surrounded by non-
- alphanumeric characters. The non-alphanumeric must surround the
- match; they cannot be counted as errors. For example, agrep -w
- -1 car will match cars, but not characters.
-
- -x The pattern must match the whole line.
-
- -Ik Set the cost of an insertion to k (k is a non-negative integer).
- This option does not currently work with regular expressions.
-
- -Dk Set the cost of a deletion to k (k is a non-negative integer).
- This option does not currently work with regular expressions.
-
- -Sk Set the cost of a substitution to k (k is a non-negative
- integer). This option does not currently work with regular
- expressions.
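The effect of -I, -D, -S, and -p can be illustrated with a weighted variant of the usual edit-distance table. This is a sketch under assumptions: an "insertion" is taken to be an extra character in the text and a "deletion" a pattern character missing from the text (a reading suggested by -p being equivalent to -I0), and min_cost is a made-up name, not part of agrep.

```python
def min_cost(pattern, text, ins=1, dele=1, sub=1):
    """Cheapest way to align pattern with some substring of text."""
    prev = [0] * (len(text) + 1)           # a match may start anywhere
    for p in pattern:
        cur = [prev[0] + dele]             # pattern characters against no text
        for j, t in enumerate(text, start=1):
            cur.append(min(
                prev[j - 1] + (0 if p == t else sub),  # match or substitution
                prev[j] + dele,     # deletion: pattern character not in text
                cur[j - 1] + ins,   # insertion: extra character in the text
            ))
        prev = cur
    return min(prev)


# -p behaves like -I0: characters between the pattern letters are free.
print(min_cost("DCS", "Department of Computer Science.", ins=0))
# With -D2 -S2 and a budget of 1, only a single insertion fits.
print(min_cost("ABCDYZ", "ABCDxYZ", dele=2, sub=2))
```

A record would be reported when the returned cost does not exceed the number given with -#.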
-
- PATTERNS
- agrep supports a large variety of patterns, including simple strings,
- strings with classes of characters, sets of strings, wild cards, and
- arbitrary regular expressions.
-
- Strings
- any sequence of characters, including the special symbols `^' for
- beginning of line and `$' for end of line. The special
- characters listed above ( `$', `^', `*', `[', `|', `(', `)',
- `!', and `\' ) should be preceded by `\' if they are to be
- matched as regular characters. For example, \^abc\\ corresponds
- to the string ^abc\, whereas ^abc corresponds to the string abc
- at the beginning of a line.
-
- Classes of characters
- a list of characters inside [] (in order) corresponds to any
- character from the list. For example, [a-ho-z] is any character
- between a and h or between o and z. The symbol `^' inside []
- complements the list. For example, among the lowercase letters,
- [^i-n] matches the same characters as [a-ho-z]. The symbol `.'
- stands for any symbol (don't care). The symbol `^' thus has two
- meanings, but this is consistent with egrep.
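Python's re module uses the same bracket notation for these simple classes, so the complement can be explored there. The sketch below (the helper matched is invented for illustration, and it exercises Python's engine, not agrep's) shows that [^i-n] and [a-ho-z] agree on the lowercase letters, while the complemented class also accepts other characters such as digits.

```python
import re
import string


def matched(char_class):
    """Set of printable ASCII characters accepted by a bracket expression."""
    return {c for c in string.printable if re.fullmatch(char_class, c)}


lower = set(string.ascii_lowercase)

# Restricted to lowercase letters, the two classes coincide.
print(matched("[^i-n]") & lower == matched("[a-ho-z]"))
# Outside the lowercase letters they differ: a digit is only in the complement.
print("7" in matched("[^i-n]"), "7" in matched("[a-ho-z]"))
```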
-
- Boolean operations
- agrep supports an `and' operation `;' and an `or' operation `,',
- but not a combination of both. For example, 'fast;network'
- searches for all records containing both words.
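A minimal Python sketch of the two Boolean operators, restricted to exact substring matching for brevity. The function name matches is hypothetical, and rejecting a mixed pattern with an exception is an assumption about how "not a combination of both" might be enforced.

```python
def matches(pattern, record):
    """Evaluate an 'and' (;) or 'or' (,) pattern against one record."""
    if ";" in pattern and "," in pattern:
        raise ValueError("';' and ',' cannot be combined in one pattern")
    if ";" in pattern:
        # 'and': every part must occur somewhere in the record.
        return all(part in record for part in pattern.split(";"))
    # 'or' (or a single string): at least one part must occur.
    return any(part in record for part in pattern.split(","))


print(matches("fast;network", "a fast local network"))
print(matches("fast,network", "a fast car"))
```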
-
- Wild cards
- The symbol '#' is used to denote a wild card. # matches zero or
- more arbitrary characters. For example, ex#e matches
- example. The symbol # is equivalent to .* in egrep. In fact, .*
- will work too, because it is a valid regular expression (see
- below), but unless this is part of an actual regular expression,
- # will work faster.
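The stated equivalence between # and .* can be tried out with Python's re module. This is a sketch for exact matching only: wildcard_to_regex is an invented helper, and it does not handle an escaped \# or approximate matching.

```python
import re


def wildcard_to_regex(pattern):
    """Turn an agrep-style '#' wild card into the equivalent '.*'."""
    # Escape the literal pieces so only '#' acts as a wild card.
    return ".*".join(re.escape(part) for part in pattern.split("#"))


regex = wildcard_to_regex("ex#e")
print(regex)
print(bool(re.search(regex, "an example line")))
```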
-
- Combination of exact and approximate matching
- Any pattern inside angle brackets <> must match the text exactly,
- even when errors are allowed elsewhere. For example, <mathemat>ics
- matches mathematical with one error (replacing the last s with an
- a), but mathe<matics> does not match mathematical no matter how
- many errors we allow.
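One way to picture the angle-bracket rule is an edit-distance table in which every error touching a bracketed character costs infinity. The Python below is a sketch of that idea under assumptions (the names parse and min_errors_with_exact are made up, escaped brackets are not handled, and agrep's actual implementation may differ).

```python
INF = float("inf")


def parse(pattern):
    """Return (char, group) pairs; group is None outside <...>."""
    items, group, next_id = [], None, 0
    for ch in pattern:
        if ch == "<":
            group, next_id = next_id, next_id + 1
        elif ch == ">":
            group = None
        else:
            items.append((ch, group))
    return items


def min_errors_with_exact(pattern, text):
    """Fewest errors against any substring; bracketed parts allow none."""
    items = parse(pattern)
    prev = [0] * (len(text) + 1)
    for i, (p, grp) in enumerate(items, start=1):
        locked = grp is not None
        # True when the next pattern character belongs to the same <...> block,
        # so an extra text character here would fall inside the exact part.
        inside = locked and i < len(items) and items[i][1] == grp
        cur = [prev[0] + (INF if locked else 1)]
        for j, t in enumerate(text, start=1):
            diag = prev[j - 1] + (0 if p == t else (INF if locked else 1))
            drop = prev[j] + (INF if locked else 1)
            extra = cur[j - 1] + (INF if inside else 1)
            cur.append(min(diag, drop, extra))
        prev = cur
    return min(prev)


print(min_errors_with_exact("<mathemat>ics", "mathematical"))
print(min_errors_with_exact("mathe<matics>", "mathematical"))
```

An infinite result means that no error budget can produce a match, as in the second example from the text.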
-
- Regular expressions
- The syntax of regular expressions in agrep is in general the same
- as that for egrep. The union operation `|', Kleene closure `*',
- and parentheses () are all supported. Currently '+' is not
- supported. Regular expressions are currently limited to
- approximately 30 characters (generally excluding meta
- characters). Some options (-d, -w, -x, -D, -I, -S) do not
- currently work with regular expressions. The maximal number of
- errors for regular expressions that use '*' or '|' is 4.
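Because '+' is not supported, "one or more" has to be spelled as one copy followed by a starred copy, as the BUGS section notes. The Python sketch below uses Python's own re engine, only to check that the rewrite is equivalent; one_or_more is an invented helper name.

```python
import re


def one_or_more(sub):
    """Rewrite sub+ without '+': one instance, then zero or more instances."""
    return f"{sub}({sub})*"


rewritten = one_or_more("(de|fg)")
print(rewritten)

# Compare the rewrite with the '+' form on a few sample strings.
for sample in ["", "de", "fgde", "dex", "defgfg"]:
    same = bool(re.fullmatch(rewritten, sample)) == bool(re.fullmatch("(de|fg)+", sample))
    print(repr(sample), same)
```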
-
- EXAMPLES
- agrep -2 -c ABCDEFG foo
- gives the number of lines in file foo that contain ABCDEFG within
- two errors.
-
- agrep -1 -D2 -S2 'ABCD#YZ' foo
- outputs the lines containing ABCD followed, within arbitrary
- distance, by YZ, with up to one additional insertion (-D2 and -S2
- make deletions and substitutions too "expensive").
-
- agrep -5 -p abcdefghij /usr/dict/words
- outputs the list of all words containing at least 5 of the first
- 10 letters of the alphabet in order. (Try it: any list starting
- with academia and ending with sacrilegious must mean something!)
-
- agrep -1 'abc[0-9](de|fg)*[x-z]' foo
- outputs the lines containing, within up to one error, the string
- that starts with abc followed by one digit, followed by zero or
- more repetitions of either de or fg, followed by either x, y, or
- z.
-
- agrep -d '^From ' 'breakdown; (inter|arpa|bit)net' mbox
- outputs all mail messages (the pattern '^From ' separates mail
- messages in a mail file) that contain breakdown and one of either
- internet, arpanet, or bitnet.
-
- agrep -d '$$' -1 '<word1> <word2>' foo
- finds all paragraphs that contain word1 followed by word2 with
- one error in place of the blank. In particular, if word1 is the
- last word in a line and word2 is the first word in the next line,
- then the space will be substituted by a newline symbol and it
- will match. Thus, this is a way to overcome separation by a
- newline. Note that -d '$$' (or another delim which spans more
- than one line) is necessary, because otherwise agrep searches
- only one line at a time.
-
- agrep '^agrep' <this manual>
- outputs all the examples of the use of agrep in this man page.
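The -5 -p abcdefghij example above can be cross-checked without agrep. With insertions free and up to five pattern letters allowed to be missing, a word qualifies when it shares an in-order subsequence of length at least 5 with abcdefghij. The Python sketch below computes that length with a longest-common-subsequence table; lcs_length is a name chosen for this illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


letters = "abcdefghij"
for word in ["academia", "sacrilegious", "banana"]:
    print(word, lcs_length(letters, word))
```

The two words named in the example each reach exactly 5 (a, c, d, e, i and a, c, e, g, i), while a word such as banana does not.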
-
- SEE ALSO
- ed(1), ex(1), grep(1V), sh(1), csh(1).
-
- BUGS
- This is the first release of agrep. Expect some bugs, especially for
- more complicated patterns. Any bug reports or comments will be
- appreciated! Please mail them to sw@cs.arizona.edu or
- udi@cs.arizona.edu. There may be problems when control characters
- (e.g., <ctrl>A ) are used as part of a string or delimiter.
-
- Regular expressions do not support the '+' operator (match 1 or more
- instances of the preceding token). Such patterns can be searched for
- by using this syntax in the pattern:
-
- 'pattern(pattern)*'
-
- (search for strings containing one instance of the pattern, followed
- by 0 or more instances of the pattern).
-
- agrep sometimes adds an empty line to the output.
-
- The following can cause an infinite loop:
- agrep pattern * > output_file. If the number of matches is high, they
- may be deposited in output_file before it is completely read, leading
- to more matches of the pattern within output_file (the matches are
- against the whole directory). It's not clear whether this is a "bug"
- (grep will do the same), but be warned.
-
- Patterns are currently limited to approximately 30 characters. Lines
- are limited to 1024 characters. Records are limited to 8K, and may be
- truncated if they are larger than that.
-
- DIAGNOSTICS
- Exit status is 0 if any matches are found, 1 if none, 2 for syntax
- errors or inaccessible files.
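A script can branch on these exit codes. The Python sketch below assumes an agrep binary is available on PATH and that the -# and -s options behave as documented above. The helper names describe and has_match are hypothetical. Passing the arguments as a list bypasses the shell, so the pattern needs no quoting.

```python
import subprocess

# Meanings taken from the DIAGNOSTICS text above.
STATUS = {
    0: "match found",
    1: "no match",
    2: "syntax error or inaccessible file",
}


def describe(returncode):
    """Translate an agrep exit status into words."""
    return STATUS.get(returncode, "unexpected status")


def has_match(pattern, filename, errors=0):
    """Run agrep silently and report whether anything matched."""
    # Assumption: 'agrep' is installed; the pattern must not start with '-'.
    argv = ["agrep", "-s", f"-{errors}", pattern, filename]
    result = subprocess.run(argv)
    if result.returncode not in (0, 1):
        raise RuntimeError(describe(result.returncode))
    return result.returncode == 0


print(describe(1))
```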
-
-
-
- Formatted: August 24, 1994
-