home *** CD-ROM | disk | FTP | other *** search
- /* Copyright (c) 1994 Sun Wu, Udi Manber, Burra Gopal. All Rights Reserved. */
- This is version 2.04 of agrep - a new tool for fast
- text searching allowing errors.
- agrep is similar to egrep (or grep or fgrep), but it is much more general
- (and usually faster).
- The main changes from version 1.1 are 1) incorporating Boyer-Moore
- type filtering to speed up search considerably, 2) allowing multi patterns
- via the -f option; this is similar to fgrep, but from our experience
- agrep is much faster, 3) searching for "best match" without having to
- specify the number of errors allowed, and 4) ascii is no longer required.
- Several more options were added.
-
- To compile, simply run make in the agrep directory after untar'ing
- the tar file (tar -xf agrep-2.04.tar will do it).
-
- The three most significant features of agrep that are not supported by
- the grep family are
- 1) the ability to search for approximate patterns;
- for example, "agrep -2 homogenos foo" will find homogeneous as well
- as any other word that can be obtained from homogenos with at most
- 2 substitutions, insertions, or deletions.
- "agrep -B homogenos foo" will generate a message of the form
- best match has 2 errors, there are 5 matches, output them? (y/n)
- 2) agrep is record oriented rather than just line oriented; a record
- is by default a line, but it can be user defined;
- for example, "agrep -d '^From ' 'pizza' mbox"
- outputs all mail messages that contain the keyword "pizza".
- Another example: "agrep -d '$$' pattern foo" will output all
- paragraphs (separated by an empty line) that contain pattern.
- 3) multiple patterns with AND (or OR) logic queries.
- For example, "agrep -d '^From ' 'burger,pizza' mbox"
- outputs all mail messages containing at least one of the
- two keywords (, stands for OR).
- "agrep -d '^From ' 'good;pizza' mbox" outputs all mail messages
- containing both keywords.
-
- Putting these options together one can ask queries like
-
- agrep -d '$$' -2 '<CACM>;TheAuthor;Curriculum;<198[5-9]>' bib
-
- which outputs all paragraphs referencing articles in CACM between
- 1985 and 1989 by TheAuthor dealing with curriculum.
- Two errors are allowed, but they cannot be in either CACM or the year
- (the <> brackets forbid errors in the pattern between them).
-
- Other features include searching for regular expressions (with or
- without errors), unlimited wild cards, limiting the errors to only
- insertions or only substitutions or any combination,
- allowing each deletion, for example, to be counted as, say,
- 2 substitutions or 3 insertions, restricting parts of the query
- to be exact and parts to be approximate, and many more.
-
- agrep is available by anonymous ftp from cs.arizona.edu (IP 192.12.69.5)
- as agrep/agrep-2.04.tar.Z (or in uncompressed form as agrep/agrep-2.04.tar).
- The tar file contains the source code (in C), man pages (agrep.1),
- and two additional files, agrep.algorithms and agrep.chronicle,
- giving more information.
- The agrep directory also includes two postscript files:
- agrep.ps.1 is a technical report from June 1991
- describing the design and implementation of agrep;
- agrep.ps.2 is a copy of the paper as appeared in the 1992
- Winter USENIX conference.
-
- Please mail bug reports (or any other comments)
- to sw@cs.arizona.edu or to udi@cs.arizona.edu.
-
- We would appreciate if users notify us (at the address above)
- of any extensions, improvements, or interesting uses of this software.
-
- January 17, 1992
-
-
- BUGS_fixed/option_update
-
- 1. remove multiple definitions of some global variables.
- 2. fix a bug in -G option.
- 3. fix a bug in -w option.
- January 23, 1992
-
- 4. fix a bug in pipeline input.
- 5. make the definition of word-delimiter consistant.
- March 16, 1992
-
- 6. add option '-y' which, if specified with -B option, will always
- output the best-matches without a prompt.
- April 10, 1992
-
- 7. fix a bug regarding exit status.
- April 15, 1992
-
- -------------------------------------------------------------------------------
- REVISIONS TO AGREP, FALL '93
-
- 8. Options can now be specified in a single group of characters after one '-'.
- - Sept 3rd 1993
-
- 9. Made agrep callable as a library routine from a separate function. The
- interface is: memagrep(argc, argv, searchbufferlen, searchbuffer),
- the pattern to be searched for and the options being specified EXACTLY as
- if they are being specified on the command line. The only difference is
- that instead of the file-names to look at, the user should specify a buffer
- and its length. Sample user programs are in ../user directory.
-
- In memagrep(), there are TWO peculiarities:
- 1. Peculiarity #1 -- at the end of the buffer, the user must have N bytes
- of valid virtual memory, where N is the length of the pattern to be
- searched. This space is used by agrep to speed up the checking of the
- termination condition. Its contents are restored before memagrep()
- returns -- however, some space must be there... else you'll get SIGSEGV.
- I might trap segv and do a longjmp, but that'll be in a new version!
- 2. The search buffer must begin with a newline so that it is easy for
- agrep to output matched lines. This also avoids some copying.
- Ofcourse, if we copied the user's search buffer into another buffer which
- meets both the above conditions, memagrep() will no longer be fast -- and
- speed is the primary goal.
-
- - Sept 27th 1993
-
- 10. Added some filter-programs to make agrep search thru compressed files.
- Also added some features in the Makefile which allows the user to build
- an agrep with a dummyfilter so that agrep remains independent of tcompress.
- The definitions needed in agrep to interface with tcompress are in defs.h
-
- - Nov 10th 1993
-
- 11. Added a library interface for searching thru a specified set of files,
- fileagrep(), which is similar to memagrep(). This is used by glimpse.
- Had to modify some other things and fix some bugs (see CHANGES).
-
- - Dec 1993 (coding), Jan 1993 (debugging).
- -------------------------------------------------------------------------------
-
- CODING NOTE: sgrep.c and newmgrep.c use a similar while(fill_buf) loop
- with start and end, while others use loops with an internal variable i.
-