From: jnelson@gauche.enet.dec.com (Jeff E. Nelson)
Newsgroups: comp.sources.wanted,comp.lang.c,alt.sources
Subject: Re: key word searches in text files
Message-ID: <10384@shlump.nac.dec.com>
Date: 18 Apr 90 21:01:51 GMT

In article <6109@ozdaltx.UUCP>, root@ozdaltx.UUCP (root) writes:
> We maintain a large mass of text files on the board and I would like
> for a caller to be able to look for key words in those files WITHOUT
> having to use [e]grep to bang away at the files... This is the method
> we're currently using. It works, but is slow, not to mention the wear
> and tear on the HD.

How about simply creating an index file for each text file that you want
indexed? Since you mention "egrep", I assume you're on Unix. To create
an index file, do this:

    tr -cs A-Za-z '\012' | sort -u

The 'tr' command breaks the text file up into a list of words, one per
line: '-c' translates everything *except* letters into newlines, and
'-s' squeezes runs of newlines down to one. The 'sort' command sorts
the list, with the '-u' flag removing duplicates. Feed the text file in
as input and redirect the output to an index file. When a caller wants
to know whether a particular text file contains a keyword, search the
index file that corresponds to that text file.

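For example, building and then querying an index might look like the
following (the names 'article.txt', 'article.idx', and 'keyword' are
just placeholders for whatever scheme your board uses):

    # build the index: one unique word per line
    tr -cs A-Za-z '\012' < article.txt | sort -u > article.idx
    # check whether the word is present
    grep -i '^keyword$' article.idx && echo found

Since the index is sorted, 'look keyword article.idx' does the same job
with a binary search, though 'look' matches any line *beginning* with
the string rather than the exact word.
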
This solution is simple and efficient: (1) the index file only needs to
be built once per text file, and (2) the index contains only unique
words, which keeps it small and saves wear and tear on your HD.
However, it may not be exactly what you need. One drawback, for
example, is that all it tells you is the existence of a keyword in a
file; you don't know *where* in the file the keyword appears.
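
If positions matter, one variation (a rough sketch, untested against
your setup) is to index word/line-number pairs instead of bare words:

    # sketch: strip non-letters, emit "word line-number" pairs
    awk '{ gsub(/[^A-Za-z]/, " ")
           for (i = 1; i <= NF; i++) print $i, NR }' article.txt |
        sort -u > article.pos

A grep for '^keyword ' in article.pos then tells the caller which lines
contain the word. The index is larger, of course, so you trade disk
space for precision.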

If this isn't what you're looking for, you'll need to be more specific
about what you want to accomplish.

-Jeff E. Nelson
-Digital Equipment Corporation
-Internet: jnelson@tle.enet.dec.com
-Affiliation given for identification purposes only.