Source Code 1992 March

Internet Message Format | 1990-12-28 | 1.8 KB

From: jnelson@gauche.enet.dec.com (Jeff E. Nelson)
Newsgroups: comp.sources.wanted,comp.lang.c,alt.sources
Subject: Re: key word searches in text files
Message-ID: <10384@shlump.nac.dec.com>
Date: 18 Apr 90 21:01:51 GMT

[LEBF]

In article <6109@ozdaltx.UUCP>, root@ozdaltx.UUCP (root) writes:
> We maintain a large mass of text files on the board and I would like
> for a caller to be able to look for key words in those files WITHOUT having
> to use [e]grep to bang away at the files... This is the method we're
> currently using. It works, but is slow, not to mention the wear and
> tear on the HD.

How about simply creating an index file for each text file that you want
indexed? Since you mention egrep, I assume you have Unix. To create an
index file, do this:

    tr -cs A-Za-z '\012' | sort -u

The 'tr' command breaks up the text file into a list of words, one per
line. The 'sort' command sorts the list, and its '-u' flag removes
duplicates. Redirect the text file into the pipeline as input, and
redirect the output to an index file.

When someone wants to find out whether a particular text file contains a
keyword, search the index file corresponding to that text file.

This solution is simple and efficient: (1) the index file only needs to be
built once per text file, and (2) the index is smaller than the text
because it contains only unique words, so searching it saves wear and tear
on your HD.

However, it may not be exactly what you need. One drawback, for example,
is that all you know is the existence of a keyword in a file; you don't
know *where* in the file the keyword appeared. If this isn't what you're
looking for, then you'd better be more specific about what you want to
accomplish.

-Jeff E. Nelson
-Digital Equipment Corporation
-Internet: jnelson@tle.enet.dec.com
-Affiliation given for identification purposes only.
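The build-once, search-many procedure described in the post can be sketched as two small shell functions. This is a minimal sketch, not the poster's own script: it assumes a POSIX shell with tr, sort and grep, and the function names build_index and has_keyword are hypothetical. The index is built with the same tr | sort -u pipeline. The post does not say how to search the index, so the lookup here assumes grep -qxF, which makes a keyword match only a whole index line, taken literally (searching for "mode" will not hit "modem").

```sh
#!/bin/sh
# Sketch only: hypothetical helper names, POSIX tr/sort/grep assumed.

# build_index TEXTFILE INDEXFILE
# tr -c complements the letter set, so every non-letter becomes a newline,
# and -s squeezes runs of newlines into one: the result is one word per line.
# sort -u then sorts the words and drops duplicates.
# (A text file that starts with non-letters leaves one empty line in the index.)
build_index() {
    tr -cs 'A-Za-z' '\012' < "$1" | sort -u > "$2"
}

# has_keyword INDEXFILE WORD
# Exit status 0 if WORD is a whole line of the index, nonzero otherwise.
# -F: treat WORD literally, -x: match the entire line, -q: print nothing.
has_keyword() {
    grep -qxF -- "$2" "$1"
}
```

For example, run `build_index notes.txt notes.idx` once whenever notes.txt changes, then `has_keyword notes.idx modem && echo found` for each query (notes.txt and notes.idx are hypothetical names). Like the original pipeline, the index is case-sensitive, so "The" and "the" are kept as separate entries.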