- Newsgroups: alt.sources
- From: goer@ellis.uchicago.edu (Richard L. Goerwitz)
- Subject: kjv browser, part 11 of 11
- Message-ID: <1991Jul3.065346.28525@midway.uchicago.edu>
- Date: Wed, 3 Jul 1991 06:53:46 GMT
-
- ---- Cut Here and feed the following to sh ----
- #!/bin/sh
- # this is bibleref.11 (part 11 of a multipart archive)
- # do not concatenate these parts, unpack them in order with /bin/sh
- # file README.rtv continued
- #
- if test ! -r _shar_seq_.tmp; then
- echo 'Please unpack part 1 first!'
- exit 1
- fi
- (read Scheck
- if test "$Scheck" != 11; then
- echo Please unpack part "$Scheck" next!
- exit 1
- else
- exit 0
- fi
- ) < _shar_seq_.tmp || exit 1
- if test ! -f _shar_wnt_.tmp; then
- echo 'x - still skipping README.rtv'
- else
- echo 'x - continuing file README.rtv'
- sed 's/^X//' << 'SHAR_EOF' >> 'README.rtv' &&
- Xvalues for. In our hypothesized scenario, you would want makeind to
- Xstore the max value for the verse field for every chapter of every
- Xbook in the Bible. The verse field (field #3), in other words, is
- Xyour "rollover" field, and would be passed to makeind using the -l
- Xoption. Assuming "kjv" to be the name of your indexable biblical
- Xtext, this set of circumstances would imply the following invocation
- Xfor makeind:
- X
- X makeind -f kjv -m 176 -n 3 -l 3
- X
- XIf you wanted a case-sensitive index (not a good idea), you would
- Xadd "-s" to the argument list above. The only disadvantage of a
- Xcase-insensitive index is that it obscures distinctions such as
- XLord/lord.
- X Actual English Bible texts usually take up 4-5 megabytes.
- XIndexing one would require over twice that much core memory, and
- Xwould take at least an hour on a fast machine. The end result would
- Xbe a set of data files occupying about 2 megabytes plus the 4-5
- Xmegabytes of the original file. The Bible is hardly a small book.
- XOnce these data files were created, they could be moved, along with
- Xthe original source file, to any platform you desired.
- X Having indexed, and having moved the files to wherever you
- Xwanted them, you would then be ready for step 3.
- X
- X
- X--------
- X
- X
- XStep 3: Writing a Program to Access Indexed Files
- X
- X When accessing text files such as the Bible, the most useful
- Xunit for searches is normally the word. Let us suppose you are a
- Xzealous lay-speaker preparing a talk on fire imagery and divine wrath
- Xin the Bible. You would probably want to look for every passage in
- Xthe text that contained words like
- X
- X fire, firy
- X burn
- X furnace
- X etc.
- X
- XTo refine the search, let us say that you want every instance of one
- Xof these fire words that occurs within one verse of a biblical title
- Xfor God:
- X
- X God
- X LORD
- X etc.
- X
- XThe searches for fire, firy, burn, etc. would be accomplished by
- Xcalling a routine called retrieve(). Retrieve takes three arguments:
- X
- X retrieve(pattern, filename, invert_search)
- X
- XThe first argument should be a string containing a regular expression
- Xbased pattern, such as
- X
- X fir(y|e|iness)|flam(e|ing)|burn.*?
- X
- XNote that the pattern must match words IN THEIR ENTIRETY. So, for
- Xinstance, "fir[ie]" would not catch "firiness," but rather only
- X"fire." Likewise, if you want every string beginning with the
- Xsequence "burn," the string "burn" will not work. Use "burn.*"
- Xinstead. The filename argument supplies retrieve() with the name of
- Xthe original text file. The last argument, if nonnull, inverts the
- Xsense of the search (a la egrep -v). In the case of the fire words
- Xmentioned above, one would invoke retrieve() as follows:
- X
- X hits1 := retrieve("fir(y|e|iness)|flam(e|ing)|burn.*?", "kjv")
- X
- XFor the divine names, one would do something along these lines:
- X
- X hits2 := retrieve("god|lord", "kjv")
- X
- X Having finished the basic word searches, one would then
- Xperform a set intersection on them. If we are looking for fire words
- Xwhich occur at most one verse away from a divine name, then we would
- Xspecify 1 as our range (as opposed to, say, zero), and the verse as
- Xour unit. The utility you would use to carry out the search is
- Xr_and(). R_and() would be invoked as follows:
- X
- X hits3 := r_and(hits1, hits2, "kjv", 3, 1)
- X
- XThe last two arguments, 3 and 1, specify field 3 (the "verse"
- Xfield) and a range of 1, respectively.
- X To display the text for your "hit list" (hits3 above), you
- Xwould call bitmap_2_text():
- X
- X every write(!bitmap_2_text(hits3, "kjv"))
- X
- XBitmap_2_text converts the location designators contained in hits3
- Xinto actual text.
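- X Put together, the calls above might form a minimal program
- Xalong the following lines. This is only a sketch: it assumes the
- Xindexed text is named "kjv," as in the examples, and it stops with
- Xa message if any call fails rather than carrying on with a null
- Xhit list. Whether these routines actually fail when they find
- Xnothing is an assumption; check the source files.
- X
- X    procedure main()
- X        local hits1, hits2, hits3
- X
- X        # Basic word searches (patterns must match whole words).
- X        hits1 := retrieve("fir(y|e|iness)|flam(e|ing)|burn.*?", "kjv") |
- X            stop("search for fire words failed")
- X        hits2 := retrieve("god|lord", "kjv") |
- X            stop("search for divine names failed")
- X
- X        # Keep fire words within one verse (field 3, range 1)
- X        # of a divine name.
- X        hits3 := r_and(hits1, hits2, "kjv", 3, 1) |
- X            stop("intersection failed")
- X
- X        # Convert the location designators to text and print them.
- X        every write(!bitmap_2_text(hits3, "kjv"))
- X    end
- X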
- X The three basic functions mentioned above - retrieve(),
- Xr_and(), and bitmap_2_text() - are contained in three distinct
- Xfiles (retrieve.icn, retrops.icn, and bmp2text.icn, respectively).
- XOther useful routines are included in these files, and also in
- Xwhatnext.icn. If you are planning on writing a retrieval engine for
- Xserious work of some kind, you would probably want to construct a mini
- Xinterpreter, which would convert strings typed in by the user at
- Xrun-time into internal search and retrieval operations.
- X Note that I have included no routine to parse or expand
- Xhuman-readable input (the nature of which will naturally vary from
- Xtext to text). Again, using the Bible as our hypothetical case, it
- Xwould be very useful to be able to ask for every passage in, say,
- XGenesis chapters 2 through 4, and to be able to print these to the
- Xscreen. Doing this would require a parsing routine to break down the
- Xreferences, and map them to retrieve-internal format. The routine
- Xwould then have to generate all valid locations from the minimum value
- Xin chapter 2 above to the max in chapter 4. See the file whatnext.icn
- Xfor some aids in accomplishing this sort of task.
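- X As a starting point, here is a rough sketch of the parsing
- Xhalf of that job. The record refrange and the procedure parse_ref
- Xare hypothetical names, not part of this package, and the input
- Xformat ("Genesis 2-4" or "Genesis 3") is an assumption. Mapping
- Xthe result to retrieve-internal format, and generating the valid
- Xlocations in between, is not shown.
- X
- X    # Hypothetical holder for a parsed reference.
- X    record refrange(book, first, last)
- X
- X    # Parse "Book N" or "Book N-M"; fail on anything else.
- X    procedure parse_ref(s)
- X        local book, first, last
- X
- X        if not (s ? {
- X            # Book name: everything up to some digit.
- X            (book := trim(tab(upto(&digits)))) &
- X            # First chapter number.
- X            (first := integer(tab(many(&digits)))) &
- X            # Optional "-M"; default to a single chapter.
- X            (last := ((="-" & integer(tab(many(&digits)))) | first)) &
- X            # Nothing may be left over.
- X            pos(0)
- X            }) then fail
- X        return refrange(book, first, last)
- X    end
- X
- X    procedure main()
- X        local r
- X
- X        r := parse_ref("Genesis 2-4") | stop("cannot parse reference")
- X        # writes: Genesis: chapters 2 through 4
- X        write(r.book, ": chapters ", r.first, " through ", r.last)
- X    end
- X
- XBecause upto() is a generator, a failed match backtracks and tries
- Xthe next digit, so a book name that itself begins with a number
- X(e.g. "1 Samuel 2-4") should still parse under this scheme.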
- X
- X
- X--------
- X
- X
- XStep 4: Compiling and Running Your Program
- X
- X Assuming you have written a search/retrieval program using the
- Xroutines contained in retrieve.icn, retrops.icn, bmp2text.icn, and
- Xwhatnext.icn, you would now be ready to compile it. In order to
- Xfunction properly, these routines would need to be linked with
- Xinitfile.icn and indexutl.icn. Specific dependencies are noted in the
- Xindividual files in case there is any confusion.
- X If you have made significant use of this package, you probably
- Xshould not worry about the exact dependencies, though. Just link
- Xeverything in together, and worry about what isn't needed after you
- Xhave fully tested your program:
- X
- X icont -o yourprog yourprog.icn initfile.icn indexutl.icn \
- X retrieve.icn retrops.icn bmp2text.icn binsrch.icn
- X
- X
- X--------
- X
- X
- XProblems, bugs:
- X
- X This is really an early beta release of the retrieve package.
- XI use it for various things. For instance, I recently retrieved a
- Xtext file containing informal reviews of a number of Science Fiction
- Xworks. My father likes SciFi, and it was close to Father's Day, so I
- Xindexed the file, and performed cross-referenced searches for words
- Xlike "very good," "brilliant," and "excellent," omitting authors my
- Xfather has certainly read (e.g. Herbert, Asimov). I also had
- Xoccasion to write a retrieval engine for the King James Bible (hence
- Xthe many examples from this text), and to construct a retrieval
- Xpackage for the Hebrew Bible, which I am now using to gather data for
- Xvarious chapters of my dissertation. I'm happy, incidentally, to hand
- Xout copies of my KJV retrieval program. It's a clean little program
- Xthat doubtless many would find useful. The Hebrew Bible retrieval
- Xpackage I'll hand out as well, but only to fully competent Icon
- Xprogrammers who feel comfortable with Hebrew and Aramaic. This latter
- Xretrieval package is a much less finished product, and would almost
- Xcertainly need to be hacked to work on platforms other than what I
- Xhave here at home (a Xenix/386 setup with a VGA).
- X In general, I hope that someone out there will find these
- Xroutines useful, if for no other reason than that it will mean that I
- Xget some offsite testing. Obviously, the whole package could have
- Xbeen written/maintained in C or something that might offer much better
- Xperformance. Doing so would, however, have entailed a considerable
- Xloss of flexibility, and would have required a lot more time on my
- Xpart. Right now, the retrieve package occupies about 60k of basic
- Xsource files, probably half of which consists of comments. When
- Xcompiled together with a moderate-size user interface, the total
- Xpackage typically comes to about 150k. In-core size typically runs
- Xabout 350k on my home machine here (a Xenix/386 box), with the basic
- Xrun-time interpreter taking up a good chunk of that space all on its
- Xown. It's not a small package, but I've found it a nice base for
- Xrapid prototyping and development of small to medium-size search and
- Xretrieval engines.
- X
- X -Richard L. Goerwitz goer%sophist@uchicago.bitnet
- X goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer
- SHAR_EOF
- echo 'File README.rtv is complete' &&
- true || echo 'restore of README.rtv failed'
- rm -f _shar_wnt_.tmp
- fi
- rm -f _shar_seq_.tmp
- echo You have unpacked the last part
- exit 0
- --
-
- -Richard L. Goerwitz goer%sophist@uchicago.bitnet
- goer@sophist.uchicago.edu rutgers!oddjob!gide!sophist!goer
-