-
- INFOBASE - Text file index & search by Keyword.
-
- Shareware, by Steve Rackley, Copyright 1991.
-
- Version 1.0, March 1991.
-
- 1. Introduction.
-
- InfoBase is a system for searching text files which consist of a
- number of sections, where each section is identified by a Keyword
- header line. Only the keyword lines are scanned for the words being
- searched for - the addresses of these lines are held in an index file,
- which is used to give fast access to the keywords. This approach gives
- a very small index file (4 bytes per text section), whilst still giving
- a fairly fast search. This is a rather different approach from most
- other programs of this type: generally they index every word in the file
- except those excluded by the user. This gives an index file of about the
- same size as the text file, and doesn't solve the most important problem:
- the text section may not actually contain the word you want (eg an item
- about Telix may not include the word 'Comms'), and text may also include
- words that don't have any significance (eg the word 'Comms' may be
- mentioned in an item that has nothing to do with comms). Using the
- 'index every word' method and searching for all references to comms
- programs may therefore omit some, while including some false references.
- The only other method that I found was in a program that indexed the
- text on keywords, but only allowed 1 set of keywords per file, forcing
- the user to keep all the text items in separate files. InfoBase is an
- attempt to get the best of both worlds.
-
- The system consists of 4 files: this doc file, a program to build the
- index file (IBIX.EXE), the search program (IBASE.EXE) and a small sample
- Information file (IBASE.TXT).
-
- InfoBase is Shareware - feel free to distribute it to others, provided
- that no changes are made to the programs or documentation, but please
- remember that it is NOT free software. If you find it useful, please
- register your copy, by sending a small sum (ten pounds Sterling is
- suggested) to:
-
- Steve Rackley,
- 17 Tichborne Close
- Blackwater,
- Camberley,
- Surrey GU17 0JQ
- UK.
-
- Registered users will receive the latest version free of charge, and
- will be entitled to at least one free upgrade if later enhancements
- are made.
-
-
- 2. Setting up your text files.
-
- I mainly use InfoBase to index and search a large file containing
- messages filed by the Telepathy offline reader for CIX. After filing
- the messages that I want to hold for later reference, I edit in the
- keyword headers - one header for each message - and re-index the file.
- Another use for the system is to index on-disc technical reference works,
- such as those listing DOS and BIOS functions. In this case a keyword
- header should be added at the start of each relevant section, giving
- words to identify the subjects mentioned in the section, eg disc, video
- etc. The best point is that you may have several files like this from
- various different sources - by merging them all into a single file
- you can greatly simplify the process of searching for information on
- various functions.
-
- All you need to do to make the file suitable for InfoBase is the
- following:
-
- a. Identify the 'sections' that you want to split the file into, eg
- chapters and sections of an on-disc document, or individual messages
- filed from a conferencing system or bulletin board.
-
- b. Using a text editor, at the start of each section insert a line
- beginning '@@@KEY@@@ ', and containing any words relevant to the
- section. For example, a filed message about writing Telix scripts
- might be given the keywords 'Telix', 'Comms', and 'Script'. In that
- case the full header line would read:
-
- @@@KEY@@@ telix comms script
-
- IBASE.EXE can be instructed to find multiple words using 'And' or 'Or'
- combinations, so the above line would be found by searching for 'Telix',
- 'Telix or Comms', 'Telix and Script', and so on.
-
- HINT: Don't use plurals. 'Game' and 'Games', for example, are two
- different words as far as InfoBase is concerned, so if you put 'game'
- in the header then search for 'games', or vice-versa, you won't find
- much. If you make it a rule to never use plurals in headers or on
- the search line, you won't have this problem.
-
- c. Run IBIX.EXE to build the index file, or just run IBASE.EXE, which
- will ask whether you want to re-index if it does not find a valid
- index file.
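
As an illustration of what step c involves, here is a minimal Python sketch of an index builder. It is not the IBIX.EXE source. The introduction says only that the index holds the addresses of the keyword lines at 4 bytes per section, so the little-endian unsigned layout used here is an assumption, and the names find_key_offsets and write_index are made up for this example.

```python
import struct

KEY_PREFIX = b"@@@KEY@@@ "  # header marker described in step b


def find_key_offsets(text_path):
    """Return the byte offset of every line that starts with the key marker."""
    offsets = []
    with open(text_path, "rb") as f:
        pos = 0
        for line in f:
            if line.startswith(KEY_PREFIX):
                offsets.append(pos)
            pos += len(line)  # binary mode, so CR/LF bytes are counted too
    return offsets


def write_index(offsets, index_path):
    """Write one 4-byte entry per section (assumed layout: little-endian unsigned)."""
    with open(index_path, "wb") as f:
        for off in offsets:
            f.write(struct.pack("<I", off))
```

Calling write_index(find_key_offsets("ibase.txt"), "ibase.ibx") would then produce an index whose size is four times the number of header lines.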
-
- 3. Using InfoBase.
-
- The command-line for IBASE.EXE is: ibase <text-file-name>.
- The program will look for an index file with the same name as the text
- file but an extension of .IBX.
-
- If an index file is not found, or if it is out of date (new items added
- to the end of the text file, or the last index entry doesn't point to a
- @@@KEY@@@ line because the text file has been edited), you will be asked
- if you want to re-index the file. Replying Y will call IBIX.EXE to carry
- out the re-index, and then continue. Replying N will abandon the program.
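
The checks described above could be sketched in Python roughly as follows. This is a guess at the logic, not the program's actual code: the 4-byte little-endian entry layout, the way "new items added to the end" is detected (by scanning for a header after the last indexed one), and the name index_is_current are all assumptions.

```python
import os
import struct

KEY_PREFIX = b"@@@KEY@@@ "


def index_is_current(text_path, index_path):
    """Return True if the index passes the checks described in the text."""
    if not os.path.exists(index_path):
        return False                      # no index file at all
    with open(index_path, "rb") as f:
        data = f.read()
    if len(data) == 0 or len(data) % 4 != 0:
        return False                      # empty or truncated index
    (last_offset,) = struct.unpack("<I", data[-4:])
    with open(text_path, "rb") as f:
        f.seek(last_offset)
        if not f.readline().startswith(KEY_PREFIX):
            return False                  # text edited: entry no longer hits a header
        for line in f:
            if line.startswith(KEY_PREFIX):
                return False              # a header was added after the last indexed one
    return True
```

A caller would offer the re-index prompt whenever this returns False.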
-
- IBIX can also be run separately to force a re-index. The command line
- is much the same: ibix <text-file-name>.
-
- You will then be prompted for a search string. To find a single word,
- just enter that word. To find text sections with all of several words
- in their headers, enter 'word1 word2 word3' etc. You may explicitly
- put '&' characters between the words (word1 & word2, etc), but this isn't
- essential - AND is the default. To find sections with one or more of
- several words in their headers, you must use the '|' character between
- the words, ie 'word1 | word2 | word3'.
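
The search-string rules above can be illustrated with a short Python sketch. The function names parse_query and header_matches are hypothetical, and accepting '&' and '|' without surrounding spaces is an assumption; the text only shows them written with spaces. Mixed AND/OR lines are not handled specially here.

```python
def parse_query(query):
    """Return (mode, words): mode is 'and' (the default) or 'or'."""
    tokens = query.replace("&", " & ").replace("|", " | ").split()
    mode = "or" if "|" in tokens else "and"
    words = [t.lower() for t in tokens if t not in ("&", "|")]
    return mode, words


def header_matches(header_line, mode, words):
    """Test a @@@KEY@@@ line against the parsed query, ignoring case."""
    keywords = {w.lower() for w in header_line.split()[1:]}  # drop the marker
    if not words:
        return False
    if mode == "or":
        return any(w in keywords for w in words)
    return all(w in keywords for w in words)
```

For the header '@@@KEY@@@ telix comms script', the queries 'telix script' and 'procomm | comms' both match, while 'telix procomm' does not.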
-
- Each time a matching header is found, its following section will be
- displayed. If it is more than a screenful, a 'More...' prompt will be
- given after each screenful - press any key to carry on. At the end of
- each section, you will be prompted to press ESC to stop the search, or
- any other key to look for another matching header.
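
To make the search loop concrete, here is a rough Python sketch of walking a list of header offsets (as read from the index), testing each header, and collecting the section that follows it. It assumes a section runs up to the next @@@KEY@@@ line or the end of the file, and the name find_sections is invented for this example; paging and the ESC prompt are left out.

```python
KEY_PREFIX = b"@@@KEY@@@ "


def find_sections(text_path, offsets, wanted, match_all=True):
    """Yield the text of each section whose header contains the wanted words."""
    wanted = [w.lower() for w in wanted]
    test = all if match_all else any
    with open(text_path, "rb") as f:
        for off in offsets:
            f.seek(off)                   # jump straight to the header line
            header = f.readline()
            keywords = header[len(KEY_PREFIX):].decode("ascii", "replace").lower().split()
            if not test(w in keywords for w in wanted):
                continue
            body = [header]               # keep the header with its section
            for line in f:
                if line.startswith(KEY_PREFIX):
                    break                 # next section starts here
                body.append(line)
            yield b"".join(body).decode("ascii", "replace")
```

Only the header lines are read for sections that do not match, which is why the index can stay so small.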
-
- Output may be redirected to a file by ending the Search String line with
- '>filename' (to open a new file, overwriting any existing one of the
- same name), or '>>filename' (to append the text to an existing file, or
- create the file if it doesn't exist).
-
- Examples:
-
-   telix & script           Display any sections whose keywords include
-                            the words 'telix' and 'script'.
-
-   telix script             Exactly the same as above - the '&' is
-                            optional.
-
-   telix | comms >commfile  Find any sections whose keywords include the
-                            words 'telix' OR 'comms', and copy them to a
-                            file called 'commfile'. This file will always
-                            be re-created, destroying any existing copy.
-
-   procomm >>commfile       Find any sections whose keywords include the
-                            word 'procomm', and copy them to a file called
-                            'commfile'. They will be added to the end of
-                            the file if it already exists, otherwise the
-                            file will be created.
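
The '>' and '>>' endings shown in the examples can be separated from the search words along these lines. This is an illustrative Python sketch only; split_redirect is a made-up name, and the returned "w" and "a" values are simply Python's file-open modes, chosen because they behave the way the text describes.

```python
def split_redirect(line):
    """Split a search line into (query, filename, file_mode)."""
    query, sep, target = line.partition(">")
    if not sep:
        return line.strip(), None, None               # no output file given
    if target.startswith(">"):
        return query.strip(), target[1:].strip(), "a"  # '>>' appends, or creates
    return query.strip(), target.strip(), "w"          # '>' always re-creates
```

For instance, 'procomm >>commfile' splits into the query 'procomm', the file name 'commfile' and append mode.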
-
-
- 4. Features and restrictions.
-
- All searches are case-independent, ie 'FRED', 'Fred' and 'fred' are all
- considered to be identical.
-
- Up to 10 words can be specified on the search line (excluding &/|
- characters and the optional output file).
-
- The maximum length of a word is 15 characters.
-
- AND and OR conditions cannot be combined in the search line at present.
-
- There is no limit to the number of words on a @@@KEY@@@ line, but the
- total length of the line is currently limited to 127 characters.
-
- There is no limit to the number of @@@KEY@@@ lines in a file, or to the
- file size.
-
- @@@KEY@@@ lines are copied with the text when an output file is specified,
- so the resulting file is suitable for indexing as a new InfoBase file.
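
The limits listed in this section can be expressed as simple checks, which may be handy for vetting headers and search lines before use. The Python below is a sketch with invented names (check_query, check_key_line). It assumes the output-file part has already been removed from the search line and that the 127-character limit excludes the line terminator; how InfoBase itself reacts when a limit is exceeded is not stated.

```python
MAX_WORDS = 10       # words per search line
MAX_WORD_LEN = 15    # characters per word
MAX_KEY_LINE = 127   # characters per @@@KEY@@@ line


def check_query(query):
    """Return a list of problems found in a search line (empty if none)."""
    tokens = query.replace("&", " & ").replace("|", " | ").split()
    words = [t for t in tokens if t not in ("&", "|")]
    problems = []
    # With '|' present, any '&' or any two adjacent words (implicit AND)
    # means AND and OR have been mixed.
    if "|" in tokens and ("&" in tokens or tokens.count("|") < len(words) - 1):
        problems.append("AND and OR cannot be combined")
    if len(words) > MAX_WORDS:
        problems.append("more than %d words" % MAX_WORDS)
    for w in words:
        if len(w) > MAX_WORD_LEN:
            problems.append("word too long: " + w)
    return problems


def check_key_line(line):
    """Return True if a header line fits within the length limit."""
    return len(line.rstrip("\r\n")) <= MAX_KEY_LINE
```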
-
- 5. Performance.
-
- Some people may be discouraged by the fact that the text file must be
- re-indexed whenever it is changed, or by the fact that only the keyword
- header addresses are indexed. Don't be, unless your file is huge or
- your machine is very slow! InfoBase was tested on a 25MHz 486 with a
- 16ms ESDI drive, and on an 8MHz 286 with a 28ms MFM drive.
- The file used was almost 457K, and contained 265 sections, with an
- average of four keywords per section.
-
- Results:
-
-   Reindex file:
-       486: 3 seconds      286: 24 seconds
-
-   Search entire file for non-existent keyword:
-       486: 2 seconds      286: 6 seconds
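
For a sense of scale, the index size for the test file follows directly from the figures given (265 sections at 4 bytes each, as stated in the introduction). The short Python calculation below works it out; treating 'K' as 1024 bytes is an assumption.

```python
SECTIONS = 265             # sections in the test file
BYTES_PER_ENTRY = 4        # index bytes per section (from the introduction)
TEXT_BYTES = 457 * 1024    # "almost 457K", taking K as 1024 bytes (assumption)

index_bytes = SECTIONS * BYTES_PER_ENTRY
ratio = index_bytes / TEXT_BYTES
print(f"index: {index_bytes} bytes, about {ratio:.2%} of the text file")
```

The index comes to 1060 bytes, a small fraction of one percent of the text it covers.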
-
-
-
- 6. Future plans.
-
- If a few people register InfoBase I may improve it. The obvious thing to
- do first is to add the ability to append new files to the main file, and
- to insert and edit the keyword lines, thus doing away with the need for
- a separate text editor. In fact I'll probably do this if feedback
- suggests that a few people are using InfoBase, whether or not I get any
- registrations.
-
- I may also change the indexing method if some people find the access too
- slow, though I don't think this will be necessary unless some really huge
- files are used: 6 seconds to go through 265 sections on an average-speed
- machine isn't too bad. The alternative is to store the actual keywords in
- the index file, which will give a much bigger file and is unlikely to
- improve the search times all that much.
-
- Something else I'm considering is the addition of multiple search
- conditions, with bracketed expressions, eg:
- (fred & bill) | joe.
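
Purely as an illustration of what such bracketed conditions might involve (this feature does not exist in version 1.0), here is a small Python sketch of a recursive-descent evaluator. The precedence chosen, with '&' binding tighter than '|', and the retention of the implicit AND between adjacent words are assumptions, and the names tokenize and evaluate are invented.

```python
import re


def tokenize(expr):
    """Split an expression into brackets, operators and lower-cased words."""
    return re.findall(r"[()&|]|[^\s()&|]+", expr.lower())


def evaluate(expr, keywords):
    """Evaluate a bracketed &/| expression against a set of header keywords."""
    tokens = tokenize(expr)
    keywords = {k.lower() for k in keywords}
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()          # always parse, then combine
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_term()
        while pos < len(tokens) and tokens[pos] not in ("|", ")"):
            if tokens[pos] == "&":     # explicit '&' is optional
                pos += 1
            rhs = parse_term()
            value = value and rhs
        return value

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in (")", "&", "|"):
            raise ValueError("unexpected %r" % tok)
        return tok in keywords

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result
```

With this sketch, '(fred & bill) | joe' is true for a header containing 'joe' alone, or one containing both 'fred' and 'bill', but not for one containing only 'fred'.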
-
- Further suggestions are welcome, and please report any bugs or problems,
- including documentation errors/omissions.
-
- I can be contacted by snail-mail at the address given in the introduction,
- or as srackley@cix.compulink.co.uk
-
-
-