ProfitPress Mega CDROM2 Shareware Freeware (MSDOS)(1992)(Eng)

Text File | 1992-01-26 | 7.7 KB | 189 lines

sspell - similar to Unix spell
version 1.1

Author: Maurice Castro
Release Date: 26 Jan 1992
Bug Reports: maurice@bruce.cs.monash.edu.au

This code has been placed by the Author into the Public Domain. The
code is NOT covered by any warranty; the user of the code is solely
responsible for determining the fitness of the program for their
purpose. No liability is accepted by the author for the direct or
indirect losses incurred through the use of this program.

Segments of this code may be used for any purpose that the user deems
appropriate. It would be polite to acknowledge the source of the code.
If you modify the code and redistribute it, please include a message
indicating your changes and how users may contact you for support.

The author reserves the right to issue the official version of this
program. If you have useful suggestions or changes for the code, please
forward them to the author so that they might be incorporated into the
official version.

Please forward bug reports to the author via Internet.

* Introduction

The program SSPELL was written by the author to provide a Unix-like
spell checker on a PC. There are several utilities of this type already
available; however, most lacked at least one of the following:

    1. Public Domain
    2. Source Code
    3. Simple, editable word list structure
    4. Configurable prefix and suffix list
    5. Minimal memory use
    6. Unlimited word list length
    7. Reasonable speed
    8. Portable

The SSPELL program provides all these features.

The program currently compiles under Turbo C++ (Borland) for MS-DOS and
cc for Unix (OSx for Pyramid, SunOS for Sun 3/50, Ultrix for
DECstation 2100). Minor modification will be required to compile under
other Unix variants.

* Features

The SSPELL program uses a sorted plain ASCII word list for its
dictionary. This makes adding new words to the list easy: simply add
the words and re-sort the list.
To gain speed without loading the complete list into memory, a cache of
words recently retrieved from the word list is maintained; the disk is
only searched if the word is not found in the cache.

A suffix/prefix list is used to allow a smaller dictionary to be used.

* Operation

Edit the config.h file to set up the required default locations and
compile the code. Place the dictionary in the file specified in
config.h and make sure that the index file is writable. SSPELL should
now be ready for use.

Performance gains may be had by altering the parameters found in the
config.h file. Increasing CACHESIZE increases the memory usage of the
program, but decreases disk search time. IDXSIZ and HASHWID control
the size of the index to the disk file. HASHWID determines the maximum
number of characters compared to determine if an item occurs in a given
slot. IDXSIZ determines the number of slots.

A typical IBM-PC configuration could be written as:

    #define DICTIONARY "c:\\utility\\dict\\main.dct"
    #define INDEX "c:\\utility\\dict\\main.idx"
    #define RULE "c:\\utility\\dict\\rule.lst"
    #define CACHESIZE 1000
    #define ROOTNAME "c:\\tmp\\sspell"
    #define SORT "c:\\dos\\sort"
    #define MAXSTR 128
    #define SEPSTR " \n\r\t!@#$%^&*(),.<>~`\":;|/\\{}[]"
    /* HASHWID must always be 2 or greater */
    #define HASHWID 8
    #define IDXSIZ 1000

* Command Line

SSPELL has the following command line options:

    sspell [-v] [-x] [-D dict] [-I index] [-R rule] [-C cachesize] [file] ...

    -v  all words not actually in the word list are printed and
        plausible derivations from the word list are indicated
    -x  all plausible stems are output
    -D  `dict' is the pathname of an alternate dictionary
    -I  `index' is the pathname of an alternate index. This should be
        used if using a personalised dictionary or if the index file
        is unwritable.
    -R  `rule' is the pathname of an alternate rule list
    -C  `cachesize' is the size of the cache of words found in the
        dictionary.

SSPELL will take input from a list of files on the command line or from
stdin if no files are supplied.

The dictionary must be in sorted order with the capital letters folded
onto the small letters (using Unix sort: sort -fu). The case of words
in the dictionary is significant. Any letter appearing as a capital in
the dictionary must appear as a capital in the text to be regarded as
spelled correctly.

The format of the rule list is fixed. `#' in the first column indicates
a comment. All other lines are of the form:

    pre|post <prefix/suffix> <required> <forbidden> <delete>

Any field not used must be filled with a `-'. The following examples
illustrate the features of the rules.

    pre  un  -  -               -
    post ive -  e               -
    post ive e  -               e
    post ied y  ay,ey,iy,oy,uy  y

The prefix rules are simple: there are no required or forbidden
sequences and nothing to delete. Prefixes must not be more complex.

The suffix rules are more complex. These rules specify the ending to be
added to the root after the deletion of the delete field, provided that
the word has a required ending and that the combination is not
forbidden. For example: carried.

    root:           carry
    required `y':   carry    the last letter is a `y'
    forbidden:               the word does not end in a forbidden sequence
    delete `y':     carr
    suffix `ied':   carried

* Overview of Internal Operation

SSPELL creates an index file which speeds access to the main
dictionary. The index is a simple list of the first part of words
evenly spaced through the dictionary; the number of significant letters
and the number of slots are set using hash defines in the config.h
file. The index file is only created if no index file exists or the
dictionary has been modified since the index was created. The
dictionary is checked for correct ordering during the creation of the
index file.

Words are checked for correct spelling by initially checking the cache.
The cache is a move-to-front list, so more recently used words are at
the front of the cache.
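
The move-to-front behaviour of the cache can be illustrated with a
short C sketch. This is not code from the SSPELL source; the names
mtf_cache, mtf_init, mtf_lookup, mtf_insert and mtf_free are
hypothetical, and dropping the last entry when the limit is exceeded is
an assumption about how a bounded cache of this kind might behave.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; a sketch, not the SSPELL source. */
typedef struct node {
    char *word;
    struct node *next;
} node;

typedef struct {
    node *head;     /* most recently used word */
    size_t count;   /* number of words currently held */
    size_t limit;   /* upper bound, playing the role of CACHESIZE */
} mtf_cache;

void mtf_init(mtf_cache *c, size_t limit)
{
    assert(limit > 0);
    c->head = NULL;
    c->count = 0;
    c->limit = limit;
}

/* Return 1 and move the word to the front if it is cached, else 0. */
int mtf_lookup(mtf_cache *c, const char *word)
{
    node *prev = NULL, *cur = c->head;

    while (cur != NULL && strcmp(cur->word, word) != 0) {
        prev = cur;
        cur = cur->next;
    }
    if (cur == NULL)
        return 0;
    if (prev != NULL) {             /* unlink and relink at the head */
        prev->next = cur->next;
        cur->next = c->head;
        c->head = cur;
    }
    return 1;
}

/* Insert a word at the head; drop the last entry if over the limit
 * (assumed eviction policy). Returns 0 on allocation failure. */
int mtf_insert(mtf_cache *c, const char *word)
{
    node *n = malloc(sizeof *n);

    if (n == NULL)
        return 0;
    n->word = malloc(strlen(word) + 1);
    if (n->word == NULL) {
        free(n);
        return 0;
    }
    strcpy(n->word, word);
    n->next = c->head;
    c->head = n;
    c->count++;

    if (c->count > c->limit) {      /* limit > 0, so at least 2 nodes */
        node *prev = c->head;
        while (prev->next->next != NULL)
            prev = prev->next;
        free(prev->next->word);
        free(prev->next);
        prev->next = NULL;
        c->count--;
    }
    return 1;
}

/* Release every node and leave the cache empty. */
void mtf_free(mtf_cache *c)
{
    while (c->head != NULL) {
        node *n = c->head;
        c->head = n->next;
        free(n->word);
        free(n);
    }
    c->count = 0;
}
```

In this sketch a caller would try mtf_lookup first and, on a miss,
search the dictionary file and call mtf_insert with the word found
there, so that frequently used words stay near the head of the list.
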
The cache size is bounded by a limit set in the config.h file. If the
word is not found in the cache then an exact match is checked for in
the file. If no exact match is found then a derivation is checked for
in the cache and subsequently in the file. If a word in the dictionary
matches either a derivation or the original then the dictionary word is
inserted at the head of the cache list.

Hyphenation and number identification have been left out of the above
description.

The output of the search process is put in a file; the file is then
sorted using the local operating system sorting utility. The result is
then listed on standard out such that duplicated lines appear only
once.

* Acknowledgments

My thanks to people who have contributed to this program:

    Michael Oldfield (mao@physics.su.OZ.AU) for a number of bug fixes

* Conclusion

I hope that this program proves useful. Comments and suggestions
welcomed; I can be contacted via E-Mail at
maurice@bruce.cs.monash.edu.au

Maurice Castro
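
As an illustration of the rule list format described under Command
Line, the application of a single suffix rule to a root (as in the
`carried' walk-through) might be sketched in C as follows. This is not
code from the SSPELL source. The names suffix_rule, ends_with and
apply_suffix are hypothetical, and rejecting a root that does not end
in the delete field is an assumption.

```c
#include <assert.h>
#include <string.h>

/* One `post' line from the rule list; "-" marks an unused field.
 * Hypothetical layout, not taken from the SSPELL source. */
typedef struct {
    const char *suffix;     /* ending to add, e.g. "ied" */
    const char *required;   /* ending the root must have, or "-" */
    const char *forbidden;  /* comma-separated endings, or "-" */
    const char *del;        /* ending removed from the root, or "-" */
} suffix_rule;

/* Return 1 if word ends with the first endlen characters of end. */
int ends_with(const char *word, const char *end, size_t endlen)
{
    size_t n = strlen(word);
    return n >= endlen && memcmp(word + n - endlen, end, endlen) == 0;
}

/* Build root + suffix according to the rule. Returns 1 and writes the
 * derived word to out on success, 0 if the rule does not apply or the
 * result does not fit in outsz bytes. */
int apply_suffix(const char *root, const suffix_rule *r,
                 char *out, size_t outsz)
{
    size_t len = strlen(root);
    const char *p;

    assert(out != NULL);

    /* The root must have the required ending. */
    if (strcmp(r->required, "-") != 0 &&
        !ends_with(root, r->required, strlen(r->required)))
        return 0;

    /* The root must not end in any forbidden sequence. */
    if (strcmp(r->forbidden, "-") != 0) {
        p = r->forbidden;
        while (*p != '\0') {
            size_t n = strcspn(p, ",");
            if (n > 0 && ends_with(root, p, n))
                return 0;
            p += n;
            if (*p == ',')
                p++;
        }
    }

    /* Remove the delete field from the end of the root (assumption:
     * the rule is rejected if the root does not end with it). */
    if (strcmp(r->del, "-") != 0) {
        size_t dl = strlen(r->del);
        if (!ends_with(root, r->del, dl))
            return 0;
        len -= dl;
    }

    if (len + strlen(r->suffix) + 1 > outsz)
        return 0;
    memcpy(out, root, len);
    strcpy(out + len, r->suffix);
    return 1;
}
```

With the rule `post ied y ay,ey,iy,oy,uy y', this sketch turns `carry'
into `carried' and rejects `play', which ends in the forbidden
sequence `ay'. A spell checker would also need the reverse direction
(from a word in the text back to a candidate root), which is not shown
here.
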