home *** CD-ROM | disk | FTP | other *** search
Text File | 1990-07-03 | 40.9 KB | 1,072 lines |
- #!/bin/sh-----cut here-----cut here-----cut here-----cut here-----
- # shar: Shell Archiver
- # Run the following text with /bin/sh to create:
- # README # copyr.h # dawg.h # dictlib.h # grope.h
- cat - << \SHAR_EOF > README
-
- This is a suite of programs for working with words. I am releasing
- it to alt.sources primarily to invite people to test it on as many
- platforms as possible (mail me back any bug reports please!) so that
- the final code will be completely portable.
-
- Unpack with unshar or sh < part1; sh < part2; sh < part3
-
- Utilities for spelling checking and correction are supplied, but
- the ultimate aim is to support all sorts of word-manipulation activities
- such as a writers workbench and assorted word games (See PS).
-
- This news posting is in three parts; part1: this file + headers;
- part2: utility modules in .c files to be #include'd; part3: main
- programs; just CC these and they should work -- no fancy makefiles
- or link commands.
-
- The code here doesn't have much of a user interface; I'm rather hoping
- that people who pick it up will hook it in to their favourite operating
- system as smoothly as they can. Perhaps someone with the time to
- spare (who am I kidding :-( ) might integrate it with ispell for instance.
-
- I've experimented with various interfaces already; I can accept TeX
- input and write output suitable for correcting text in either emacs
- or uEmacs. I'm not going to release these though, until I'm happy
- with the internal code. Incidentally, the code will be totally
- public domain - meaning that I've no objection to its being used
- in commercial projects. HOWEVER I would appreciate if you didn't do so
- until it is officially released (probably through comp.sources.misc)
- because I would prefer to define portable file-formats and magic numbers
- first.
-
- OK, enough babble; here's how it all works:
-
- 1) acquire a list of words for yourself; I have one but at 690K it is
- too big to post. (actually even these sources come to around 120K
- so I hope they make it through OK. I don't think I have a shar
- that splits, so I've divided into three shars myself.)
- The word list should be sorted by ascii order.
-
- 2) cc dawg.c
- I've deliberately made main programs into one source file, by
- including *.c for utility modules; I know this is bad style but
- it helps make compiling and porting easier (no worries about
- long external names for instance, or non-case-sensitive linkers)
-
- 3) dawg yourdict dict
- 'yourdict' is your wordlist; but the second parameter should
- be 'dict' at least while testing. This will generate dict.dwg
- which is a compacted and *FAST* data structure for word access.
-
- Building a dawg[See next section for meaning of name] from a word
- list is a real memory pig; so I've written code which will let you
- generate the dawg in chunks (traded off against a small loss of
- compression density). (Interestingly enough it isn't a time-pig;
- using a hash-table for node merging gives almost linear time - thanks
- to Richard Hooker for that one).
-
- If you don't have enough memory (say 2Mb?) you'll get a run-time
- message suggesting that you recompile with cc -DSMALL_MEMORY dawg.c
- (OK, so one day I'll make it select this mode itself at run-time)
-
- IBM PC's and Mac's will get this mode by default. [See FOOTNOTE]
-
- If you want to check that the data structure generated is ok,
- 3a) cc pdawg.c
- 3b) pdawg dict.dwg
- This should decompile the data and print out the
- original word list
-
- 4) cc dwgcheck.c
- Compile a Mickey Mouse spelling checker
-
- 5) dwgcheck a few wurdz on the comand line ?
- ... and test it.
-
- 6) cc tell.c
- Now try the 'advanced' stuff! - soundex correction... (I hates it :-()
-
- 7) tell me sume wrongli speld werds
- and test it...
-
- If you're getting into this stuff, here's another incidental
- hack you might want to try...
- 7a) cc proot.c
- 7b) proot root
- print out all words of form 'root*'
- One day I'll hook this stuff into regexp; but not today :-)
-
- ----------
-
- Enough examples for now. If that all worked you are doing well!
- If it didn't, and you have the time, please find out what went wrong;
- If you can fix it, mail me the fix please, else mail me a log of
- the symptoms, such as compiler error messages.
-
- By the way, I know that the spelling correction uses a tacky algorithm;
- don't worry about it -- I'm working on some red hot stuff which will
- put soundex to shame (which isn't hard :) ) [Late note; it now works
- (apparently well) but I need a phonetic dictionary before it is useful :-(
- Anyone got a phonetic dictionary? Alvey people out there?]
-
- You might be interested in the data structures used; the main one is
- a dawg - a 'Directed Acyclic Word Graph' -- it's essentially a
- Directed Acyclic Graph implementing an FSM which describes the set of
- all words in your lexicon. The name DAWG comes from Appel's in his paper
- on Scrabble (although none of the code or algorithms do).
-
- The second most important data structure is superimposed on this
- one; it is a packing algorithm due to Liang (of TeX hyphenation
- fame) which allows you to convert an implicitly linked trie
- into an indexed trie *without* taking any more space. Liang is
- a real smart cookie & it is a shame this technique of his is
- not better known... (it should be up there among binary tree and
- linked lists!)
-
- OK, more examples:
-
- 8) cc pack.c
- compile the packing code
-
- 9) pack dict.dwg dict.pck
- this takes rather a long time; you'll get messages saying '1000 nodes'
- '2000 nodes' etc roughly 1 every 20 seconds +/- 50%. There shouldn't
- be more than a couple of screenfuls of these :-) I'll speed it up one
- day.
-
-
- If you want to check that this data structure generated is ok too,
- 9a) cc ppack.c
- 9b) ppack dict.pck
- Just as you did for (3)
-
- 10) cc pckcheck.c
- Just like dwgcheck but using the other type of data.
-
- 11) pckcheck sume moar possibly incorrectly speld woards
- boring, huh?
-
- 12) Tha Tha Thats all folks!
-
- ---------------
-
- Unless I've made some major cockup at the last minute, you should
- actually have a reasonable chance of getting these working on your
- machine, whatever it is. (Dec 20's and EBCDIC machines *possibly*
- excepted - attempts welcome!)
-
- As I said, please mail me your bugfixes, problems, suggestions etc.
- By the way, the standard of coding isn't exceptionally high; this
- implementation was always intended to be a prototype. A rewrite
- of dawg.c and pack.c is definitely high on the priority list...
- The actual application programs aren't too bad, but the interfaces
- to the various utility routines are liable to change on an hour-by-hour
- basis! (I've been trying to make them more consistent -- you should have
- seen the last lot ;-) ).
-
- However, the algorithms are all complete now; anyone who has ever needed
- to convert a tree to a dag might be interested in the linear-time solution
- in dawg.c - most other solutions I've seen have been NlogN or N^2.
-
- If you're interested, there's a file dictlib.h which is a proposed interface
- for the next iteration; comments are invited.
-
- Share & Enjoy,
-
- Graham Toal <gtoal@uk.ac.ed>
-
- <FOOTNOTE>
- I'm experimenting in this code with ways of finding out at compile
- time various parameters needed for portability. There's a file
- grope.h which does this. If things all work first time, great;
- if not, you may have to change various #define's. The best way
- to do that is to add code to grope.h which identifies your system,
- and only make changes of the form:
-
- #ifdef SYS_MYSYSTEM
- #undef something
- #define something (a different value)
- #endif /* my system */
-
- There are instructions for customising in grope.h itself. My overall
- aim is to avoid special makefiles and special command line options.
- (A bit ambitious I know, but nice idea if it works...)
- (Steve Pemberton's config.c might be useful to you too if you are
- hacking grope.h. It was reposted on usenet recently.)
-
- </FOOTNOTE>
-
- <PS>
-
- Future hacks:
- 1) Anagrams
- 2) Crosswords
- 3) Scrabble
- 4) Anything where a text key can be used to lookup another
- piece of text. This is implemented with the 'dawg_locate_prefix'
- routine; it is effectively a cheap substitute for btrees etc.
- (and better than a hash table because you can *do* things with it!)
-
- 4a) Reverse phone book 1234567=Fred Bloggs
- 4b) house style enforcer stupid=infelicitous
- 4c) uk/us spelling advisor sidewalk=pavement
- 4d) shorthand/macro expander f.y.i.=for your information
- etc etc... (Most of wwb in fact)
-
- 5) Anything you can suggest! (or *implement* :-) )
-
- </PS>
-
- <MEMO>
-
- Archive-name: dawgutils/part[1-3].shar
- Reply-to: Graham Toal <gtoal%uk.ac.ed@nsfnet-relay.ac.uk>
-
- 1) Upload fixed version of dyntrie.c - remember bug with signed chars
- being used as an index [comment rule in dawg.c?]
-
- 2) known design flaw: can't handle strings with 0 in them
-
- 3) known bug: dyntrie.c has problems if you enter a string
- which *starts* with 255. This is due to sending 'end_of_node'
- bit rather hackily. Can be fixed.
-
- 4) Test version to be posted to alt.sources by running on machine
- with signed chars (eg MSDOS)
-
- 5) Remove hacky Malloc debugging macros in utils.c -- there might be
- a problem if compiled on MSDOS but not under MICROSOFT C
-
- 6) Check that LIB_STRINGS is *not* defined when UX63 is set.
-
- 7) tweak the constants in pack.c to speed it up a lot
-
- 8) hack the pack.c heuristics into dyntrie.c to speed it up too.
-
- </MEMO>
- SHAR_EOF
- cat - << \SHAR_EOF > copyr.h
- /*
- * Copyright 1989 Graham Toal & Richard Hooker
- *
- * Permission to use, copy, modify, and distribute this software and its
- * documentation for any purpose with or without fee is hereby granted provided
- * that the above copyright notice appear in all copies and that both that
- * copyright notice and this permission notice appear in supporting
- * documentation.
- *
- * This program is publicly available, but is NOT in the public domain. The
- * difference is that copyrights granting rights for unrestricted use and
- * redistribution have been placed on the software to identify its
- * authors. You are allowed and encouraged to take this software and
- * build commercial products, freeware products, shareware products, and
- * any other kind of product you can think up, with the following restriction:
- *
- * If you redistribute the source to this program, or a derivitive of that
- * source, you must include this copyright notice intact. If the system
- * this source is distributed with is under a stricter license (such as
- * a commercial license, the typical freeware "no commercial use" license,
- * or the FSF copyleft) then you must provide the original source under the
- * original terms.
- *
- * Edinburgh Software Products (E.S.P.) makes no representations about the
- * suitability of this software for any purpose. It is provided "as is"
- * without express or implied warranty.
- *
- * E.S.P. DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
- * ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL
- * E.S.P. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
- * ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
- * IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
- * OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
- *
- * Authors: Graham Toal & Richard Hooker, Edinburgh Software Products,
- * 6 Pypers Wynd, Prestonpans, East Lothian, Scotland.
- */
- SHAR_EOF
- cat - << \SHAR_EOF > dawg.h
- /**
- *
- * DAWG.H
- *
- * Header file for Directed Acyclic Word Graph access
- *
- * The format of a DAWG node (8-bit arbitrary data) is:
- *
- * 31 24 23 22 21 0
- * +--------------------+---+---+--+-------------------------------------+
- * | Letter | W | N |??| Node pointer |
- * +--------------------+---+---+--+-------------------------------------+
- *
- * where N flags the last edge in a node and W flags an edge as the
- * end of a word. 21 bits are used to store the node pointer, so the
- * dawg can contain up to 262143 edges. (and ?? is reserved - all code
- * generating dawgs should set this bit to 0 for now)
- *
- * The root node of the dawg is at address 1 (because node 0 is reserved
- * for the node with no edges).
- *
- * **** PACKED tries do other things, still to be documented!
- *
- **/
-
- #define TRUE (0==0)
- #define FALSE (0!=0)
-
- #define DAWG_MAGIC 0x01234567
- #define PACK_MAGIC 0x34567890
-
- #define TERMINAL_NODE 0
- #define ROOT_NODE 1
-
- #define V_END_OF_WORD 23
- #define M_END_OF_WORD (1L << V_END_OF_WORD)
- #define V_END_OF_NODE 22 /* Bit number of N */
- #define M_END_OF_NODE (1L << V_END_OF_NODE) /* Can be tested for by <0 */
-
-
- #define V_FREE 22 /* Overloading this bit, as
- packed tries need no end-of-node */
- #define M_FREE (1L << V_FREE)
-
- #define V_LETTER 24
- #define M_LETTER 0xFF
- #define M_NODE_POINTER 0x1FFFFFL /* Bit mask for node pointer */
-
- /* max_chars and base_chars are a hangover from the days when this
- was trying to save space, and dawgs were only able to contain
- characters in the range 'a'..'z' 'A'..'Z' (squeezed into 6 bits).
- Now that we're '8-bit clean' we no longer need these. Later code
- in fact doesn't support the old style; but some procedures still
- subtract 'BASE_CHAR' from the character to normalize it. Since it
- is now 0 it is a no-op... */
- #define MAX_CHARS 256
- #define BASE_CHAR 0
-
- /* Enough? */
- #define MAX_WORD_LEN 256
-
- #define MAX_LINE 256
-
- typedef long NODE;
- typedef long INDEX;
-
- SHAR_EOF
- cat - << \SHAR_EOF > dictlib.h
- /* This header file does not describe any of the code in the
- accompanying files; it is a rough sketch of the NEXT iteration
- of the spelling/word utilities. Comments are invited. Missing
- functionality or bits that won't work should be pointed out
- please! Send mail to gtoal@ed.ac.uk (via nsfnet-relay.ac.uk
- from US sites)
- many thanks.
-
- Graham Toal 23/06/90
- */
-
-
-
-
-
-
- typedef long INDEX;
- typedef long NODE;
- #define ROOTNODE 1L
-
- typedef struct DICT {
-
- /* Although primitive procedures will be provided for handling the
- basic dawg array, by using this interface applications can
- remain independent of the implementation details of the
- dictionary. */
-
- /* So far there are three forms of dict; 1: a simple dawg with edges
- as 4-byte integers; each node (a set of branches) is stored
- sequentially. 2: An *indexed* form of the above -- much faster
- to lookup in, but slower to walk through. Although indexing
- would normally be heavy on space, this uses Liang's packed
- overlapping tries, thus using almost exactly the same space as
- method 1. 3: A form of 1, but with all the stops pulled out
- to get as much bit-level compaction as possible; edges can
- be two bytes instead of 4. (by Keith Bostic) */
-
- /* This struct contains both access procedures and information;
- most fields will be filled in by our code when the dictionary
- is loaded by load_dict(). The fields may be added to in
- later releases, but they will always be upwards compatible;
- none of this data is saved to disk in this format, so changing
- the struct layout will not cause problems as long as the names
- remain the same. */
-
- /* Note that some procedures take a user-supplied 'apply' parameter;
- This 'apply' procedure has a 'context' parameter which is merely
- a handle to allow the user to pass data in to his code without
- having to use global variables. If the user's apply procedure
- returns anything other than 0, it will cause an early exit, with
- the users return code being returned from the calling function. */
-
- /* It is intended that these are used in an object-oriented way;
- although we will supply an external dict_check(word, dict) procedure, for
- instance, all it will do is call dict->check(word, dict->dawg) */
-
- NODE *dawg;
- /* Root of the dictionary tree */
- char *dict_name, /* "smiths" - used in command lines etc. */
- *dict_info, /* "Johns Smiths Rhyming Dictionary,\n
- (c) Smith & Jones, 1892\n
- Whitechapel, London.\n" */
- *dict_fname, /* "/usr/dict/smiths" */
-
- *lang_charset; /* "ISO 8859/1 Latin 1" */
-
- /* These are filled in by us to make life easier for the applications
- programmer. We'll supply a default table for simple 7-bit ascii
- dawgs; but this mechanism allows us the option of including all
- the salient points about the character set used in a dictionary
- in the dictionary itself. Because they are char*'s, we'll probably
- end up sharing the same table between multiple dictionaries --
- so don't write to them. (Although you may replace the _pointer_
- with one to a table of your own. */
-
- /* Later note: I've just found out about LOCALE.H etc. This stuff
- may therefore change... */
-
- int *chartype;
- /* 1 whitespace */
- /* 2 punctuation */
- /* 4 blank */
- /* 8 lower case letter */
- /* 16 upper case letter */
- /* 32 (decimal) digit */
- /* 64 underscore */
- /* 128 A-F and a-f */
- /* 256 Ignore words starting
- with this char */
- /* 512 Include in definition
- of a word -- mainly
- alpha but also hyphen
- and apostrophe */
-
- char *lower, /* personal implementation of tolower() */
- *unaccented, /* map accented chars to non-accenmted */
- *lexorder; /* an arbitrary ordering for sorting:
- e-acute would come after e and before f
- */
-
- int (* check) (NODE *base, INDEX state, struct DICT *d, char *word);
- /* Check a word - exact match; return true/false */
-
- int (* fuzzy) (NODE *base, INDEX state, struct DICT *d, char *word,
- char *value);
- /* Check a word - fuzzy case, accents, hyphens etc; return true/false */
- /* value is filled in with the dictionary's idea of what the
- word should look like. */
-
- /* Fancier correction procedures may be synthesised out of user code and
- 'walk' */
- int (* correct) (
- char *word,
- NODE *base, INDEX state,
- /* Normally dict's root, but may be subtree */
- struct DICT *d, /* lowercase/accent information from here */
- int maxwords,
- int order, /* 0: by alpha
- 1: by highest score first
- 2: by lexical ordering
- (ignores case, accents, hyphens)
- */
- int (* apply) (
- char *word, /* Corrected word */
- int score, /* Normalised to range 0..255 */
- void *context),
- void *context);
- /* Offer spelling corrections for the
- given word. If maxwords > 0, return
- the best <maxwords> words in order
- of most-likely first; otherwise
- many words may be returned, in
- alphabetical order, coupled with
- a score: 255 means exact match; 0
- means totally different */
-
- /* returns 0 if user returned 0 for
- all applications */
- int (* walk) (
- NODE *base, INDEX state,
- int (* apply) (char *word, void *context),
- void *context);
- /* Apply the procedure given to every
- word in the dictionary tree or subtree,
- passing in the user-supplied context */
-
- /* returns 0 if user returned 0 for
- all applications */
-
- int (* lookup) (
- NODE *base, INDEX state,
- char *key,
- /* Out */
- INDEX values);
- /* Return the index of the subtree of the
- dictionary which has 'key' as a prefix;
- for instance, in a reverse telephone
- book, you might find entries:
- 0874=Anytown
- 0875=Cockenzie
- 0875=Port Seton
- 0875=Prestonpans
- 0876=Toytown
- In this case, if searching for "0875=",
- a subtree would be returned containing:
- Cockenzie
- Port Seton
- Prestonpans
- This subtree would then be walked
- using the 'walk' function and a user-
- written 'apply' procedure. */
-
- /* returns 0 if successful */
-
- /* The following two are performance hints ONLY. They should not
- affect the correct functioning of a program. */
-
- int (* uncache) (struct DICT *d);/* The space a dictionary uses may be
- reclaimed by calling this. If the
- dictionary is subsequently accessed,
- there *may* be a performance penalty --
- for instance the dictionary may be
- accessed off disk. However on a machine
- with sufficient memory, the system
- may choose to leave the data in ram
- until, say, malloc runs out of space.
- If there is no convenient mechanism
- for organising the demand-recacheing,
- this may be a no-op. */
-
- /* returns 0 if successful */
-
- int (* recache) (struct DICT *d);/* The data of the dictionary is reloaded
- into active memory where it will stay */
-
- /* returns 0 if successful */
- /* May be extended in the future */
- } DICT;
-
-
-
- /*
- dict_load
-
- This is provided as a primitive so that dictionaries can be loaded
- in the most efficient way the machine supports. The space for the dict
- is supplied *by* this procedure - not *to* it. This allows a Mach (or Emas)
- implementation to mmap (or Connect) the file in memory - the connection
- address being chosen by the OS and outside our control. It also allows
- systems like RISC OS on the Acorn Archimedes to use an optimised whole-
- file load call to pull the file off disk into real ram in a single disk
- operation.
-
- Since the data parts of the type 1 & 2 dicts are just a set of 4-byte
- integers, it is easy to correct a faulty byte-sex by running down this
- array *once at start-up* to reverse the order. This is easy if the
- data is loaded into physical ram. If it is connected as a file by a VM system
- however, the VM system must support copy-on-write; if it only has read-only
- mapping there must be two files available -- one of each sex.
-
- A *possible* scheme on VM systems is to byte-sex-reverse a whole
- page the first time that page is touched. This may be a lot of unneccesary
- work - I'd recommend keeping a correct-sex file on-line.
-
- This code will create the DICT struct, and fill in the various fields,
- either from the dictionary file itself or from private knowledge if not available.
-
- */
-
- int dict_load_file(char *fname, DICT **dict_struct);
-
- /* fname is the literal file name */
-
- /* dict_struct is allocated by us,
- not by the user. */
-
- /* returns 0 if successfully loaded */
-
- int dict_load_dict(char *dname, DICT **dict_struct);
-
- /* dname is the dictionary name,
- e.g. "words", or "web2" etc.
- The corresponding file will be searched
- for via a system-dependent path, logical
- variable, or fixed place. */
-
- /* dict_struct is allocated by us,
- not by the user. */
-
- /* returns 0 if successfully loaded */
-
- int dict_unload(DICT *d); /* Use when totally finished with data */
-
- /* These are user-level veneers for the internal procedures used above.
- Use them in preference to the above. Use the above only if
- you are a speed-freak! (but disguise them in macros such as:
-
- #define dict_check(dawg, dict, word) dict->check(dict->dawg, dict, word)
-
- These macros will of course go 'Bang!' if dict ptr is unassigned or NULL.
- */
-
- /* In the worst case, if a machine has a poor quality compiler which
- doesn't support procedure variables well, these routines could be
- the *acual* implementation rather than just a pointer to it. */
-
- int check (NODE *base, INDEX state, DICT *d, char *word);
- int fuzzy (NODE *base, INDEX state, DICT *d, char *word, char *value);
- int correct (char *word,
- NODE *base, INDEX state,
- /* Normally dictionary root, but may be subtree */
- DICT *d, /* lowercase/accent information from here */
- int maxwords,
- int order, /* 0: by alpha
- 1: by highest score first
- 2: by lexical ordering
- */
- int apply(char *word, /* Corrected word */
- int score, /* Normalised to range 0..255 */
- void *context),
- void *context);
- int walk (NODE *base, INDEX state, /* Yoyo fanatics will be pleased
- to learn that they can 'walk the dawg'... :-) */
- int apply(char *word, void *context),
- void *context);
- int lookup (NODE *base, INDEX state,
- char *key,
- /* Out */
- INDEX values);
-
-
- /* Now here come the fast internal procedures which the above eventually call */
- int dawg_check(NODE *base, INDEX state, char *word);
- /* Standard dawg */
- int pack_check(NODE *base, INDEX state, char *word);
- /* Overlapped indexed dawg */
- int bsd_check(NODE *base, INDEX state, char *word);
- /* Space-optimised dawg (called bsd because Keith Bostic is working
- on this format for the next bsd4.4 spelling checker) */
- /* Must make the internl procs agree with these... save the macros for later...
- #define dict_check(word,dict) \
- (dict != NULL ? (dict->check(dict->dawg, ROOTNODE, word)) : -1)
- */
- int dawg_walk(NODE *base, INDEX state,
- int apply(char *word, void *context),
- void *context);
- int pack_walk(NODE *base, INDEX state,
- int apply(char *word, void *context),
- void *context);
- int bsd_walk(NODE *base, INDEX state,
- int apply(char *word, void *context),
- void *context);
- /*
- #define dict_walk(dict, apply, context) \
- (dict != NULL ? (dict->walk(dict->dawg, ROOTNODE, apply, context)) : -1)
- */
-
- int dawg_lookup(NODE *base, INDEX state, char *key, /* Out */ INDEX values);
- int pack_lookup(NODE *base, INDEX state, char *key, /* Out */ INDEX values);
- int bsd_lookup(NODE *base, INDEX state, char *key, /* Out */ INDEX values);
- /* etc... */
-
- /* Get list of possible next characters from this point in the tree,
- put it into user-supplied array branch[256]. Filled with
- NULL if no branch, or an INDEX if it points to more of the tree.
- Words which may terminate on a particular letter have terminate[let] != 0
-
- The user-supplied terminate array is deliberately a long array to allow
- for expansion; it may be used one day to return arbitrary codes such as
- 'Noun' etc...
-
- */
-
- void dawg_branch(NODE *base, INDEX state, NODE *branch, long *terminate);
- void pack_branch(NODE *base, INDEX state, NODE *branch, long *terminate);
- void bsd_branch(NODE *base, INDEX state, NODE *branch, long *terminate);
- /* No internal proc yet... add it in the morning. */
-
- /* In fact there will be even simpler procedures which can be used by
- people who prefer to include the whole source of this package rather
- than just the library interface parts. In applications where utmost
- speed is a neccessity (eg Scrabble(tm)) you could use these, or write
- your own. But most of us can use this clean interface with unnoticable
- overhead. */
- SHAR_EOF
- cat - << \SHAR_EOF > grope.h
- #ifndef grope_h
- #define grope_h 1
-
- #ifdef TESTING_DEFS
- /*###################################################################*/
- /*
- * This is an exercise to assist in writing portable programs.
- *
- * To use as a header file, eg "grope.h", just leave this file
- * as it is. To use it to define new entries, rename it as
- * "grope.c" and compile it as "cc -DTESTING_DEFS grope".
- *
- * To customise this test program for your system, I have set it up
- * so that all features are enabled. If you find that one feature
- * causes a compile-time error, test a suitable set of '#ifdef's for
- * your machine and #undef the values below which are not appropriate.
- *
- * Please do your best to see that your tests are unique, and that
- * you do not undo any tests already implemented.
- *
- * One last request; PLEASE PLEASE PLEASE send me any updates you
- * may make. If you cannot get through to me on email, send me a
- * paper copy (or even better, a floppy copy :-)) to Grahan Toal,
- * c/o 6 Pypers wynd, Prestonpans, East Lothian, Scotland EH32 9AH.
- *
- * Graham Toal <gtoal@uk.ac.ed>
- * [PS: I can read DOS and RISC OS floppies only]
- */
- #define STDLIBS 1
- #define PROTOTYPES 1
- #define KNOWS_VOID 1
- #define KNOWS_LINE 1
- #define KNOWS_REGISTER 1
- #endif /* TESTING_DEFS */
- /*###################################################################*/
- #define SYS_ANY 1
- #define COMPILER_ANY 1
- /* Don't yet know what machine it is. Add a test once identified. */
- /*--------------------------*/
- #ifdef MSDOS
- #define SYS_MSDOS 1
- #endif
- /*--------------------------*/
- #ifdef __STDC__
- #define STDLIBS 1
- #define PROTOTYPES 1
- #define KNOWS_VOID 1
- /* If the compiler defines STDC it should have these. We can undef
- them later if the compiler was telling lies! */
- #endif
- /*--------------------------*/
- #ifdef MPU8086
- #define SYS_MSDOS 1
- /* Aztec does not #define MSDOS.
- We assume it is Aztec on MSDOS from the MPU8086.
- */
- #ifdef __STDC__
- #define COMPILER_AZTEC 1
- /* Aztec is known to also define __STDC__ (illegally). Therefore if
- __STDC__ is not defined, it is probably not Aztec */
- #endif
- #endif
-
- #ifdef SYS_MSDOS
- /*--------------------------*/
- #ifdef __STDC__
- /*----------------------*/
- #define COMPILER_MICROSOFT 1
- /* I assume that the combination of #define MSDOS and #define __STDC__
- without any other collaboration means MICROSOFT. (Who, incidentally,
- are being naughty by declaring __STDC__) */
- #define KNOWS_LINE 1
- #else
- /*----------------------*/
- #ifdef __ZTC__
- /*------------------*/
- #define COMPILER_ZORTECH 1
- #undef STDLIBS
- /* Another system without locale.h */
- #define PROTOTYPES 1
- #else
- /*------------------*/
- /* A non-std msdos compiler */
- #undef STDLIBS
- #undef PROTOTYPES
- /*------------------*/
- #endif
- /*----------------------*/
- #endif
- /*--------------------------*/
- #endif
- #ifdef TURBOC
- /* Although Turbo C correctly does not define __STDC__, it has SOME
- standard libraries (but not all - locale.h?) and accepts prototypes. */
- #undef STDLIBS
- #define PROTOTYPES 1
- #define SYS_MSDOS 1
- #define COMPILER_TURBOC 1
- /* TURBOC is defined, but has no value. This allows it to be tested
- with #if as well as #ifdef. */
- #endif
- /*--------------------------*/
- #ifdef COMPILER_MICROSOFT
- #undef STDLIBS
- /* Again, like Turbo C, does not know locale.h */
- #define PROTOTYPES 1
- #endif
- /*--------------------------*/
- #ifdef SYS_MSDOS
- #define MEMMODELS 1
- /* We assume ALL MSDOS machines use memory models */
- #endif
- /*--------------------------*/
- #ifdef UX63
- #undef STDLIBS
- #undef PROTOTYPES
- #define SYS_UX63 1
- #define COMPILER_PCC 1
- /*#define LIB_STRINGS 1 - apparently not... */
- #endif
- /*--------------------------*/
- #ifdef sun
- #undef STDLIBS
- #undef PROTOTYPES
- #define SYS_SUN 1
- #define COMPILER_PCC 1
- #endif
- /*--------------------------*/
- #ifdef THINK_C
- #define SYS_MAC 1
- #define COMPILER_THINKC 1
- #define KNOWS_VOID 1
- #endif
- /*--------------------------*/
- #ifdef sparc
- #undef STDLIBS
- #undef PROTOTYPES
- #define SYS_SPARC 1
- #define COMPILER_PCC 1
- #endif
- /*--------------------------*/
- #ifdef ARM
- #define SYS_RISCOS 1
- /* I fear Acorn may define 'ARM' on both unix and risc os versions.
- Should find out whether they define others as well, to differentiate. */
- #endif
- #ifdef __ARM__
- #define SYS_RISCOS 1
- /* I fear Acorn may define 'ARM' on both unix and risc os versions.
- Should find out whether they define others as well, to differentiate. */
- #endif
- /*--------------------------*/
- #ifdef SYS_RISCOS
- #define COMPILER_NORCROFT 1
- #define KNOWS_REGISTER 1
- #define KNOWS_LINE 1
- #endif
- /*--------------------------*/
- #ifdef vms
- #define SYS_VMS 1
- #endif
- /*--------------------------*/
-
- /*--------------------------*/
- #ifdef SYS_UX63
- #undef SYS_ANY
- #endif
- #ifdef SYS_ARM
- #undef SYS_ANY
- #endif
- #ifdef SYS_MSDOS
- #undef SYS_ANY
- #endif
- #ifdef SYS_SUN
- #undef SYS_ANY
- #endif
- #ifdef SYS_SPARC
- #undef SYS_ANY
- #endif
- #ifdef SYS_RISCOS
- #undef SYS_ANY
- #endif
- #ifdef SYS_MAC
- #undef SYS_ANY
- #endif
- #ifdef SYS_VMS
- #undef SYS_ANY
- #endif
- /*--------------------------*/
- #ifdef COMPILER_PCC
- #undef COMPILER_ANY
- #endif
- #ifdef COMPILER_MICROSOFT
- #undef COMPILER_ANY
- #endif
- #ifdef COMPILER_TURBOC
- #undef COMPILER_ANY
- #endif
- #ifdef COMPILER_ZORTECH
- #undef COMPILER_ANY
- #endif
- #ifdef COMPILER_AZTEC
- #undef COMPILER_ANY
- #endif
- #ifdef COMPILER_NORCROFT
- #undef COMPILER_ANY
- #endif
- #ifdef COMPILER_THINKC
- #undef COMPILER_ANY
- #endif
- /*--------------------------*/
- /* ##################################################################### */
- #ifdef TESTING_DEFS
- #include <stdio.h>
- /* ======================================================================= */
- #ifdef STDLIBS
- /* If any of these fail, make sure STDLIBS is not
- defined for your machine. */
- #include <stdlib.h> /* STDLIBS should not be defined. */
- #include <time.h> /* STDLIBS should not be defined. */
- #include <locale.h> /* STDLIBS should not be defined. */
- #endif
- /* ======================================================================= */
- #ifdef KNOWS_VOID
- void test() { /* KNOWS_VOID should not be defined */
- /* Make sure your ifdef's don't #define KNOWS_VOID if this fails */
- }
- #endif
- /* ======================================================================= */
- #ifdef KNOWS_REGISTER
- int regtest() {
- register int i = 0; /* KNOWS_REGISTER should not be defined */
- /* Make sure your ifdef's don't #define KNOWS_REGISTER if this fails */
- return(i);
- }
- #endif
- /* ======================================================================= */
- #ifdef PROTOTYPES
- int main(int argc, char **argv) /* PROTOTYPES should not be defined */
- /* If this fails, make sure PROTOTYPES is not defined for your machine. */
- #else
- int main(argc,argv)
- int argc;
- char **argv;
- #endif
- {
- /*-------------------------------------------------------------------------*/
- printf("We should know just what the machine is, or 'SYS_ANY':\n");
- #ifdef SYS_UX63
- printf("#define SYS_UX63 %d\n", SYS_UX63);
- #endif
- #ifdef SYS_ARM
- printf("#define SYS_ARM %d\n", SYS_ARM);
- #endif
- #ifdef SYS_MSDOS
- printf("#define SYS_MSDOS %d\n", SYS_MSDOS);
- #endif
- #ifdef SYS_SUN
- printf("#define SYS_SUN %d\n", SYS_SUN);
- #endif
- #ifdef SYS_SPARC
- printf("#define SYS_SPARC %d\n", SYS_SPARC);
- #endif
- #ifdef SYS_RISCOS
- printf("#define SYS_RISCOS %d\n", SYS_RISCOS);
- #endif
- #ifdef SYS_MAC
- printf("#define SYS_MAC %d\n", SYS_MAC);
- #endif
- #ifdef SYS_VMS
- printf("#define SYS_VMS %d\n", SYS_VMS);
- #endif
- #ifdef SYS_ANY
- printf("#define SYS_ANY %d\n", SYS_ANY);
- #endif
- /*-------------------------------------------------------------------------*/
- printf("Either the machine has different memory models or not:\n");
- #ifdef MEMMODELS
- printf("#define MEMMODELS %d\n", MEMMODELS);
- #else
- printf("#undef MEMMODELS\n");
- #endif
- /*-------------------------------------------------------------------------*/
- printf("One compiler name should be given, or 'COMPILER_ANY':\n");
- #ifdef COMPILER_PCC
- printf("#define COMPILER_PCC %d\n", COMPILER_PCC);
- #endif
- #ifdef COMPILER_MICROSOFT
- printf("#define COMPILER_MICROSOFT %d\n", COMPILER_MICROSOFT);
- #endif
- #ifdef COMPILER_TURBOC
- printf("#define COMPILER_TURBOC %d\n", COMPILER_TURBOC);
- #endif
- #ifdef COMPILER_ZORTECH
- printf("#define COMPILER_ZORTECH %d\n", COMPILER_ZORTECH);
- #endif
- #ifdef COMPILER_AZTEC
- printf("#define COMPILER_AZTEC %d\n", COMPILER_AZTEC);
- /* Can exist on other machines as well as MSDOS */
- #endif
- #ifdef COMPILER_LATTICE
- /* Currently coming through as 'COMPILER_ANY' - haven't found one to test */
- /* Can exist on other machines as well as MSDOS. Meanwhile pass it in */
- /* on the command_line? */
- printf("#define COMPILER_LATTICE %d\n", COMPILER_LATTICE);
- #endif
- #ifdef COMPILER_GCC
- /* Ditto. Test in appropriate places for each machine. */
- printf("#define COMPILER_GCC %d\n", COMPILER_GCC);
- #endif
- #ifdef COMPILER_NORCROFT
- printf("#define COMPILER_NORCROFT %d\n", COMPILER_NORCROFT);
- #endif
- #ifdef COMPILER_THINKC
- printf("#define COMPILER_THINKC %d\n", COMPILER_THINKC);
- #endif
- #ifdef COMPILER_ANY
- printf("#define COMPILER_ANY %d\n", COMPILER_ANY);
- #endif
- /*-------------------------------------------------------------------------*/
- printf("Either the compiler accepts ANSI-like libraries or not:\n");
- #ifdef STDLIBS
- printf("#define STDLIBS %d\n",STDLIBS);
- #else
- printf("#undef STDLIBS\n");
- #endif
- /*-------------------------------------------------------------------------*/
- printf("Either the machine accepts ANSI prototypes or not:\n");
- #ifdef PROTOTYPES
- printf("#define PROTOTYPES %d\n", PROTOTYPES);
- #else
- printf("#undef PROTOTYPES\n");
- #endif
- /*-------------------------------------------------------------------------*/
- printf("Either the machine needs nonstandard #include <strings.h> or not:\n");
- #ifdef LIB_STRINGS
- printf("#define LIB_STRINGS %d\n", LIB_STRINGS);
- #else
- printf("#undef LIB_STRINGS\n");
- #endif
- /*-------------------------------------------------------------------------*/
- printf("Either the machine accepts the 'void' keyword or not:\n");
- #ifdef KNOWS_VOID
- printf("#define KNOWS_VOID %d\n", KNOWS_VOID);
- #else
- printf("#undef KNOWS_VOID\n");
- #endif
- printf("Either the machine accepts the 'register' keyword or not:\n");
- /*-------------------------------------------------------------------------*/
- printf("Either the compiler accepts the '__LINE__' directive or not:\n");
- #ifdef KNOWS_LINE
- printf("#define KNOWS_LINE %d\n", KNOWS_LINE);
- #else
- printf("#undef KNOWS_LINE\n");
- #endif
- /*-------------------------------------------------------------------------*/
- printf("Either the machine accepts the 'register' keyword or not:\n");
- #ifdef KNOWS_REGISTER
- printf("#define KNOWS_REGISTER %d\n", KNOWS_REGISTER);
- #else
- printf("#undef KNOWS_REGISTER\n");
- #endif
- /* Note - this is intended to be used only if your compiler accepts
- 'register' *AND* compiles code with it correctly! Some compilers
- show up bugs when register optimisations are attempted! */
- /*-------------------------------------------------------------------------*/
- if (argv[argc] != 0) printf("Warning! argv[argc] != NULL !!! (Non ansii)\n");
- }
- #endif /* TESTING_DEFS */
- #endif /* Already seen? */
-
- SHAR_EOF
-
-
-