Garbo

home *** CD-ROM | disk | FTP | other *** search

/ Garbo / Garbo.cdr / pc / source / spell.lzh / spell.1 next >

Wrap

Text File | 1990-07-03 | 40.9 KB | 1,072 lines

#!/bin/sh-----cut here-----cut here-----cut here-----cut here----- # shar: Shell Archiver # Run the following text with /bin/sh to create: # README # copyr.h # dawg.h # dictlib.h # grope.h cat - << \SHAR_EOF > README This is a suite of programs for working with words. I am releasing it to alt.sources primarily to invite people to test it on as many platforms as possible (mail me back any bug reports please!) so that the final code will be completely portable. Unpack with unshar or sh < part1; sh < part2; sh < part3 Utilities for spelling checking and correction are supplied, but the ultimate aim is to support all sorts of word-manipulation activities such as a writers workbench and assorted word games (See PS). This news posting is in three parts; part1: this file + headers; part2: utility modules in .c files to be #include'd; part3: main programs; just CC these and they should work -- no fancy makefiles or link commands. The code here doesn't have much of a user interface; I'm rather hoping that people who pick it up will hook it in to their favourite operating system as smoothly as they can. Perhaps someone with the time to spare (who am I kidding :-( ) might integrate it with ispell for instance. I've experimented with various interfaces already; I can accept TeX input and write output suitable for correcting text in either emacs or uEmacs. I'm not going to release these though, until I'm happy with the internal code. Incidentally, the code will be totally public domain - meaning that I've no objection to its being used in commercial projects. HOWEVER I would appreciate if you didn't do so until it is officially released (probably through comp.sources.misc) because I would prefer to define portable file-formats and magic numbers first. OK, enough babble; here's how it all works: 1) acquire a list of words for yourself; I have one but at 690K it is too big to post. (actually even these sources come to around 120K so I hope they make it through OK. I don't think I have a shar that splits, so I've divided into three shars myself.) The word list should be sorted by ascii order. 2) cc dawg.c I've deliberately made main programs into one source file, by including *.c for utility modules; I know this is bad style but it helps make compiling and porting easier (no worries about long external names for instance, or non-case-sensitive linkers) 3) dawg yourdict dict 'yourdict' is your wordlist; but the second parameter should be 'dict' at least while testing. This will generate dict.dwg which is a compacted and *FAST* data structure for word access. Building a dawg[See next section for meaning of name] from a word list is a real memory pig; so I've written code which will let you generate the dawg in chunks (traded off against a small loss of compression density). (Interestingly enough it isn't a time-pig; using a hash-table for node merging gives almost linear time - thanks to Richard Hooker for that one). If you don't have enough memory (say 2Mb?) you'll get a run-time message suggesting that you recompile with cc -DSMALL_MEMORY dawg.c (OK, so one day I'll make it select this mode itself at run-time) IBM PC's and Mac's will get this mode by default. [See FOOTNOTE] If you want to check that the data structure generated is ok, 3a) cc pdawg.c 3b) pdawg dict.dwg This should decompile the data and print out the original word list 4) cc dwgcheck.c Compile a Mickey Mouse spelling checker 5) dwgcheck a few wurdz on the comand line ? ... and test it. 6) cc tell.c Now try the 'advanced' stuff! - soundex correction... (I hates it :-() 7) tell me sume wrongli speld werds and test it... If you're getting into this stuff, here's another incidental hack you might want to try... 7a) cc proot.c 7b) proot root print out all words of form 'root*' One day I'll hook this stuff into regexp; but not today :-) ---------- Enough examples for now. If that all worked you are doing well! If it didn't, and you have the time, please find out what went wrong; If you can fix it, mail me the fix please, else mail me a log of the symptoms, such as compiler error messages. By the way, I know that the spelling correction uses a tacky algorithm; don't worry about it -- I'm working on some red hot stuff which will put soundex to shame (which isn't hard :) ) [Late note; it now works (apparently well) but I need a phonetic dictionary before it is useful :-( Anyone got a phonetic dictionary? Alvey people out there?] You might be interested in the data structures used; the main one is a dawg - a 'Directed Acyclic Word Graph' -- it's essentially a Directed Acyclic Graph implementing an FSM which describes the set of all words in your lexicon. The name DAWG comes from Appel's in his paper on Scrabble (although none of the code or algorithms do). The second most important data structure is superimposed on this one; it is a packing algorithm due to Liang (of TeX hyphenation fame) which allows you to convert an implicitly linked trie into an indexed trie *without* taking any more space. Liang is a real smart cookie & it is a shame this technique of his is not better known... (it should be up there among binary tree and linked lists!) OK, more examples: 8) cc pack.c compile the packing code 9) pack dict.dwg dict.pck this takes rather a long time; you'll get messages saying '1000 nodes' '2000 nodes' etc roughly 1 every 20 seconds +/- 50%. There shouldn't be more than a couple of screenfuls of these :-) I'll speed it up one day. If you want to check that this data structure generated is ok too, 9a) cc ppack.c 9b) ppack dict.pck Just as you did for (3) 10) cc pckcheck.c Just like dwgcheck but using the other type of data. 11) pckcheck sume moar possibly incorrectly speld woards boring, huh? 12) Tha Tha Thats all folks! --------------- Unless I've made some major cockup at the last minute, you should actually have a reasonable chance of getting these working on your machine, whatever it is. (Dec 20's and EBCDIC machines *possibly* excepted - attempts welcome!) As I said, please mail me your bugfixes, problems, suggestions etc. By the way, the standard of coding isn't exceptionally high; this implementation was always intended to be a prototype. A rewrite of dawg.c and pack.c is definitely high on the priority list... The actual application programs aren't too bad, but the interfaces to the various utility routines are liable to change on an hour-by-hour basis! (I've been trying to make them more consistent -- you should have seen the last lot ;-) ). However, the algorithms are all complete now; anyone who has ever needed to convert a tree to a dag might be interested in the linear-time solution in dawg.c - most other solutions I've seen have been NlogN or N^2. If you're interested, there's a file dictlib.h which is a proposed interface for the next iteration; comments are invited. Share & Enjoy, Graham Toal <gtoal@uk.ac.ed> <FOOTNOTE> I'm experimenting in this code with ways of finding out at compile time various parameters needed for portability. There's a file grope.h which does this. If things all work first time, great; if not, you may have to change various #define's. The best way to do that is to add code to grope.h which identifies your system, and only make changes of the form: #ifdef SYS_MYSYSTEM #undef something #define something (a different value) #endif /* my system */ There are instructions for customising in grope.h itself. My overall aim is to avoid special makefiles and special command line options. (A bit ambitious I know, but nice idea if it works...) (Steve Pemberton's config.c might be useful to you too if you are hacking grope.h. It was reposted on usenet recently.) </FOOTNOTE> <PS> Future hacks: 1) Anagrams 2) Crosswords 3) Scrabble 4) Anything where a text key can be used to lookup another piece of text. This is implemented with the 'dawg_locate_prefix' routine; it is effectively a cheap substitute for btrees etc. (and better than a hash table because you can *do* things with it!) 4a) Reverse phone book 1234567=Fred Bloggs 4b) house style enforcer stupid=infelicitous 4c) uk/us spelling advisor sidewalk=pavement 4d) shorthand/macro expander f.y.i.=for your information etc etc... (Most of wwb in fact) 5) Anything you can suggest! (or *implement* :-) ) </PS> <MEMO> Archive-name: dawgutils/part[1-3].shar Reply-to: Graham Toal <gtoal%uk.ac.ed@nsfnet-relay.ac.uk> 1) Upload fixed version of dyntrie.c - remember bug with signed chars being used as an index [comment rule in dawg.c?] 2) known design flaw: can't handle strings with 0 in them 3) known bug: dyntrie.c has problems if you enter a string which *starts* with 255. This is due to sending 'end_of_node' bit rather hackily. Can be fixed. 4) Test version to be posted to alt.sources by running on machine with signed chars (eg MSDOS) 5) Remove hacky Malloc debugging macros in utils.c -- there might be a problem if compiled on MSDOS but not under MICROSOFT C 6) Check that LIB_STRINGS is *not* defined when UX63 is set. 7) tweak the constants in pack.c to speed it up a lot 8) hack the pack.c heuristics into dyntrie.c to speed it up too. </MEMO> SHAR_EOF cat - << \SHAR_EOF > copyr.h /* * Copyright 1989 Graham Toal & Richard Hooker * * Permission to use, copy, modify, and distribute this software and its * documentation for any purpose with or without fee is hereby granted provided * that the above copyright notice appear in all copies and that both that * copyright notice and this permission notice appear in supporting * documentation. * * This program is publicly available, but is NOT in the public domain. The * difference is that copyrights granting rights for unrestricted use and * redistribution have been placed on the software to identify its * authors. You are allowed and encouraged to take this software and * build commercial products, freeware products, shareware products, and * any other kind of product you can think up, with the following restriction: * * If you redistribute the source to this program, or a derivitive of that * source, you must include this copyright notice intact. If the system * this source is distributed with is under a stricter license (such as * a commercial license, the typical freeware "no commercial use" license, * or the FSF copyleft) then you must provide the original source under the * original terms. * * Edinburgh Software Products (E.S.P.) makes no representations about the * suitability of this software for any purpose. It is provided "as is" * without express or implied warranty. * * E.S.P. DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING * ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL * E.S.P. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR * ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER * IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT * OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. * * Authors: Graham Toal & Richard Hooker, Edinburgh Software Products, * 6 Pypers Wynd, Prestonpans, East Lothian, Scotland. */ SHAR_EOF cat - << \SHAR_EOF > dawg.h /** * * DAWG.H * * Header file for Directed Acyclic Word Graph access * * The format of a DAWG node (8-bit arbitrary data) is: * * 31 24 23 22 21 0 * +--------------------+---+---+--+-------------------------------------+ * | Letter | W | N |??| Node pointer | * +--------------------+---+---+--+-------------------------------------+ * * where N flags the last edge in a node and W flags an edge as the * end of a word. 21 bits are used to store the node pointer, so the * dawg can contain up to 262143 edges. (and ?? is reserved - all code * generating dawgs should set this bit to 0 for now) * * The root node of the dawg is at address 1 (because node 0 is reserved * for the node with no edges). * * **** PACKED tries do other things, still to be documented! * **/ #define TRUE (0==0) #define FALSE (0!=0) #define DAWG_MAGIC 0x01234567 #define PACK_MAGIC 0x34567890 #define TERMINAL_NODE 0 #define ROOT_NODE 1 #define V_END_OF_WORD 23 #define M_END_OF_WORD (1L << V_END_OF_WORD) #define V_END_OF_NODE 22 /* Bit number of N */ #define M_END_OF_NODE (1L << V_END_OF_NODE) /* Can be tested for by <0 */ #define V_FREE 22 /* Overloading this bit, as packed tries need no end-of-node */ #define M_FREE (1L << V_FREE) #define V_LETTER 24 #define M_LETTER 0xFF #define M_NODE_POINTER 0x1FFFFFL /* Bit mask for node pointer */ /* max_chars and base_chars are a hangover from the days when this was trying to save space, and dawgs were only able to contain characters in the range 'a'..'z' 'A'..'Z' (squeezed into 6 bits). Now that we're '8-bit clean' we no longer need these. Later code in fact doesn't support the old style; but some procedures still subtract 'BASE_CHAR' from the character to normalize it. Since it is now 0 it is a no-op... */ #define MAX_CHARS 256 #define BASE_CHAR 0 /* Enough? */ #define MAX_WORD_LEN 256 #define MAX_LINE 256 typedef long NODE; typedef long INDEX; SHAR_EOF cat - << \SHAR_EOF > dictlib.h /* This header file does not describe any of the code in the accompanying files; it is a rough sketch of the NEXT iteration of the spelling/word utilities. Comments are invited. Missing functionality or bits that won't work should be pointed out please! Send mail to gtoal@ed.ac.uk (via nsfnet-relay.ac.uk from US sites) many thanks. Graham Toal 23/06/90 */ typedef long INDEX; typedef long NODE; #define ROOTNODE 1L typedef struct DICT { /* Although primitive procedures will be provided for handling the basic dawg array, by using this interface applications can remain independent of the implementation details of the dictionary. */ /* So far there are three forms of dict; 1: a simple dawg with edges as 4-byte integers; each node (a set of branches) is stored sequentially. 2: An *indexed* form of the above -- much faster to lookup in, but slower to walk through. Although indexing would normally be heavy on space, this uses Liang's packed overlapping tries, thus using almost exactly the same space as method 1. 3: A form of 1, but with all the stops pulled out to get as much bit-level compaction as possible; edges can be two bytes instead of 4. (by Keith Bostic) */ /* This struct contains both access procedures and information; most fields will be filled in by our code when the dictionary is loaded by load_dict(). The fields may be added to in later releases, but they will always be upwards compatible; none of this data is saved to disk in this format, so changing the struct layout will not cause problems as long as the names remain the same. */ /* Note that some procedures take a user-supplied 'apply' parameter; This 'apply' procedure has a 'context' parameter which is merely a handle to allow the user to pass data in to his code without having to use global variables. If the user's apply procedure returns anything other than 0, it will cause an early exit, with the users return code being returned from the calling function. */ /* It is intended that these are used in an object-oriented way; although we will supply an external dict_check(word, dict) procedure, for instance, all it will do is call dict->check(word, dict->dawg) */ NODE *dawg; /* Root of the dictionary tree */ char *dict_name, /* "smiths" - used in command lines etc. */ *dict_info, /* "Johns Smiths Rhyming Dictionary,\n (c) Smith & Jones, 1892\n Whitechapel, London.\n" */ *dict_fname, /* "/usr/dict/smiths" */ *lang_charset; /* "ISO 8859/1 Latin 1" */ /* These are filled in by us to make life easier for the applications programmer. We'll supply a default table for simple 7-bit ascii dawgs; but this mechanism allows us the option of including all the salient points about the character set used in a dictionary in the dictionary itself. Because they are char*'s, we'll probably end up sharing the same table between multiple dictionaries -- so don't write to them. (Although you may replace the _pointer_ with one to a table of your own. */ /* Later note: I've just found out about LOCALE.H etc. This stuff may therefore change... */ int *chartype; /* 1 whitespace */ /* 2 punctuation */ /* 4 blank */ /* 8 lower case letter */ /* 16 upper case letter */ /* 32 (decimal) digit */ /* 64 underscore */ /* 128 A-F and a-f */ /* 256 Ignore words starting with this char */ /* 512 Include in definition of a word -- mainly alpha but also hyphen and apostrophe */ char *lower, /* personal implementation of tolower() */ *unaccented, /* map accented chars to non-accenmted */ *lexorder; /* an arbitrary ordering for sorting: e-acute would come after e and before f */ int (* check) (NODE *base, INDEX state, struct DICT *d, char *word); /* Check a word - exact match; return true/false */ int (* fuzzy) (NODE *base, INDEX state, struct DICT *d, char *word, char *value); /* Check a word - fuzzy case, accents, hyphens etc; return true/false */ /* value is filled in with the dictionary's idea of what the word should look like. */ /* Fancier correction procedures may be synthesised out of user code and 'walk' */ int (* correct) ( char *word, NODE *base, INDEX state, /* Normally dict's root, but may be subtree */ struct DICT *d, /* lowercase/accent information from here */ int maxwords, int order, /* 0: by alpha 1: by highest score first 2: by lexical ordering (ignores case, accents, hyphens) */ int (* apply) ( char *word, /* Corrected word */ int score, /* Normalised to range 0..255 */ void *context), void *context); /* Offer spelling corrections for the given word. If maxwords > 0, return the best <maxwords> words in order of most-likely first; otherwise many words may be returned, in alphabetical order, coupled with a score: 255 means exact match; 0 means totally different */ /* returns 0 if user returned 0 for all applications */ int (* walk) ( NODE *base, INDEX state, int (* apply) (char *word, void *context), void *context); /* Apply the procedure given to every word in the dictionary tree or subtree, passing in the user-supplied context */ /* returns 0 if user returned 0 for all applications */ int (* lookup) ( NODE *base, INDEX state, char *key, /* Out */ INDEX values); /* Return the index of the subtree of the dictionary which has 'key' as a prefix; for instance, in a reverse telephone book, you might find entries: 0874=Anytown 0875=Cockenzie 0875=Port Seton 0875=Prestonpans 0876=Toytown In this case, if searching for "0875=", a subtree would be returned containing: Cockenzie Port Seton Prestonpans This subtree would then be walked using the 'walk' function and a user- written 'apply' procedure. */ /* returns 0 if successful */ /* The following two are performance hints ONLY. They should not affect the correct functioning of a program. */ int (* uncache) (struct DICT *d);/* The space a dictionary uses may be reclaimed by calling this. If the dictionary is subsequently accessed, there *may* be a performance penalty -- for instance the dictionary may be accessed off disk. However on a machine with sufficient memory, the system may choose to leave the data in ram until, say, malloc runs out of space. If there is no convenient mechanism for organising the demand-recacheing, this may be a no-op. */ /* returns 0 if successful */ int (* recache) (struct DICT *d);/* The data of the dictionary is reloaded into active memory where it will stay */ /* returns 0 if successful */ /* May be extended in the future */ } DICT; /* dict_load This is provided as a primitive so that dictionaries can be loaded in the most efficient way the machine supports. The space for the dict is supplied *by* this procedure - not *to* it. This allows a Mach (or Emas) implementation to mmap (or Connect) the file in memory - the connection address being chosen by the OS and outside our control. It also allows systems like RISC OS on the Acorn Archimedes to use an optimised whole- file load call to pull the file off disk into real ram in a single disk operation. Since the data parts of the type 1 & 2 dicts are just a set of 4-byte integers, it is easy to correct a faulty byte-sex by running down this array *once at start-up* to reverse the order. This is easy if the data is loaded into physical ram. If it is connected as a file by a VM system however, the VM system must support copy-on-write; if it only has read-only mapping there must be two files available -- one of each sex. A *possible* scheme on VM systems is to byte-sex-reverse a whole page the first time that page is touched. This may be a lot of unneccesary work - I'd recommend keeping a correct-sex file on-line. This code will create the DICT struct, and fill in the various fields, either from the dictionary file itself or from private knowledge if not available. */ int dict_load_file(char *fname, DICT **dict_struct); /* fname is the literal file name */ /* dict_struct is allocated by us, not by the user. */ /* returns 0 if successfully loaded */ int dict_load_dict(char *dname, DICT **dict_struct); /* dname is the dictionary name, e.g. "words", or "web2" etc. The corresponding file will be searched for via a system-dependent path, logical variable, or fixed place. */ /* dict_struct is allocated by us, not by the user. */ /* returns 0 if successfully loaded */ int dict_unload(DICT *d); /* Use when totally finished with data */ /* These are user-level veneers for the internal procedures used above. Use them in preference to the above. Use the above only if you are a speed-freak! (but disguise them in macros such as: #define dict_check(dawg, dict, word) dict->check(dict->dawg, dict, word) These macros will of course go 'Bang!' if dict ptr is unassigned or NULL. */ /* In the worst case, if a machine has a poor quality compiler which doesn't support procedure variables well, these routines could be the *acual* implementation rather than just a pointer to it. */ int check (NODE *base, INDEX state, DICT *d, char *word); int fuzzy (NODE *base, INDEX state, DICT *d, char *word, char *value); int correct (char *word, NODE *base, INDEX state, /* Normally dictionary root, but may be subtree */ DICT *d, /* lowercase/accent information from here */ int maxwords, int order, /* 0: by alpha 1: by highest score first 2: by lexical ordering */ int apply(char *word, /* Corrected word */ int score, /* Normalised to range 0..255 */ void *context), void *context); int walk (NODE *base, INDEX state, /* Yoyo fanatics will be pleased to learn that they can 'walk the dawg'... :-) */ int apply(char *word, void *context), void *context); int lookup (NODE *base, INDEX state, char *key, /* Out */ INDEX values); /* Now here come the fast internal procedures which the above eventually call */ int dawg_check(NODE *base, INDEX state, char *word); /* Standard dawg */ int pack_check(NODE *base, INDEX state, char *word); /* Overlapped indexed dawg */ int bsd_check(NODE *base, INDEX state, char *word); /* Space-optimised dawg (called bsd because Keith Bostic is working on this format for the next bsd4.4 spelling checker) */ /* Must make the internl procs agree with these... save the macros for later... #define dict_check(word,dict) \ (dict != NULL ? (dict->check(dict->dawg, ROOTNODE, word)) : -1) */ int dawg_walk(NODE *base, INDEX state, int apply(char *word, void *context), void *context); int pack_walk(NODE *base, INDEX state, int apply(char *word, void *context), void *context); int bsd_walk(NODE *base, INDEX state, int apply(char *word, void *context), void *context); /* #define dict_walk(dict, apply, context) \ (dict != NULL ? (dict->walk(dict->dawg, ROOTNODE, apply, context)) : -1) */ int dawg_lookup(NODE *base, INDEX state, char *key, /* Out */ INDEX values); int pack_lookup(NODE *base, INDEX state, char *key, /* Out */ INDEX values); int bsd_lookup(NODE *base, INDEX state, char *key, /* Out */ INDEX values); /* etc... */ /* Get list of possible next characters from this point in the tree, put it into user-supplied array branch[256]. Filled with NULL if no branch, or an INDEX if it points to more of the tree. Words which may terminate on a particular letter have terminate[let] != 0 The user-supplied terminate array is deliberately a long array to allow for expansion; it may be used one day to return arbitrary codes such as 'Noun' etc... */ void dawg_branch(NODE *base, INDEX state, NODE *branch, long *terminate); void pack_branch(NODE *base, INDEX state, NODE *branch, long *terminate); void bsd_branch(NODE *base, INDEX state, NODE *branch, long *terminate); /* No internal proc yet... add it in the morning. */ /* In fact there will be even simpler procedures which can be used by people who prefer to include the whole source of this package rather than just the library interface parts. In applications where utmost speed is a neccessity (eg Scrabble(tm)) you could use these, or write your own. But most of us can use this clean interface with unnoticable overhead. */ SHAR_EOF cat - << \SHAR_EOF > grope.h #ifndef grope_h #define grope_h 1 #ifdef TESTING_DEFS /*###################################################################*/ /* * This is an exercise to assist in writing portable programs. * * To use as a header file, eg "grope.h", just leave this file * as it is. To use it to define new entries, rename it as * "grope.c" and compile it as "cc -DTESTING_DEFS grope". * * To customise this test program for your system, I have set it up * so that all features are enabled. If you find that one feature * causes a compile-time error, test a suitable set of '#ifdef's for * your machine and #undef the values below which are not appropriate. * * Please do your best to see that your tests are unique, and that * you do not undo any tests already implemented. * * One last request; PLEASE PLEASE PLEASE send me any updates you * may make. If you cannot get through to me on email, send me a * paper copy (or even better, a floppy copy :-)) to Grahan Toal, * c/o 6 Pypers wynd, Prestonpans, East Lothian, Scotland EH32 9AH. * * Graham Toal <gtoal@uk.ac.ed> * [PS: I can read DOS and RISC OS floppies only] */ #define STDLIBS 1 #define PROTOTYPES 1 #define KNOWS_VOID 1 #define KNOWS_LINE 1 #define KNOWS_REGISTER 1 #endif /* TESTING_DEFS */ /*###################################################################*/ #define SYS_ANY 1 #define COMPILER_ANY 1 /* Don't yet know what machine it is. Add a test once identified. */ /*--------------------------*/ #ifdef MSDOS #define SYS_MSDOS 1 #endif /*--------------------------*/ #ifdef __STDC__ #define STDLIBS 1 #define PROTOTYPES 1 #define KNOWS_VOID 1 /* If the compiler defines STDC it should have these. We can undef them later if the compiler was telling lies! */ #endif /*--------------------------*/ #ifdef MPU8086 #define SYS_MSDOS 1 /* Aztec does not #define MSDOS. We assume it is Aztec on MSDOS from the MPU8086. */ #ifdef __STDC__ #define COMPILER_AZTEC 1 /* Aztec is known to also define __STDC__ (illegally). Therefore if __STDC__ is not defined, it is probably not Aztec */ #endif #endif #ifdef SYS_MSDOS /*--------------------------*/ #ifdef __STDC__ /*----------------------*/ #define COMPILER_MICROSOFT 1 /* I assume that the combination of #define MSDOS and #define __STDC__ without any other collaboration means MICROSOFT. (Who, incidentally, are being naughty by declaring __STDC__) */ #define KNOWS_LINE 1 #else /*----------------------*/ #ifdef __ZTC__ /*------------------*/ #define COMPILER_ZORTECH 1 #undef STDLIBS /* Another system without locale.h */ #define PROTOTYPES 1 #else /*------------------*/ /* A non-std msdos compiler */ #undef STDLIBS #undef PROTOTYPES /*------------------*/ #endif /*----------------------*/ #endif /*--------------------------*/ #endif #ifdef TURBOC /* Although Turbo C correctly does not define __STDC__, it has SOME standard libraries (but not all - locale.h?) and accepts prototypes. */ #undef STDLIBS #define PROTOTYPES 1 #define SYS_MSDOS 1 #define COMPILER_TURBOC 1 /* TURBOC is defined, but has no value. This allows it to be tested with #if as well as #ifdef. */ #endif /*--------------------------*/ #ifdef COMPILER_MICROSOFT #undef STDLIBS /* Again, like Turbo C, does not know locale.h */ #define PROTOTYPES 1 #endif /*--------------------------*/ #ifdef SYS_MSDOS #define MEMMODELS 1 /* We assume ALL MSDOS machines use memory models */ #endif /*--------------------------*/ #ifdef UX63 #undef STDLIBS #undef PROTOTYPES #define SYS_UX63 1 #define COMPILER_PCC 1 /*#define LIB_STRINGS 1 - apparently not... */ #endif /*--------------------------*/ #ifdef sun #undef STDLIBS #undef PROTOTYPES #define SYS_SUN 1 #define COMPILER_PCC 1 #endif /*--------------------------*/ #ifdef THINK_C #define SYS_MAC 1 #define COMPILER_THINKC 1 #define KNOWS_VOID 1 #endif /*--------------------------*/ #ifdef sparc #undef STDLIBS #undef PROTOTYPES #define SYS_SPARC 1 #define COMPILER_PCC 1 #endif /*--------------------------*/ #ifdef ARM #define SYS_RISCOS 1 /* I fear Acorn may define 'ARM' on both unix and risc os versions. Should find out whether they define others as well, to differentiate. */ #endif #ifdef __ARM__ #define SYS_RISCOS 1 /* I fear Acorn may define 'ARM' on both unix and risc os versions. Should find out whether they define others as well, to differentiate. */ #endif /*--------------------------*/ #ifdef SYS_RISCOS #define COMPILER_NORCROFT 1 #define KNOWS_REGISTER 1 #define KNOWS_LINE 1 #endif /*--------------------------*/ #ifdef vms #define SYS_VMS 1 #endif /*--------------------------*/ /*--------------------------*/ #ifdef SYS_UX63 #undef SYS_ANY #endif #ifdef SYS_ARM #undef SYS_ANY #endif #ifdef SYS_MSDOS #undef SYS_ANY #endif #ifdef SYS_SUN #undef SYS_ANY #endif #ifdef SYS_SPARC #undef SYS_ANY #endif #ifdef SYS_RISCOS #undef SYS_ANY #endif #ifdef SYS_MAC #undef SYS_ANY #endif #ifdef SYS_VMS #undef SYS_ANY #endif /*--------------------------*/ #ifdef COMPILER_PCC #undef COMPILER_ANY #endif #ifdef COMPILER_MICROSOFT #undef COMPILER_ANY #endif #ifdef COMPILER_TURBOC #undef COMPILER_ANY #endif #ifdef COMPILER_ZORTECH #undef COMPILER_ANY #endif #ifdef COMPILER_AZTEC #undef COMPILER_ANY #endif #ifdef COMPILER_NORCROFT #undef COMPILER_ANY #endif #ifdef COMPILER_THINKC #undef COMPILER_ANY #endif /*--------------------------*/ /* ##################################################################### */ #ifdef TESTING_DEFS #include <stdio.h> /* ======================================================================= */ #ifdef STDLIBS /* If any of these fail, make sure STDLIBS is not defined for your machine. */ #include <stdlib.h> /* STDLIBS should not be defined. */ #include <time.h> /* STDLIBS should not be defined. */ #include <locale.h> /* STDLIBS should not be defined. */ #endif /* ======================================================================= */ #ifdef KNOWS_VOID void test() { /* KNOWS_VOID should not be defined */ /* Make sure your ifdef's don't #define KNOWS_VOID if this fails */ } #endif /* ======================================================================= */ #ifdef KNOWS_REGISTER int regtest() { register int i = 0; /* KNOWS_REGISTER should not be defined */ /* Make sure your ifdef's don't #define KNOWS_REGISTER if this fails */ return(i); } #endif /* ======================================================================= */ #ifdef PROTOTYPES int main(int argc, char **argv) /* PROTOTYPES should not be defined */ /* If this fails, make sure PROTOTYPES is not defined for your machine. */ #else int main(argc,argv) int argc; char **argv; #endif { /*-------------------------------------------------------------------------*/ printf("We should know just what the machine is, or 'SYS_ANY':\n"); #ifdef SYS_UX63 printf("#define SYS_UX63 %d\n", SYS_UX63); #endif #ifdef SYS_ARM printf("#define SYS_ARM %d\n", SYS_ARM); #endif #ifdef SYS_MSDOS printf("#define SYS_MSDOS %d\n", SYS_MSDOS); #endif #ifdef SYS_SUN printf("#define SYS_SUN %d\n", SYS_SUN); #endif #ifdef SYS_SPARC printf("#define SYS_SPARC %d\n", SYS_SPARC); #endif #ifdef SYS_RISCOS printf("#define SYS_RISCOS %d\n", SYS_RISCOS); #endif #ifdef SYS_MAC printf("#define SYS_MAC %d\n", SYS_MAC); #endif #ifdef SYS_VMS printf("#define SYS_VMS %d\n", SYS_VMS); #endif #ifdef SYS_ANY printf("#define SYS_ANY %d\n", SYS_ANY); #endif /*-------------------------------------------------------------------------*/ printf("Either the machine has different memory models or not:\n"); #ifdef MEMMODELS printf("#define MEMMODELS %d\n", MEMMODELS); #else printf("#undef MEMMODELS\n"); #endif /*-------------------------------------------------------------------------*/ printf("One compiler name should be given, or 'COMPILER_ANY':\n"); #ifdef COMPILER_PCC printf("#define COMPILER_PCC %d\n", COMPILER_PCC); #endif #ifdef COMPILER_MICROSOFT printf("#define COMPILER_MICROSOFT %d\n", COMPILER_MICROSOFT); #endif #ifdef COMPILER_TURBOC printf("#define COMPILER_TURBOC %d\n", COMPILER_TURBOC); #endif #ifdef COMPILER_ZORTECH printf("#define COMPILER_ZORTECH %d\n", COMPILER_ZORTECH); #endif #ifdef COMPILER_AZTEC printf("#define COMPILER_AZTEC %d\n", COMPILER_AZTEC); /* Can exist on other machines as well as MSDOS */ #endif #ifdef COMPILER_LATTICE /* Currently coming through as 'COMPILER_ANY' - haven't found one to test */ /* Can exist on other machines as well as MSDOS. Meanwhile pass it in */ /* on the command_line? */ printf("#define COMPILER_LATTICE %d\n", COMPILER_LATTICE); #endif #ifdef COMPILER_GCC /* Ditto. Test in appropriate places for each machine. */ printf("#define COMPILER_GCC %d\n", COMPILER_GCC); #endif #ifdef COMPILER_NORCROFT printf("#define COMPILER_NORCROFT %d\n", COMPILER_NORCROFT); #endif #ifdef COMPILER_THINKC printf("#define COMPILER_THINKC %d\n", COMPILER_THINKC); #endif #ifdef COMPILER_ANY printf("#define COMPILER_ANY %d\n", COMPILER_ANY); #endif /*-------------------------------------------------------------------------*/ printf("Either the compiler accepts ANSI-like libraries or not:\n"); #ifdef STDLIBS printf("#define STDLIBS %d\n",STDLIBS); #else printf("#undef STDLIBS\n"); #endif /*-------------------------------------------------------------------------*/ printf("Either the machine accepts ANSI prototypes or not:\n"); #ifdef PROTOTYPES printf("#define PROTOTYPES %d\n", PROTOTYPES); #else printf("#undef PROTOTYPES\n"); #endif /*-------------------------------------------------------------------------*/ printf("Either the machine needs nonstandard #include <strings.h> or not:\n"); #ifdef LIB_STRINGS printf("#define LIB_STRINGS %d\n", LIB_STRINGS); #else printf("#undef LIB_STRINGS\n"); #endif /*-------------------------------------------------------------------------*/ printf("Either the machine accepts the 'void' keyword or not:\n"); #ifdef KNOWS_VOID printf("#define KNOWS_VOID %d\n", KNOWS_VOID); #else printf("#undef KNOWS_VOID\n"); #endif printf("Either the machine accepts the 'register' keyword or not:\n"); /*-------------------------------------------------------------------------*/ printf("Either the compiler accepts the '__LINE__' directive or not:\n"); #ifdef KNOWS_LINE printf("#define KNOWS_LINE %d\n", KNOWS_LINE); #else printf("#undef KNOWS_LINE\n"); #endif /*-------------------------------------------------------------------------*/ printf("Either the machine accepts the 'register' keyword or not:\n"); #ifdef KNOWS_REGISTER printf("#define KNOWS_REGISTER %d\n", KNOWS_REGISTER); #else printf("#undef KNOWS_REGISTER\n"); #endif /* Note - this is intended to be used only if your compiler accepts 'register' *AND* compiles code with it correctly! Some compilers show up bugs when register optimisations are attempted! */ /*-------------------------------------------------------------------------*/ if (argv[argc] != 0) printf("Warning! argv[argc] != NULL !!! (Non ansii)\n"); } #endif /* TESTING_DEFS */ #endif /* Already seen? */ SHAR_EOF