home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!munnari.oz.au!bunyip.cc.uq.oz.au!uqcspe!cs.uq.oz.au!warwick
- From: warwick@cs.uq.oz.au (Warwick Allison)
- Newsgroups: rec.puzzles
- Subject: Re: Computer Dictionary Searches
- Message-ID: <11811@uqcspe.cs.uq.oz.au>
- Date: 22 Jan 93 01:12:09 GMT
- References: <1993Jan20.185756.16017@njitgw.njit.edu>
- Sender: news@cs.uq.oz.au
- Reply-To: warwick@cs.uq.oz.au
- Lines: 88
-
- jpk2581@hertz.njit.edu (joseph p kisenwether apmt stnt) writes:
-
- >I am trying to create some original word puzzles and need to find
- >words with certain characteristsics:
- >For example
- > two words of more than ten letters which differ by only one
- >stictly internal letter such that there exist two other letter
- >(one to the left and one to the right) such that in word one
- >there exists no sub-word containing both the changable letter
- >and the one to the left while in word two there exists no sub-word
- >conatining both the changable letter and the one to the right.
-
- >Needless to say, this could most easily be done by computer search,
- >and as I have seen people here refer to them before, I'd like to know;
-
- >a) what dictionary I should use (is there a standard)
-
- Standard list of English words? Blasphemy!
-
- >b) where to get a copy of it
-
- arthur.cs.purdue.edu in /pub/pcert/dict - ALL LANGUAGES!
-
- >c) what the format is so that I can right the appropriate search
- > program
-
- A Trie. You could use a DAWG, but we (Paul Dale & I) have found that
- a Trie is better, because you can store suplementary data for arcs
- in the tree.
-
- A Trie is a word-tree where each arc represents the addition of a letter
- to the ancestral path of letters. The arc also stores a bit that defines
- whether the addition of that arc makes a valid word. eg, below is a simple
- Trie that stores the words "AT", "AXE", "BE", and "BI". The upper-case
- letters on the arcs are those with the "is a word" bit set.
-
- (root node)
- |
- +-a-> (node)
- | |
- | +-T-> (node)
- | |
- | +-x-> (node)
- | |
- | +-E-> (node)
- |
- +-b-> (node)
- |
- +-E-> (node)
- |
- +-I-> (node)
-
- >d) anything else that the more experienced think is relevant
-
- Storing the Trie efficiently is very important, since there are a LOT
- of words. By packing the information into a single 32-bit value for
- each arc, you can get great efficiency. For more restricted word-
- processing algorithms (in the true sense of word-processing), a DAWG
- acan be used, which optimizes the tree into a graph (so, for example,
- many of the "i -> n -> g" arc sequences in the Trie appear as a single
- "i -> n -> g" arc sequence in a DAWG.
-
-
- Reference:
-
- -----------------------------------------------------------------------------
- Journal: Communications of the ACM May 1988 v31 n5 p572(8)
- * Full Text COPYRIGHT Assn. for Computing Machinery, Inc. 1988.
- -----------------------------------------------------------------------------
- Title: The world's fastest Scrabble program. (technical)
- Author: Appel, Andrew W.; Jacobson, Guy J.
-
- Summary: An efficient backtracking algorithm is used to make a very fast
- program that plays the Scrabble crossword game. Creating data
- structures before the backtracking search begins makes the
- algorithm efficient by focusing the search and making each step of
- the search fast.
- -----------------------------------------------------------------------------
-
- (This can be found using WAIS, in the cacm.src, using keyword "scrabble")
-
-
- --
- Warwick
- _-_|\ warwick@cs.uq.oz.au /____ Man's agar ____ /
- / * <-- Computer Science Department, / alerting altering /
- \_.-._/ University of Queensland, / integral relating /
- v Brisbane, Australia. / tanglier triangle /
-