- 3+ies -> y
- 3+ing ->
- 4+ness ->
- ss -> ss
- 3+s ->
- 4+ion ->
- 4+ism ->
- 4+ly ->
- 3+eed -> ee
- 4+ied -> y
- 4+ed ->
- 4+er ->
- 4+ful ->
- 4+able ->
- 4+ible ->
- 3+v -> f
- 4+e ->
- 3+dd -> d
- 3+gg -> g
- 3+ll -> l
- 3+mm -> m
- 3+nn -> n
- 3+pp -> p
- 3+rr -> r
- 3+ss -> s
- 3+tt -> t
- ------------------------------------------------------------------
- Customized Stemming
- ===================
-
- Stemming rules vary from one language to another. dtSearch
- includes a set of stemming rules designed to work with English.
- These rules are in the file STEMMING.DAT. If you need to
- implement stemming for a different language, or you want to
- modify the English stemming rules, you can create a new set of
- stemming rules to be used in place of STEMMING.DAT.
-
- Stemming rules consist of a series of lines like this:
-
- 3+ies -> y
- 4+ing ->
-
- The first rule would convert any word with three or more letters
- followed by "ies" to the same initial letters followed by "y".
- "Applies" would turn into "apply".
-
- The second rule would remove the "ing" from any word with four or
- more letters followed by "ing". "Fishing" would turn into "fish", but
- "sing" would not change.
-
- In general, a rule consists of: a minimum number of letters (not
- including the suffix), a + sign, a suffix to be removed, an arrow
- (->) and the replacement for the suffix, if any. Stemming rules
- must use lower-case letters only. Up to 100 stemming rules can be
- included in a STEMMING.DAT file.
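
The rule format described above can be illustrated with a small parser. This is only a sketch of one way to read such lines, not dtSearch's own code: the names Rule, parse_rule and parse_rules are hypothetical, and the regular expression assumes every rule has the shape "<minimum>+<suffix> -> <replacement>" with lower-case letters only.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    min_letters: int   # letters required before the suffix
    suffix: str        # suffix to be removed
    replacement: str   # replacement for the suffix (may be empty)


# Assumed line shape: "<minimum>+<suffix> -> <replacement>", lower case only.
RULE_RE = re.compile(r"^(\d+)\+([a-z]+)\s*->\s*([a-z]*)$")


def parse_rule(line: str) -> Rule:
    """Parse one rule line such as '3+ies -> y' or '4+ing ->'."""
    match = RULE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid stemming rule: {line!r}")
    return Rule(int(match.group(1)), match.group(2), match.group(3))


def parse_rules(text: str, limit: int = 100) -> List[Rule]:
    """Parse a block of rule lines, enforcing the documented 100-rule limit."""
    rules = [parse_rule(line) for line in text.splitlines() if line.strip()]
    if len(rules) > limit:
        raise ValueError(f"too many stemming rules: {len(rules)} > {limit}")
    return rules


if __name__ == "__main__":
    print(parse_rule("3+ies -> y"))
    print(parse_rule("4+ing ->"))
```

Because the pattern accepts only lower-case letters, a line such as "3+ies -> Y" is rejected by this sketch rather than silently accepted.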
-
- When stemming a word, dtSearch checks each rule in order until it
- finds one that applies. It then applies that rule and starts over
- from the first rule, repeating the process until the word no
- longer changes or no rule applies. The result is the "stem" of
- the original word.
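
The matching process described above can be sketched as a short loop. This is an illustration under stated assumptions, not dtSearch's implementation: rules are represented as hypothetical (minimum letters, suffix, replacement) tuples, a rule is taken to "apply" when the word ends with the suffix and has at least the minimum number of letters before it, and the max_passes guard is an addition of this sketch to stop badly written rule sets from looping forever.

```python
from typing import List, Tuple

# (minimum letters before the suffix, suffix, replacement)
RuleTuple = Tuple[int, str, str]

# The two example rules from this section.
RULES: List[RuleTuple] = [
    (3, "ies", "y"),
    (4, "ing", ""),
]


def stem(word: str, rules: List[RuleTuple], max_passes: int = 50) -> str:
    """Apply the first matching rule, then start over, until nothing changes."""
    for _ in range(max_passes):  # guard is an assumption of this sketch
        for min_letters, suffix, replacement in rules:
            stem_len = len(word) - len(suffix)
            if word.endswith(suffix) and stem_len >= min_letters:
                new_word = word[:stem_len] + replacement
                break
        else:
            return word          # no rule applies
        if new_word == word:
            return word          # the rule applied but changed nothing
        word = new_word
    return word


if __name__ == "__main__":
    for example in ("applies", "fishing", "sing"):
        print(example, "->", stem(example, RULES))
```

With these two rules the sketch turns "applies" into "apply" and "fishing" into "fish", and leaves "sing" unchanged because only one letter precedes "ing".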
-
- Sometimes you may want to create a rule with an exception. For
- example, suppose you want to remove a trailing "s" in a word,
- unless the word ends in "ss". To do this, you would use these two
- rules:
-
- 3+ss -> ss
- 3+s ->
-
- If a word ends in "ss", dtSearch will never get past the first rule
- and will give up stemming the word because the rule "3+ss -> ss"
- does not change the word. Only words not ending in "ss" will get
- to the next rule, which removes the trailing "s".
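
The effect of rule order in this exception pattern can be seen by checking which rule is the first to match a word. The sketch below assumes the same reading of a rule as elsewhere in this section (suffix match plus a minimum number of preceding letters); first_match is a hypothetical helper, not part of dtSearch.

```python
from typing import List, Optional, Tuple

RuleTuple = Tuple[int, str, str]  # (minimum letters, suffix, replacement)

# The exception rule comes first, the general rule second.
RULES: List[RuleTuple] = [
    (3, "ss", "ss"),
    (3, "s", ""),
]


def first_match(word: str, rules: List[RuleTuple]) -> Tuple[Optional[RuleTuple], str]:
    """Return the first rule that matches and the word it produces."""
    for rule in rules:
        min_letters, suffix, replacement = rule
        stem_len = len(word) - len(suffix)
        if word.endswith(suffix) and stem_len >= min_letters:
            return rule, word[:stem_len] + replacement
    return None, word


if __name__ == "__main__":
    for example in ("glass", "books", "gas"):
        print(example, first_match(example, RULES))
```

Here "glass" stops at the first rule and is left unchanged, while "books" reaches the second rule and becomes "book". Under this sketch's reading of the minimum-letter count, a short word such as "boss" has only two letters before "ss" and so would fall through to the second rule; whether dtSearch behaves that way is not stated here, and STEMTEST is the way to check.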
-
- Setting up stemming rules can be somewhat tricky. To help,
- dtSearch includes the STEMTEST utility. STEMTEST lets you try
- out your stemming rules by entering words and seeing the
- resulting stems.
-
-