home *** CD-ROM | disk | FTP | other *** search
- W. Vann Hall
- Pala Designs
- P.O. Box 10804
- Arlington, VA 22210
- CIS 70346,1144
- MCI WVHALL
-
- ABOUT Boyer-MOORE:
-
- Boyer-Moore is an attempt to speed up the usually serial text search.
- Traditionally, text searches proceed from the first character onward.
- (In these examples, "string" refers to string we're trying to locate,
- "text" the array of characters we're trying to find the string in.
- These all ignore case sensitivity, matching around line boundaries,
- punctuation, etc.)
-
- String to find: STRING
- Text to find it in: HOW YOU SEARCH FOR A STRING
-
- Our StringPointer and TextPointer both start at 1. We compare
- String[1] and Text[1]. Since "S" and "H" don't match, we increment the
- text pointer and compare again. At TextPointer = 9, the "S" of
- "STRING" and the "S" of "SEARCH" match, so we increment TextPointer
- *and* StringPointer and compare String[2] and Text[10]. "T" and "E"
- don't match, so we reset StringPointer and increment TextPointer again.
- And so on. You can see that it takes 22 comparisons before we locate
- the correct "S" and a full 27 before we know that we have located
- "STRING."
-
- Boyer-Moore, on the other hand, works from the end of the string (but
- still the beginning of the text). First, it creates a look-up table of
- values corresponding to the distance of the first occurence of each
- character from the end of the string. For instance, the table for
- "STRING" would read:
-
- A=6 B=6 C=6 D=6 E=6 F=6 G=6 H=6 I=2 J=6 K=6 L=6 M=6
- N=1 O=6 P=6 Q=6 R=3 S=5 T=4 U=6 V=6 W=6 X=6 Y=6 Z=6
-
- Note that characters not located in the string are set to
- Length(String), as it the final character.
-
- Since a 6-character string can't be located entirely within the first 5
- characters, we start by comparing the String[6] -- "G" -- with Text[6]:
- "O". They don't match, so we increment TextPointer by the value
- associated with "O" in the table; that is, by 6. Our next compare is
- "G" with the "R" in "SEARCH". We now increment TextPointer by "R"'s
- value, 3, and compare "S" to " ". And so on. Here's a tracing of the
- progression; the letter to be compared is highlighted by lower casing:
-
- STRINg
- HOW YoU SEARCH FOR A STRING
-
- STRINg
- HOW YOU SEArCH FOR A STRING
-
- STRINg
- HOW YOU SEARCH^FOR A STRING
-
- STRINg
- HOW YOU SEARCH FOR A STRING
-
- STRINg
- HOW YOU SEARCH FOR A STRINg
-
- STRIng
- HOW YOU SEARCH FOR A STRIng
-
- STRing
- HOW YOU SEARCH FOR A STRing
-
- STring
- HOW YOU SEARCH FOR A STring
-
- String
- HOW YOU SEARCH FOR A String
-
- string
- HOW YOU SEARCH FOR A string
-
- There; 10 comparisons total. You can see how this would speed
- searching a long text -- enough to make up for the additional overhead
- of compiling the look-up table.