home *** CD-ROM | disk | FTP | other *** search
- {$G512,P512,D-} { Compiler directives to allow I/O redirection }
- program SrchMem;
- {
-
- Compare variations of 2 string search algorithms, the brute force and
- Boyer-Moore methods.
-
-
-
- Implemented by Paul Wolberg. Copyright (C) 1986.
-
- References:
-
- "A Fast String Searching Algorithm" by Boyer and Moore,
- Communications of the ACM, October 1977, volume 20, number 10,
- pages 762-772.
-
- "Algorithms" by Robert Sedgewick, published by Addison-Wesley,
- 1984, pages 241-252.
-
- "Fast Pattern Matching In Strings" by Knuth, Morris and Pratt.
- SIAM Journal of Computing, Vol 6, No. 2, June 1977
-
- "A Correct Preprocessing Algorithm for Boyer-Moore String Searching"
- by Wojciech Rytter
- SIAM Journal of Computing, Vol 9, No. 3, August 1980
-
-
-
- The following notation is used:
-
- Target is the character string which is to be searched for the pattern
-
- Pattern is the pattern being searched for in Target
-
-
-
- The algorithms are:
-
- 0 Brute Force
- 1 Boyer-Moore using just the delta1 table
- 2 Boyer-Moore using the fast/slow loop form of delta1
- 3 Boyer-Moore using both the delta1 and delta2 tables
- 4 B-M with delta1, counting total skips
- 5 B-M with delta1 and delta2, counting total skips
- 6 Create delta2 table using brute force method.
- 7 Create delta2 table using Rytter's correction to Knuth.
-
-
-
- Program structure:
-
- The GetProb routine gets the algorithm to use, the file to search and
- the pattern to search for from the command line.
-
- The Setup routine opens the file to be searched, reads it into an
- array of strings in memory
- and initializes the elapsed timer to the current time.
-
- The selected algorithm is used to search the text for "TotalTimes" times,
- in order to have sufficiently long bench marks for comparisons.
-
- Each algorithm has 2 routines, a DO routine and a search routine. The
- Do routine calls the search routine once for each line in the file.
- It increments a counter if the search routine found the pattern in the line.
- The Boyer-Moore algorithms also call one or more routines to set up tables
- used by that set of algorithms.
-
- After the text has been searched the Cleanup routine computes and displays
- the elapsed time. Turbo Pascal compiler directives are used that allow
- redirection of the results to a disk file.
-
- Algorithms 0 to 3 are used to compare the brute force method and variations
- of the Boyer-Moore method by comparing timings for the methods.
-
- Algorithms 4 and 5 are used to compare Boyer-Moore with and without the
- delta 2 table. These comparisons are based on counts of the number of
- times the different tables are used and the aggregate pattern shifts
- for the 2 methods. The timings are not relevant for these 2 comparisons,
- since they include time used to increment the counters.
-
- Algorithms 6 and 7 are used to compare 2 methods of constructing the delta 2
- table. Constructing the delta 2 table involves searching the pattern for
- substrings of itself, hence it is a string search problem as well. Algorithm
- 6 uses a brute force method, whereas algorithm 7 uses a method similar to
- the delta 2 part of Boyer-Moore. See Knuth and Rytter for details on this.
- There are no DO or search routines for algorithms 6 and 7, the delta 2
- construction routines are called directly. The comparisons are based
- on timings.
-
- There is also a routine to display the pattern and its delta 2 table.
- It is used for debugging only. Calls to it are deleted for comparison
- runs.
-
- Usage: SRCHMEMP algorithm file pattern
- algorithm = 0, brute force
-
- algorithm = 1, Boyer-Moore with just delta1
- algorithm = 2, Boyer-Moore with fast/slow loop form of delta1
- algorithm = 3, Boyer-Moore with both delta1 and delta2 tables
- algorithm = 4, Boyer-Moore with delta1, counting total skips
- algorithm = 5, Boyer-Moore with delta1 and delta2, counting total skips
- algorithm = 6, Create delta2 table using brute force method
- algorithm = 7, Create delta2 table using Rytter's correction to Knuth
-
- file is the name of the file to be searched
-
- pattern is the pattern to search for
- spaces can be embedded in pattern by using @n, where n
- is a single digit that is the number of spaces to insert.
- Ex: "A@3B" on the command line becomes "A B" in the program.
-
-
- }
-
- Const
- MAXALG = 7; { Maximum algorithm code. Number of algorithms
- implemented = MAXALG + 1, since they are numbered
- from zero. }
- MAXSTR = 90; { Maximum number of characters in various strings }
- ASCIIlen = 255; { Number of characters in the extended ASCII set }
- MAXLINE = 650; { Maximum number of lines to read from input file }
- TOTTIMES = 10; { Total number of times to search the text }
-
-
- Type
- Line = string[MAXSTR];
- Del1Array = Array[0..ASCIIlen] of Integer; { Array type for Boyer-Moore
- delta1 table }
- Del2Array = Array[0..MAXSTR] of Integer; { Array type for Boyer-Moore
- delta2 table }
-
-
-
- Var
- LARGE : Integer; { Arbitrarily large value for FAST/SLOW versions }
- TargetName : Line; { Name of file containing text to search }
- TargetVar : Text; { File variable for the file to search }
- NumLines : Integer; { Number of lines in TargetName }
- TargetList : array[1..MAXLINE] of Line; { Array of strings to search }
- Pattern : Line; { The pattern to search for }
- Algorithm : Integer; { The algorithm to use }
- TimesCount : Integer; { Number of times text has been searched thus far }
- BegTime : Real; { Starting time for the search, in seconds }
- NumFound : Integer; { Number of lines in TargetFile containing Pattern }
- D1Count : Real; { Number of times delta1 table used }
- D2Count : Real; { Number of times delta2 table used }
- Skip1Count : Real; { Aggregate skip due to delta1 }
- Skip2Count : Real; { Aggregate skip due to delta2 }
-
-
-
- {------------------------------------------------------------------------------
- Procedure expand
- expand escape sequences in pattern to blanks
- -------------------------------------------------------------------------------
-
- ENTRY:
- CmdPat is the pattern from the command line. It may have
- escape sequences to be expanded to blanks.
-
-
- EXIT:
- Pattern is the CmdPat with escape sequences replaced with blanks.
-
- NOTE:
- An escape sequence to be replaced with blanks is an "@" followed
- by a single digit. Two "@"'s are reduced to single "@". An "@"
- followed by any other character is left alone.
- Examples:
-
- CmdPat: ABC@3XYZ CmdPat: ABC@@3XYZ CmdPat: ABC@XYZ
- Pattern: ABC XYZ Pattern: ABC@3XYZ Pattern: ABC@XYZ
-
- ------------------------------------------------------------------------------}
-
- procedure expand(CmdPat: Line; var Pattern: Line);
-
- var
- state : Integer; { The state of the pattern blank expander. }
- numblank : Integer; { The number of blanks the escape sequence expands to. }
- Code : Integer; { Return code from the Val function. }
- pj : Char; { The current character of pattern from command line. }
- i,j : Integer; { Loop counters. }
-
-
- begin { procedure expand }
-
- Pattern := '';
-
- state := 0;
- for j := 1 to Length(CmdPat) do
- begin
- pj := CmdPat[j];
-
- {
- STATE 0
- In an escape sequence. If the character is not an "@"
- then copy it to Pattern. If it is an "@", then change to STATE 1,
- in an escape sequence.
- }
- if (state = 0) then
- begin
- if (pj <> '@') then
- begin
- Pattern := Pattern + pj;
- end
- else
- begin
- state := 1;
- end;
- end { state 0 }
-
- {
- STATE 1
- In an escape sequence. If the character is a digit, append that
- number of blanks to the end of Pattern. If it is not a digit and
- not a blank, append an "@" (the previous character from CmdPat) and
- then the character. If the character is an "@" then there have been
- two "@"'s in a row. Append a single "@" to Pattern, since two
- consecutive "@"'s reduce to a single "@".
- }
- else
- begin
- if (pj <> '@') then
- begin
- val(pj, numblank, code);
- if (code = 0) then
- begin
- { A digit. Append that many blanks to end of Pattern }
- for i := 1 to numblank do Pattern := Pattern + ' ';
- end
- else
- begin
- { Not a digit or "@", append "@" (previous char) and current char }
- Pattern := Pattern + '@' + pj;
- end;
- state := 0;
- end
- else
- begin
- { An "@". Two "@"'s reduce to a single "@". }
- Pattern := Pattern + pj;
- state := 0;
- end;
-
- end; { state 1 }
-
- end; { for i }
-
-
-
- end; { procedure expand }
-
-
-
- {------------------------------------------------------------------------------
- Get the problem parameters from the command line.
- -------------------------------------------------------------------------------
-
-
- ENTRY:
- The command should contain 3 parameters:
- An algorithm code, the name of a file to search, a pattern to
- search for.
-
- EXIT:
- Algorithm = 0 for Brute force method
- 1 for Boyer-Moore with just delta1 table
- 2 for Boyer-Moore with fast/slow loop form of delta1
- 3 for Boyer-Moore with both delta1 and delta2 tables
- 4 for B-M with delta1, counting total skips
- 5 for B-M with delta1 and delta2, counting total skips
- 6 Create delta2 table using brute force method.
- 7 Create delta2 table using Rytter's correction to Knuth.
-
-
- TargetName is the name of the file to search.
-
- Pattern is the pattern to search for.
-
- ------------------------------------------------------------------------------}
-
-
- procedure GetProb (var Algorithm: Integer; var TargetName, Pattern: Line);
-
- var
- Code : Integer; { Return code from the Val function. }
-
-
- begin {procedure GetProb}
-
-
-
- if ParamCount <> 3 then
- begin
- WriteLn('Expected 3 parameters, found ', ParamCount);
- WriteLn('Usage: SRCHMEMP algorithm file pattern');
- WriteLn(' algorithm = 0, brute force');
- WriteLn(' algorithm = 1, Boyer-Moore with just delta1');
- WriteLn(' algorithm = 2, Boyer-Moore with fast/slow loop form of delta1');
- WriteLn(' algorithm = 3, Boyer-Moore with both delta1 and delta2 tables');
- WriteLn(' algorithm = 4, Boyer-Moore with delta1, counting total skips');
- WriteLn(' algorithm = 5, Boyer-Moore with delta1 and delta2, counting total skips');
- WriteLn(' algorithm = 6, Create delta2 table using brute force method.');
- WriteLn(' algorithm = 7, Create delta2 table using Rytter''s correction to Knuth.');
- WriteLn(' file is the name of the file to be searched');
- WriteLn(' pattern is the pattern to search for');
- WriteLn(' spaces can be embedded in pattern by using @n, where n');
- WriteLn(' is a single digit that is the number of spaces to insert.');
- WriteLn(' Ex: "A@3B" on the command line becomes "A B" in the program.');
- Halt;
- end;
-
- Val(ParamStr(1), Algorithm, Code);
- if (Code <> 0) or (Algorithm < 0) or (Algorithm > MAXALG) then
- begin
- WriteLn('The algorithm parameter is invalid.');
- WriteLn('It should be a number between 0 and ', MAXALG, '.');
- Halt;
- end;
-
- TargetName := ParamStr(2);
-
- expand(ParamStr(3), Pattern);
-
- end; {procedure GetProb}
-
-
- {------------------------------------------------------------------------------
- Get the current time from MS-DOS, in seconds since the beginning of the day.
- -------------------------------------------------------------------------------
-
- ENTRY:
-
- EXIT:
- A real return value that is the number of seconds from the
- beginning of the day until now.
-
- ------------------------------------------------------------------------------}
-
- function GetTime :Real;
-
-
- type
- Regpack = record
- AX, BX, CX, DX, BP, SI, DI, ES, Flags: Integer;
- end;
-
- var
- Regs : Regpack;
-
- begin { procedure GetTime }
-
- with Regs do
- begin
- AX := $2C00; { Place DOS system request for time into register AX }
- MsDos(Regs);
-
- {
- Time in seconds is hours times 3600 seconds per hour +
- minutes times 60 seconds per minute + seconds + hundreths of seconds
- }
- GetTime := hi(CX)*3600.0 + lo(CX)*60.0 + hi(DX) + lo(DX)/100.0;
- end; { with Regs}
-
-
- end; { procedure GetTime }
-
-
- {------------------------------------------------------------------------------
- Find position of 1st occurrence of Pattern in Target
- -------------------------------------------------------------------------------
-
- ENTRY:
- Target is the string to be searched.
- Pattern is the pattern to search for.
-
- Target and Pattern are declared as call by reference (var) to
- avoid having them copied on each call to PosBF.
-
-
- EXIT:
- An integer return value that is the position of the 1st
- occurrence of Pattern in Target. If Pattern does not occur
- in Target, the return value is zero.
-
- ------------------------------------------------------------------------------}
-
- function PosBF(var Target, Pattern: Line) :Integer;
-
-
- Label
- NOTFOUND;
-
- Var
- Stop: Integer; {The last position in Target to look for Pat}
- i: Integer; {The current position in Target for the search}
- j: Integer; {The current position in Pat being compared to the
- corresponding position in Target}
- Patlen: Integer; {The length of Pat}
-
-
- begin {function PosBF}
-
-
- {
- Stop is the last position in Target to look for Pattern. Positions
- past Stop are ignored because they would mean Pattern extends past
- the end of Target. This includes the case when Pattern is longer
- than Target. In that case, the for loop does zero iterations,
- because Stop is less than 1, and a result of zero is returned.
- }
- Patlen := Length(Pattern);
- Stop := Length(Target) - Patlen + 1;
-
-
- {
- Test each character in Target as the starting position of Pattern.
- }
- for i := 1 to Stop do
- begin
-
- {
- Check for Pattern starting at position i of Target.
- Keep looking as long as characters in Pattern match
- characters in Target. If all match, we are done.
- If a mismatch is found, then quit looking for Pattern
- at position i of Target and look for it at postion i+1.
- }
- j := 1;
- while (Pattern[j] = Target[j + (i-1)]) do
- begin
- if j < Patlen then
- begin
- j := j + 1;
- end
- else
- begin
- PosBF := i;
- Exit;
- end;
- end; { while Pattern = Target }
-
- end; { for i }
-
- {
- Pattern has not been found anywhere in Target.
- }
- PosBF := 0;
-
- end; {function PosBF}
-
-
- {------------------------------------------------------------------------------
- Search an array of strings for a pattern, using the brute force method.
- -------------------------------------------------------------------------------
-
- ENTRY:
- TargetList is an array of strings to search.
- NumLines is the number of elements in Target.
- Pattern is the pattern to search for.
-
- EXIT:
- Numfound is the number of lines in file TargetName that
- contain Pattern.
-
- ------------------------------------------------------------------------------}
-
- procedure DoBruteForce(var NumFound: Integer);
-
- var
- PBF : Integer; { Position of Pattern in Target as returned by posBF }
- i : Integer; { Loop counter }
-
-
- begin {procedure DoBruteForce}
-
- for TimesCount := 1 to TOTTIMES do
- begin
-
-
- for i := 1 to NumLines do
- begin
- PBF := posBF(TargetList[i], Pattern);
- If PBF > 0 then
- begin
- NumFound := NumFound + 1;
- end;
- end; { for i }
-
-
- end; { for TimesCount }
-
- end; {procedure DoBruteForce}
-
- {------------------------------------------------------------------------------
- Create a delta1 table for Pattern, for the Boyer-Moore string search method.
- -------------------------------------------------------------------------------
-
- ENTRY:
- Pattern is the pattern for which the table is contructed.
-
- EXIT:
- Delta1, a table that contains the distance of each extended
- ASCII character from the right end of Pattern.
-
- The table assigns to each extended ASCII character its distance from
- the right end of the pattern. If a character is not in the pattern
- then its distance is the length of the pattern. If a character appears
- more than once in the pattern, then its shortest distance is used.
-
- Example:
- Pattern := 'ASAP'
- Delta1['P'] := 0
- Delta1['A'] := 1
- Delta1['S'] := 2
- Delta1 := 4 for all other characters
-
- ------------------------------------------------------------------------------}
-
- procedure MakeDelta1(Pattern: Line; var Delta1: Del1Array);
-
-
- var
- Patlen : Integer; { The pattern length }
- i : Integer; { a loop counter }
-
- begin { procedure MakeDelta1 }
-
- Patlen := Length(Pattern);
-
- {
- Initialize the table to the pattern length.
- }
- for i := 1 to ASCIIlen do
- begin
- Delta1[i] := Patlen;
- end;
-
- {
- Scan the pattern from the left. For each character in Pattern,
- store in Delta1 its position from the right end of Pattern.
- }
- for i := 1 to Patlen do
- begin
- Delta1[Ord(Pattern[i])] := Patlen - i;
- end;
-
- end; { procedure MakeDelta1 }
-
-
- {------------------------------------------------------------------------------
- search a string for a pattern using the Boyer-Moore method with just delta1
- -------------------------------------------------------------------------------
-
- ENTRY:
- Target is the string to search within.
- Pattern is the string to search for.
- Delta1 is the Boyer-Moore delta1 table. It contains the distance
- of each extended ASCII character from the right end of Pattern.
- Characters that do not appear in Pattern have distance Patlen.
-
- Target, Pattern and Delta1 are declared as call by reference (var) to
- avoid having them copied on each call to PosDelta1.
-
-
- EXIT:
- An integer return value which indicates the starting position of Pattern
- in Target. It is zero if Pattern is not found.
-
- ------------------------------------------------------------------------------}
-
- function posDelta1(var Target, Pattern: Line; var Delta1: Del1Array): Integer;
-
- var
- Trglen : Integer; { The length of Target }
- Patlen : Integer; { The length of Pattern }
- i, j : Integer; { Loop counters }
- oldi : Integer; { Character of Target compared with right end of
- Pattern. Used if Delta1 would cause a backward
- shift. }
-
- begin { function posDelta1 }
-
- Trglen := Length(Target);
- Patlen := Length(Pattern);
-
-
-
- {
- The outer loop. This positions Pattern in Target
- for the next set of character comparisons. Note that if
- Pattern is longer than Target (Patlen > Trglen) then the
- outer loop is never executed and a value of zero is returned.
- }
- i := Patlen;
- while (i <= Trglen) do
- begin
- j := Patlen;
- oldi := i;
-
- {
- Inner loop. Compare Pattern to Target, starting at the right end of
- Pattern (the essence of Boyer-Moore). If characters compare, decrement
- pointers and compare next pair of characters. If we have decremented
- to the 1st character of Pattern we are done and return the position
- in string where the match occurred.
- }
-
- while (Target[i] = Pattern[j]) do
- begin
- if (j > 1) then
- begin
- j := j - 1;
- i := i - 1;
- end
- else
- begin
- posDelta1 := i;
- Exit;
- end;
-
- end; { while (Target[i] = Pattern[j]) }
-
- {
- Nonmatching characters found. Use delta1 table to compute number
- of characters to slide Pattern along Target. If Delta1 causes a backward
- shift, then slide Pattern by 2 characters (it turns out that the minimum
- shift is 2, not 1).
- }
- i := i + Delta1[Ord(Target[i])];
- if (i <= oldi) then i := oldi + 2;
-
- end; { while (i <= Trglen) }
-
- posDelta1 := 0;
-
- end; { function posDelta1 }
-
-
- {------------------------------------------------------------------------------
- Search an array of strings for a pattern, using the Boyer-Moore method
- with just the delta1 table.
- -------------------------------------------------------------------------------
-
- ENTRY:
- TargetList is an array of strings to search.
- NumLines is the number of elements in Target.
- Pattern is the pattern to search for.
- EXIT:
- Numfound is the number of lines in file TargetName that
- contain Pattern.
-
- ------------------------------------------------------------------------------}
-
- procedure DoDelta1(var NumFound: Integer) ;
-
-
- var
- Delta1 : Del1Array; { The delta 1 table }
- i : Integer; { Loop counter }
- PD1 : Integer;
-
- begin {procedure DoDelta1}
-
- {
- Create the delta1 table.
- }
- MakeDelta1(Pattern, Delta1);
-
-
- for TimesCount := 1 to TOTTIMES do
- begin
-
-
- for i := 1 to NumLines do
- begin
- PD1 := posDelta1(TargetList[i], Pattern, Delta1);
-
- If PD1 > 0 then
- Begin
- NumFound := NumFound + 1;
- end;
- end; { for i }
-
-
- end; { for TimesCount }
-
- end; {procedure DoDelta1}
-
-
-
- {------------------------------------------------------------------------------
- Search a string for a pattern using the Boyer-Moore method with just delta1
- Use the FAST-SLOW loop structure.
- -------------------------------------------------------------------------------
-
- ENTRY:
- Target is the string to search within.
- Pattern is the string to search for.
- Delta1 is the Boyer-Moore delta1 table. It contains the distance
- of each extended ASCII character from the right end of Pattern.
- Characters that do not appear in Pattern have distance Patlen.
-
- Target, Pattern and Delta1 are declared as call by reference (var) to
- avoid having them copied on each call to PosDelta1.
-
-
- EXIT:
- An integer return value which indicates the starting position of Pattern
- in Target. It is zero if Pattern is not found.
-
- ------------------------------------------------------------------------------}
-
- function posDelta1F(var Target, Pattern: Line; var Delta1: Del1Array): Integer;
-
- label
- FAST, SLOW; { Top of fast loop, top of slow loop }
-
- var
- Trglen : Integer; { The length of Target }
- Patlen : Integer; { The length of Pattern }
- i, j : Integer; { Pointers to Target and Pattern }
- oldi : Integer; { Character of Target compared with right end of
- Pattern. Used if Delta1 would cause a backward
- shift. }
- d1 : Integer; { Delta1 value when mismatch occurrs in slow loop. }
-
-
- begin { function posDelta1F }
-
- Trglen := Length(Target);
- Patlen := Length(Pattern);
- if (Patlen > Trglen) then
- begin
- posDelta1F := 0;
- Exit;
- end;
-
- i := Patlen;
-
-
- {
- The fast loop. This loop compares the right end of Pattern to the
- corresponding character in Target. If they don't match, we skip
- according to the delta1 table and do it again. If they do match,
- go to the slow loop.
- }
- FAST:
- i := i + Delta1[ord(Target[i])];
- if (i <= Trglen) then goto FAST;
-
-
- {
- If we have really skipped passed the end of Target, return.
- }
- if (i <= LARGE) then
- begin
- posDelta1F := 0;
- Exit;
- end;
-
- {
- Undo the large skip caused by the match in the fast loop.
- }
- i := (i - LARGE) - 1;
- oldi := i + 1;
- j := Patlen - 1;
-
- SLOW:
- if (j = 0) then
- begin
- posDelta1F := i + 1;
- Exit;
- end;
-
- if (Target[i] = Pattern[j]) then
- begin
- j := j - 1;
- i := i - 1;
- goto SLOW;
- end;
-
- {
- A mismatch has been found. Skip by delta1, unless this is really a backward
- skip. The skip is backward if the right end of Pattern is moved to a position
- in Target to the left of its previous position in Target - i+skip < oldi.
- Also, if the character in Target where the mismatch occured is the same
- as the character on the right end of Pattern, then skip is a large number,
- but really corresponds to sliding the Pattern backwards so its right end
- lines up with the character in Target that caused the mismatch. In either
- case we can move Pattern forward by 2 (of course we can always move forward
- by at least 1, but it can be proved that the minimum skip is 2).
- }
- d1 := Delta1[ord(Target[i])];
- if (d1 <> LARGE) and (i + d1 > oldi) then
- begin
- i := i + d1;
- end
- else
- begin
- i := oldi + 2;
- end;
-
- {
- Check for skipping past end of Target. If not, then go to the fast loop.
- Else Pattern is not in Target, return with a zero result.
- }
- if (i <= Trglen) then
- begin
- goto FAST;
- end
- else
- begin
- posDelta1F := 0;
- end;
-
-
-
- end; { function posDelta1F }
-
-
-
-
-
- {------------------------------------------------------------------------------
- Search an array of strings for a pattern, using the Boyer-Moore method
- with just the delta1 table. This version uses the FAST-SLOW loops.
- -------------------------------------------------------------------------------
-
- ENTRY:
- TargetList is an array of strings to search.
- NumLines is the number of elements in Target.
- Pattern is the pattern to search for.
- EXIT:
- Numfound is the number of lines in file TargetName that
- contain Pattern.
-
- ------------------------------------------------------------------------------}
-
- procedure DoDelta1F(var NumFound: Integer) ;
-
-
- var
- Delta1 : Del1Array; { The delta 1 table }
- patlen : Integer; { Length of Pattern }
- i : Integer; { Loop counter }
- PD1F : Integer;
-
- begin {procedure DoDelta1F}
-
- {
- Create the delta1 table. Replace the entry corresponding to the right
- of Pattern with a value larger than pattern length + target length.
- This is used to detect a match in the fast loop.
- }
- MakeDelta1(Pattern, Delta1);
- patlen := length(Pattern);
- Delta1[ord(Pattern[patlen])] := LARGE;
-
-
-
- for TimesCount := 1 to TOTTIMES do
- begin
-
-
- for i := 1 to NumLines do
- begin
- PD1F := posDelta1F(TargetList[i], Pattern, Delta1);
- If PD1F > 0 then
- Begin
- NumFound := NumFound + 1;
- end;
- end; { for i }
-
-
- end; { for TimesCount }
-
- end; {procedure DoDelta1F}
-
-
- {------------------------------------------------------------------------------
- Compare 2 substrings of Pattern.
- -------------------------------------------------------------------------------
-
- ENTRY:
- Pattern is the pattern
-
- s1 is the starting position of the first substring.
- The first substring extends from character s1 of Pattern to
- the end of pattern.
-
- s2 is the starting position of the 2nd substring.
- The 2nd substring starts at s2 and is as long as the first
- substring.
-
- EXIT:
- A boolean result. True if the 2 substrings match, False otherwise.
-
- NOTE:
- This function is used to compare 2 substrings of a search pattern
- as part of the process of creating a delta 2 table for the Boyer-
- Moore string searching algorithm.
-
- The 1st substring extends from postion s1 of Pattern to the end of
- Pattern. The 2nd substring may be anywhere to the left of the 1st
- one. The 2nd substring may even "fall off" the left end of Pattern.
- Characters which fall off the left end are ignored (ie. considered
- to match). Thus the 2 substrings are compared starting from their
- rightmost characters and comparing up to the point where the 2nd
- substring falls off the left end of Pattern.
-
- ------------------------------------------------------------------------------}
-
- function cmpsubs(Pattern : Line; s1, s2: Integer): Boolean;
-
- var
- Patlen : Integer; { Length of Pattern }
- sublen : Integer; { Length of 1st substring }
- i : Integer; { Right end of 1st substring }
- j : Integer; { Right end of 2nd substring }
- count : Integer; { Counter for number of characters to compare }
-
- begin { function cmpsubs }
-
- Patlen := Length(Pattern);
- sublen := Patlen - (s1 - 1);
-
-
- i := patlen;
-
-
- {
- Define j, the right end of the 2nd substring.
- If the 2nd substring doesn't fall off the left end of the pattern,
- then j is simply sublen characters to the right of s2. If it does
- fall off the left end, then use the rightmost characters of the 2nd
- substring that don't fall off the left end. If the entire 2nd substring
- falls off the left, then return TRUE.
- }
- if (s2 >= 1) then
- begin
- j := (s2 - 1) + sublen;
- end
- else
- begin
- sublen := sublen + (s2 - 1);
- if (sublen = 0) then
- begin
- cmpsubs := TRUE;
- EXIT;
- end;
- j := sublen;
- end;
-
- {
- Compare the 2 substrings.
- }
- count := 0;
- while (count < sublen) do
- begin
- if (Pattern[j] = Pattern[i]) then
- begin
- i := i - 1;
- j := j - 1;
- count := count + 1;
- end
- else
- begin
- cmpsubs := FALSE;
- EXIT;
- end;
- end; { while count < sublen }
-
- {
- If we have made it this far then the substrings match.
- Now we check the character to the immediate left of each substring.
- If they are not equal we have a match and return TRUE. If they are
- equal we don't have a match and return FALSE.
-
- We must also take into account the situation where the character to the
- left of the 2nd substring falls off the left end of Pattern. This always
- matches.
- }
- if (s2 > 1) then
- begin
- if (Pattern[j] = Pattern[i]) then
- begin
- cmpsubs := FALSE;
- end
- else
- begin
- cmpsubs := TRUE;
- end;
- end
- else
- begin
- cmpsubs := TRUE;
- end;
-
- end; { function cmpsubs }
-
-
- {------------------------------------------------------------------------------
- Create a delta2 table using brute force methods.
- -------------------------------------------------------------------------------
-
- ENTRY:
- Pattern is the pattern for which the table is contructed.
-
-
- EXIT:
- Delta2 is the delta 2 table.
-
- ------------------------------------------------------------------------------}
-
- procedure MakeDelta2BF(Pattern: Line; var Delta2: Del2Array);
-
-
- var
- Patlen : Integer; { The length of Pattern }
- i : Integer; { Position in Pattern for which delta2 value is computed }
- j : Integer; { Position in Pattern to check for a match with
- terminating substring starting at position i+1. }
-
-
- begin { procedure MakeDelta2BF }
-
- Patlen := Length(Pattern);
-
-
- {
- Determine delta 2 value for character i of Pattern.
- }
- for i := 1 to Patlen do
- begin
- {
- Compare substring at position i+1 with substring of same length
- starting at position j. If no match is found, decrement j and
- try again. Repeat until a match is found. A match is guaranteed
- because the compare defines characters of the 2nd substring which
- fall off the left end of Pattern to match.
- }
- j := i;
- while (not cmpsubs(Pattern,i+1,j)) do
- begin
- j := j - 1;
- end; { while not cmpsubs }
- Delta2[i] := Patlen + 1 - j;
- end; { for i }
-
- end; { procedure MakeDelta2BF }
-
-
-
- {------------------------------------------------------------------------------
- Procedure MakeDelta2WR
- Create a delta 2 table using the Rytter correction to Knuth's algorithm.
- -------------------------------------------------------------------------------
-
- ENTRY:
- Pattern is the pattern for which the table is contructed.
-
- EXIT:
- Delta2 is the delta 2 table.
-
- NOTE: See the following reference for an explanation of this method
- of constructing the delta 2 table.
-
- ------------------------------------------------------------------------------}
-
- procedure MakeDelta2WR(Pattern: Line; var Delta2: Del2Array);
-
- var
- k : Integer;
- patlen : Integer; { Length of Pattern. }
- patlen2 : Integer; { Twice the length of Pattern. }
- j : Integer;
- t : Integer;
- f : Array[1..MAXSTR] of Integer;
- f1 : Array[1..MAXSTR] of Integer;
- q : Integer;
- q1 : Integer;
- j1 : Integer;
- t1 : Integer;
-
-
- begin { procedure MakeDelta2WR }
-
- patlen := Length(pattern);
-
- {
- Section labeled A1 in Rytter's paper.
- }
- patlen2 := 2*patlen;
- for K := 1 to patlen do
- begin
- Delta2[k] := patlen2 - k;
- end;
-
- {
- Section labeled A2 in Rytter's paper.
- }
- j := patlen;
- t := patlen + 1;
- while (j > 0) do
- begin
- f[j] := t;
- while (t <= patlen) and (Pattern[j] <> Pattern[t]) do
- begin
- if (patlen - j < Delta2[t]) then Delta2[t] := patlen - j;
- t := f[t];
- end; { while (t <= patlen) and (Pattern[j] <> Pattern[t]) }
- t := t - 1;
- j := j - 1;
- end; { while (j > 0) }
-
-
- q := t;
- t := patlen + 1 - q;
- q1 := 1;
-
-
- {
- Section labeled B1 in Rytter's paper.
- }
- j1 := 1;
- t1 := 0;
- while (j1 <= t) do
- begin
- f1[j1] := t1;
- while (t1 >= 1) and (Pattern[j1] <> Pattern[t1]) do
- begin
- t1 := f1[t1];
- end; { while (t1 >= 1) and (Pattern[j1] <> Pattern[t1]) }
- t1 := t1 + 1;
- j1 := j1 + 1;
- end; { while (j1 <= t) }
-
- {
- Section labeled B2 in Rytter's paper,
- }
- while (q < patlen) do
- begin
- for k := q1 to q do
- begin
- if (patlen + q -k < Delta2[k]) then Delta2[k] := patlen + q - k;
- end; { for k }
- q1 := q + 1;
- q := q + t - f1[t];
- end; { while (q < patlen) }
-
-
- end; { procedure MakeDelta2WR }
-
-
-
- {------------------------------------------------------------------------------
- Procedure posDelta2F
- Boyer-Moore with delta1 and delta2 tables.
- -------------------------------------------------------------------------------
-
- ENTRY:
- Target is the text string to be searched
-
- Pattern is the string to search for
-
- Del1Array is the Boyer-Moore delta1 table
-
- Del2Array is the Boyer-Moore delta2 tabel
-
-
- Target, Pattern and Delta1 are declared as call by reference (var) to
- avoid having them copied on each call to PosDelta2.
-
-
- EXIT:
- An integer return value which indicates the starting position of Pattern
- in Target. It is zero if Pattern is not found.
-
- ------------------------------------------------------------------------------}
-
- function posDelta2F(var Target, Pattern : Line; var Delta1 : Del1Array;
- var Delta2 : Del2Array): Integer;
-
-
-
- label
- FAST, SLOW; { Top of fast loop, top of slow loop }
-
- var
- Trglen : Integer; { The length of Target }
- Patlen : Integer; { The length of Pattern }
- i, j : Integer; { Pointers to Target and Pattern }
- d1 : Integer; { Delta1 value when mismatch occurrs in slow loop. }
-
- begin { function posDelta2F }
-
- Trglen := Length(Target);
- Patlen := Length(Pattern);
- if (Patlen > Trglen) then
- begin
- posDelta2F := 0;
- Exit;
- end;
-
- i := Patlen;
-
-
- {
- The fast loop. This loop compares the right end of Pattern to the
- corresponding character in Target. If they don't match, we skip
- according to the delta1 table and do it again. If they do match,
- go to the slow loop.
- }
- FAST:
- i := i + Delta1[ord(Target[i])];
- if (i <= Trglen) then goto FAST;
-
-
- {
- If we have really skipped passed the end of Target, return.
- }
- if (i <= LARGE) then
- begin
- posDelta2F := 0;
- Exit;
- end;
-
- {
- Undo the large skip caused by the match in the fast loop.
- }
- i := (i - LARGE) - 1;
- j := Patlen - 1;
-
- SLOW:
- if (j = 0) then
- begin
- posDelta2F := i + 1;
- Exit;
- end;
-
- if (Target[i] = Pattern[j]) then
- begin
- j := j - 1;
- i := i - 1;
- goto SLOW;
- end;
-
- {
- A mismatch has been found.
- The delta1 value must be adjusted when it is "large". In this case
- the character in Target which caused the mismatch is the same as the
- character at the right end of Pattern. The usual delta1 value for this
- character is 0.
-
- Then choose the larger of the delta1 and delta2 values.
- }
- d1 := Delta1[ord(Target[i])];
- if (d1 = LARGE) then d1 := 0;
-
- if (d1 > Delta2[j]) then
- begin
- i := i + d1
- end
- else
- begin
- i := i + Delta2[j];
- end;
-
- {
- Check for skipping past end of Target. If not, then go to the fast loop.
- Else Pattern is not in Target, return with a zero result.
- }
- if (i <= Trglen) then
- begin
- goto FAST;
- end
- else
- begin
- posDelta2F := 0;
- end;
-
-
- end; { function posDelta2F }
-
-
-
- {------------------------------------------------------------------------------
- Procedure ShowDelta2
- Display the pattern with the delta2 values underneath it.
- -------------------------------------------------------------------------------
-
- ENTRY:
- Pattern is the pattern being searched for.
-
- Delta2 is the Boyer-Moore delta2 table for Pattern.
-
- EXIT:
- The pattern is displayed with the delta2 value for each
- character of pattern displayed underneath the character:
-
- A B C X X X A B C
- 14 13 12 11 10 9 11 10 1
-
-
- ------------------------------------------------------------------------------}
-
- procedure ShowDelta2(Pattern : Line; Delta2 : Del2Array);
-
- var
- patlen : Integer; { The length of Pattern. }
- i : Integer;
-
-
- begin { procedure ShowDelta2 }
-
- patlen := Length(Pattern);
-
- for i := 1 to patlen do
- begin
- Write(' ', Pattern[i]);
- end;
- WriteLn;
- for i := 1 to patlen do
- begin
- Write(' ', Delta2[i]:2);
- end;
- WriteLn;
-
- end; { procedure ShowDelta2 }
-
-
- {------------------------------------------------------------------------------
- Search an array of strings for a pattern, using the Boyer-Moore method
- with both the delta1 and delta2 tables.
- This version uses the FAST-SLOW loops.
- -------------------------------------------------------------------------------
-
- ENTRY:
- TargetList is an array of strings to search.
- NumLines is the number of elements in Target.
- Pattern is the pattern to search for.
- EXIT:
- Numfound is the number of lines in file TargetName that
- contain Pattern.
-
- ------------------------------------------------------------------------------}
-
- procedure DoDelta2F(var NumFound: Integer) ;
-
-
- VAR
- DELTA1 : DEL1ARRAY; { THE DELTA 1 TABLE }
- DELTA2 : DEL2ARRAY; { THE DELTA 2 TABLE }
- patlen : Integer; { Length of Pattern }
- i : Integer; { Loop counter }
- PD2 : Integer;
-
- begin {procedure DoDelta2F}
-
- {
- Create the delta1 table. Replace the entry corresponding to the right
- of Pattern with a value larger than pattern length + target length.
- This is used to detect a match in the fast loop.
- }
- MakeDelta1(Pattern, Delta1);
- patlen := length(Pattern);
- Delta1[ord(Pattern[patlen])] := LARGE;
-
- {
- Create the Delta2 table.
- }
- MakeDelta2WR(Pattern, Delta2);
-
-
-
- for TimesCount := 1 to TOTTIMES do
- begin
-
-
- for i := 1 to NumLines do
- begin
- PD2 := posDelta2F(TargetList[i], Pattern, Delta1, Delta2);
- If PD2 > 0 then
- Begin
- NumFound := NumFound + 1;
- end;
- end; { for i }
-
-
- end; { for TimesCount }
-
- end; {procedure DoDelta2F}
-
-
-
- {------------------------------------------------------------------------------
- Search a string for a pattern using the Boyer-Moore method with just delta1
- Use the FAST-SLOW loop structure. Count delta1 and other skips
- -------------------------------------------------------------------------------
-
- ENTRY:
- Target is the string to search within.
- Pattern is the string to search for.
- Delta1 is the Boyer-Moore delta1 table. It contains the distance
- of each extended ASCII character from the right end of Pattern.
- Characters that do not appear in Pattern have distance Patlen.
-
- Target, Pattern and Delta1 are declared as call by reference (var) to
- avoid having them copied on each call to PosDelta1.
-
-
- EXIT:
- An integer return value which indicates the starting position of Pattern
- in Target. It is zero if Pattern is not found.
-
- D1Count is the number of times the Delta1 table was used.
-
- D2Count is the number of times a fixed skip of 2 was used.
-
- Skip1Count is the sum of the delta1 skips used.
-
- Skip2Count is the sum of the fixed skips used.
-
- ------------------------------------------------------------------------------}
-
- function posDelta1C(var Target, Pattern: Line; var Delta1: Del1Array;
- var D1Count, D2Count, Skip1Count, Skip2Count: Real):
- Integer;
-
- label
- FAST, SLOW; { Top of fast loop, top of slow loop }
-
- var
- Trglen : Integer; { The length of Target }
- Patlen : Integer; { The length of Pattern }
- i, j : Integer; { Pointers to Target and Pattern }
- oldi : Integer; { Character of Target compared with right end of
- Pattern. Used if Delta1 would cause a backward
- shift. }
- d1 : Integer; { Delta1 value when mismatch occurrs in slow loop. }
-
-
- begin { function posDelta1C }
-
- Trglen := Length(Target);
- Patlen := Length(Pattern);
- if (Patlen > Trglen) then
- begin
- posDelta1C := 0;
- Exit;
- end;
-
- i := Patlen;
-
-
- {
- The fast loop. This loop compares the right end of Pattern to the
- corresponding character in Target. If they don't match, we skip
- according to the delta1 table and do it again. If they do match,
- go to the slow loop.
- Also, count increment the delta1 counters. Note that because
- we are at the right end of Pattern the pattern shift is the
- same as the delta1 value.
- }
- FAST:
- D1Count := D1Count + 1;
- Skip1Count := Skip1Count + Delta1[ord(Target[i])];
- i := i + Delta1[ord(Target[i])];
- if (i <= Trglen) then goto FAST;
-
-
- {
- If we have really skipped passed the end of Target, return.
- }
- if (i <= LARGE) then
- begin
- posDelta1C := 0;
- Exit;
- end;
-
- {
- Undo the large skip caused by the match in the fast loop.
- }
- D1Count := D1Count - 1;
- Skip1Count := Skip1Count - LARGE;
- i := (i - LARGE) - 1;
- oldi := i + 1;
- j := Patlen - 1;
-
-
- SLOW:
- if (j = 0) then
- begin
- posDelta1C := i + 1;
- Exit;
- end;
-
- if (Target[i] = Pattern[j]) then
- begin
- j := j - 1;
- i := i - 1;
- goto SLOW;
- end;
-
- {
- A mismatch has been found. skip by delta1, unless this is really a backward
- skip. The skip is backward if the right end of Pattern is moved to a position
- in Target to the left of its previous position in Target - i+skip < oldi.
- Also, if the character in Target where the mismatch occured is the same
- as the character on the right end of Pattern, then skip is a large number,
- but really corresponds to sliding the Pattern backwards so its right end
- lines up with the character in Target that caused the mismatch. In either
- case we can move Pattern forward by 2 (of course we can always move forward
- by at least 1, but it can be proved that the minimum skip is 2).
-
- Also, increment the appropriate delta counters. If delta1 is used, the
- actual pattern shift depends on the delta1 value and the current position
- in Pattern. If the delta1 value is not used, the shift is fixed at 2.
- }
- d1 := Delta1[ord(Target[i])];
- if (d1 <> LARGE) and (i + d1 > oldi) then
- begin
- D1Count := D1Count + 1;
- Skip1Count := Skip1Count + d1 - patlen + j;
- i := i + d1;
- end
- else
- begin
- D2Count := D2Count + 1;
- Skip2Count := Skip2Count + 2;
- i := oldi + 2;
- end;
-
- {
- Check for skipping past end of Target. If not, then go to the fast loop.
- Else Pattern is not in Target, return with a zero result.
- }
- if (i <= Trglen) then
- begin
- goto FAST;
- end
- else
- begin
- posDelta1C := 0;
- end;
-
-
-
- end; { function posDelta1C }
-
-
-
- {------------------------------------------------------------------------------
- Search an array of strings for a pattern, using the Boyer-Moore method
- with just the delta1 table. This version uses the FAST-SLOW loops.
- Count the number of delta1 and other skips.
- -------------------------------------------------------------------------------
-
- ENTRY:
- TargetList is an array of strings to search.
- NumLines is the number of elements in Target.
- Pattern is the pattern to search for.
- EXIT:
- Numfound is the number of lines in file TargetName that
- contain Pattern.
-
- D1Count is the number of times the Delta1 table was used.
-
- Skip1Count is the sum of the delta1 skips used.
-
- D2Count is the number of times a fixed skip of 2 was used.
-
- Skip2Count is the sum of the fixed skips used.
-
- ------------------------------------------------------------------------------}
-
- procedure DoDelta1C(var NumFound: Integer;
- var D1Count, D2Count, Skip1Count, Skip2Count: Real);
-
- var
- Delta1 : Del1Array; { The delta 1 table }
- patlen : Integer; { Length of Pattern }
- i : Integer; { Loop counter }
- PD1C : Integer;
-
- begin {procedure DoDelta1C}
-
- {
- Create the delta1 table. Replace the entry corresponding to the right
- of Pattern with a value larger than pattern length + target length.
- This is used to detect a match in the fast loop.
- }
- MakeDelta1(Pattern, Delta1);
- patlen := length(Pattern);
- Delta1[ord(Pattern[patlen])] := LARGE;
-
- for i := 1 to NumLines do
- begin
- PD1C := posDelta1C(TargetList[i], Pattern, Delta1, D1Count, D2count,
- Skip1Count, Skip2Count);
- If PD1C > 0 then
- Begin
- NumFound := NumFound + 1;
- end;
- end; { for i }
-
- end; {procedure DoDelta1C}
-
-
-
- {------------------------------------------------------------------------------
- Procedure posDelta2C
- Boyer-Moore with delta1 and delta2 tables. Count delta1 and delta2 skips.
- -------------------------------------------------------------------------------
-
- ENTRY:
- Target is the text string to be searched
-
- Pattern is the string to search for
-
- Del1Array is the Boyer-Moore delta1 table
-
- Del2Array is the Boyer-Moore delta2 tabel
-
-
- Target, Pattern and Delta1 are declared as call by reference (var) to
- avoid having them copied on each call to PosDelta2.
-
-
- EXIT:
- An integer return value which indicates the starting position of Pattern
- in Target. It is zero if Pattern is not found.
-
- D1Count is the number of times the Delta1 table was used.
-
- D2Count is the number of times a fixed skip of 2 was used.
-
- Skip1Count is the sum of the delta1 skips used.
-
- Skip2Count is the sum of the fixed skips used.
-
- ------------------------------------------------------------------------------}
-
- function posDelta2C(var Target, Pattern: Line; var Delta1: Del1Array;
- var Delta2: Del2Array;
- var D1Count, D2Count, Skip1Count, Skip2Count: Real):
- Integer;
-
-
-
-
- label
- FAST, SLOW; { Top of fast loop, top of slow loop }
-
- var
- Trglen : Integer; { The length of Target }
- Patlen : Integer; { The length of Pattern }
- i, j : Integer; { Pointers to Target and Pattern }
- d1 : Integer; { Delta1 value when mismatch occurrs in slow loop. }
-
- begin { function posDelta2C }
-
- Trglen := Length(Target);
- Patlen := Length(Pattern);
- if (Patlen > Trglen) then
- begin
- posDelta2C := 0;
- Exit;
- end;
-
- i := Patlen;
-
-
- {
- The fast loop. This loop compares the right end of Pattern to the
- corresponding character in Target. If they don't match, we skip
- according to the delta1 table and do it again. If they do match,
- go to the slow loop.
- }
- FAST:
- D1Count := D1Count + 1;
- Skip1Count := Skip1Count + Delta1[ord(Target[i])];
- i := i + Delta1[ord(Target[i])];
- if (i <= Trglen) then goto FAST;
-
-
- {
- If we have really skipped passed the end of Target, return.
- }
- if (i <= LARGE) then
- begin
- posDelta2C := 0;
- Exit;
- end;
-
- {
- Undo the large skip caused by the match in the fast loop.
- }
- D1Count := D1Count - 1;
- Skip1Count := Skip1Count - LARGE;
- i := (i - LARGE) - 1;
- j := Patlen - 1;
-
- SLOW:
- if (j = 0) then
- begin
- posDelta2C := i + 1;
- Exit;
- end;
-
- if (Target[i] = Pattern[j]) then
- begin
- j := j - 1;
- i := i - 1;
- goto SLOW;
- end;
-
- {
- A mismatch has been found.
- The delta1 value must be adjusted when it is "large". In this case
- the character in Target which caused the mismatch is the same as the
- character at the right end of Pattern. The usual delta1 value for this
- character is 0.
-
- Then choose the larger of the delta1 and delta2 values.
- }
- d1 := Delta1[ord(Target[i])];
- if (d1 = LARGE) then d1 := 0;
-
- if (d1 > Delta2[j]) then
- begin
- D1Count := D1Count + 1;
- Skip1Count := Skip1Count + d1 - patlen + j;
- i := i + d1
- end
- else
- begin
- D2Count := D2Count + 1;
- Skip2Count := Skip2Count + Delta2[j] - patlen + j;
- i := i + Delta2[j];
- end;
-
- {
- Check for skipping past end of Target. If not, then go to the fast loop.
- Else Pattern is not in Target, return with a zero result.
- }
- if (i <= Trglen) then
- begin
- goto FAST;
- end
- else
- begin
- posDelta2C := 0;
- end;
-
-
- end; { function posDelta2C }
-
-
-
-
- {------------------------------------------------------------------------------
- Search an array of strings for a pattern, using the Boyer-Moore method
- with the delta1 and delta2 tables. This version uses the FAST-SLOW loops.
- Count the number of delta1 and delta2 skips.
- -------------------------------------------------------------------------------
-
- ENTRY:
- TargetList is an array of strings to search.
- NumLines is the number of elements in Target.
- Pattern is the pattern to search for.
- EXIT:
- Numfound is the number of lines in file TargetName that
- contain Pattern.
-
- D1Count is the number of times the Delta1 table was used.
-
- Skip1Count is the sum of the delta1 skips used.
-
- D2Count is the number of times a fixed skip of 2 was used.
-
- Skip2Count is the sum of the fixed skips used.
-
- ------------------------------------------------------------------------------}
-
- procedure DoDelta2C(var NumFound: Integer;
- var D1Count, D2Count, Skip1Count, Skip2Count: Real);
-
- var
- Delta1 : Del1Array; { The delta 1 table }
- Delta2 : Del2Array; { The delta 2 table }
- patlen : Integer; { Length of Pattern }
- i : Integer; { Loop counter }
- PD2C : Integer;
-
- begin {procedure DoDelta2C}
-
- {
- Create the delta1 table. Replace the entry corresponding to the right
- of Pattern with a value larger than pattern length + target length.
- This is used to detect a match in the fast loop.
- }
- MakeDelta1(Pattern, Delta1);
- patlen := length(Pattern);
- Delta1[ord(Pattern[patlen])] := LARGE;
-
- {
- Create the Delta2 table.
- }
- MakeDelta2WR(Pattern, Delta2);
- ShowDelta2(Pattern, Delta2);
-
- for i := 1 to NumLines do
- begin
- PD2C := posDelta2C(TargetList[i], Pattern, Delta1, Delta2, D1Count, D2count,
- Skip1Count, Skip2Count);
- If PD2C > 0 then
- Begin
- NumFound := NumFound + 1;
- end;
- end; { for i }
-
-
- end; {procedure DoDelta2C}
-
-
-
- {------------------------------------------------------------------------------
- Create the delta2 table several times using the brute force method
- -------------------------------------------------------------------------------
-
- ------------------------------------------------------------------------------}
-
- procedure DoMakD2BF;
-
- var
- Delta2 : Del2Array; { The delta 2 table }
-
- begin {procedure DoMakD2BF}
-
-
- for TimesCount := 1 to TOTTIMES do
- begin
-
- MakeDelta2BF(Pattern, Delta2);
-
- end; { for TimesCount }
-
- ShowDelta2(Pattern, Delta2);
-
- end; {procedure DoMakD2BF}
-
-
-
-
-
-
- {------------------------------------------------------------------------------
- Create the delta2 table several times using the Knuth/Rytter method
- -------------------------------------------------------------------------------
-
- ------------------------------------------------------------------------------}
-
- procedure DoMakD2WR;
-
- var
- Delta2 : Del2Array; { The delta 2 table }
-
- begin {procedure DoMakD2WR}
-
-
- for TimesCount := 1 to TOTTIMES do
- begin
-
- MakeDelta2WR(Pattern, Delta2);
-
- end; { for TimesCount }
-
-
- ShowDelta2(Pattern, Delta2);
-
-
- end; {procedure DoMakD2WR}
-
-
-
-
- {------------------------------------------------------------------------------
- Setup for file search: Get starting time and open target file.
- -------------------------------------------------------------------------------
-
- ENTRY:
- TargetName is the file to search.
-
- EXIT:
- Begtime contains the starting time for the search. It is in
- seconds since the start of the day.
-
- TargetVar is a file variable for TargetName. TargetName has been
- opened for sequential input.
-
- TargetLines is an array of strings which holds the contents
- of the file TargetName.
-
- NumFound, the number of lines in TargetName which contain Pattern,
- is initialized to zero.
-
- ------------------------------------------------------------------------------}
-
- procedure Setup ;
-
- var
- OK : Boolean;
-
- begin {procedure Setup}
-
-
-
- {
- Open file to be searched for input.
- }
- Assign(TargetVar, TargetName);
- {$I-}
- Reset(TargetVar);
- {$I+}
- OK := (IOresult = 0);
- if (not OK) then
- begin
- WriteLn('Target file ', TargetName, ' not found.');
- Halt;
- end;
-
- {
- Read the file and store its contents in an array of strings.
- }
- NumLines := 0;
- while (not Eof(TargetVar)) and (NumLines < MAXLINE) do
- begin
- NumLines := NumLines + 1;
- ReadLn(TargetVar, TargetList[NumLines]);
- end;
-
- Close(TargetVar);
-
-
- NumFound := 0;
-
- D1Count := 0;
- D2Count := 0;
- Skip1Count := 0;
- Skip2Count := 0;
-
- {
- Initialize elapsed timer.
- }
- BegTime := GetTime;
-
- end; {procedure Setup}
-
-
-
- {------------------------------------------------------------------------------
- Cleanup after the search: Close file. Compute time for search.
- -------------------------------------------------------------------------------
-
- ENTRY:
- TargetVar is the file variable for the file just searched.
-
- Begtime is the starting time for the search in seconds since
- the beginning of the day.
-
- NumFound is the number of lines in the target file containing Pattern.
-
- EXIT:
- The target file is closed.
- The elapsed time for the search, in seconds, is displayed on the
- standard output device.
-
- ------------------------------------------------------------------------------}
-
- procedure Cleanup;
-
- var
- TotTime : Real; { Total time for the search, in seconds }
-
- begin {procedure Cleanup}
-
- Close(TargetVar);
-
- {
- Elapsed time is in seconds.
- }
- TotTime := GetTime - BegTime;
- Write(' Alg: ', Algorithm, ' File: ', TargetName, ' Pattern: /', Pattern,
- '/ Time: ', TotTime:5:1, ' Lines: ', NumFound);
-
- {
- Write counts if counting versions of the algorithms were used.
- }
- if (Algorithm = 4) or (Algorithm = 5) then
- begin
- Write(' D1: ', D1Count:6:0, ' Skip1: ', Skip1Count:6:0, ' D2: ', D2Count:6:0,
- ' Skip2: ', Skip2Count:6:0);
- end;
-
- WriteLn;
-
- end; {procedure Cleanup}
-
-
-
- begin {program Search}
-
- {
- Arbitrarily large constant for FAST/SLOW loop versions.
- Subtract MAXSTR to prevent overflow.
- }
- LARGE := MAXINT - MAXSTR;
-
- GetProb(Algorithm, TargetName, Pattern);
-
- Setup;
-
-
- case Algorithm of
- 0: DoBruteForce(NumFound);
- 1: DoDelta1(NumFound);
- 2: DoDelta1F(NumFound);
- 3: DoDelta2F(NumFound);
- 4: DoDelta1C(NumFound, D1Count, D2Count, Skip1Count, Skip2Count);
- 5: DoDelta2C(NumFound, D1Count, D2Count, Skip1Count, Skip2Count);
- 6: DoMakD2BF;
- 7: DoMakD2WR;
- end; { case Algorithm }
-
-
-
-
-
- Cleanup;
-
-
- end. {program Search}
-
-
-