home *** CD-ROM | disk | FTP | other *** search
- From: cedman@lynx.ps.uci.edu (Carl Edman)
- Newsgroups: alt.sources
- Subject: Re: Fast strcmp() wanted.
- Message-ID: <CEDMAN.90Oct4171710@lynx.ps.uci.edu>
- Date: 5 Oct 90 00:17:10 GMT
- <CEDMAN.90Sep27075013@lynx.ps.uci.edu>
- <1990Sep27.151543.8025@ccs.carleton.ca>
- <CEDMAN.90Sep29091115@lynx.ps.uci.edu> <6003@hplabsz.HPL.HP.COM>
- <1145@exodus.Eng.Sun.COM>
- Organization: non serviam
- Lines: 61
- Nntp-Posting-Host: lynx.ps.uci.edu
- In-reply-to: falk@peregrine.Sun.COM's message of 4 Oct 90 21:03:47 GMT
-
- In article <1145@exodus.Eng.Sun.COM> falk@peregrine.Sun.COM (Ed Falk) writes:
- In article <6003@hplabsz.HPL.HP.COM> sartin@hplabs.hp.com (Rob Sartin) writes:
- >In article <CEDMAN.90Sep29091115@lynx.ps.uci.edu> cedman@lynx.ps.uci.edu (Carl Edman) writes:
- >>string structure (or better class, long live C++ ! :-) which calculates
- >>a 32-bit CRC for each string the first time and stores it somewhere.
- >>Then only 1 (inlined) longword-compare will do the stringcomparisons
- >>for you.
- >
- >Afraid not. It'll give you an estimate of whether the strings match
- >(correctly identifying those that don't). You will need to then
- >actually compare the strings if they are the same. This method will
- >also be unable to reproduce strcmp's behavior (strcmp returns a signed
- >result indicated the <, =, > by being negative, zero, positive), it will
- >only return a boolean (match, no match).
-
- Also, these two strings
-
- "ab\0x"
- "ab\0y"
-
- (where x and y are any garbage that happens to be in memory after the
- terminating '\0') will be evaluated as unequal.
-
- There are workarounds for both problems, of course, but I think there
- won't be much of a performance improvement after you've done all it
- requires to get it right.
-
- Please , take note ! I did NOT suggest that this method is always the
- best, but I gave 3 or 4 different methods, and some hints, about when
- I would use which one.
- And I still hold that the CRC-method could be by far the most
- efficient under some conditions:
-
- - each word is checked often. Then a small overhead like f.e.
- cleaning up the garbage behind the end of the string for the
- one time calculation wouldn't matter
-
- - strings are long, so the overhead of 32-bit per string is
- justified
-
- - you only need to know if 2 strings are identical
-
- - most compares between unequal strings
-
- You can even imagine systems where it could be possible to remove
- the actual string from memory, and simply assume that if the 32-bit
- CRC match, the strings match. Such systems would have to be tolerant
- about an occasional mismatch.
-
- If memory serves correctly the above approach is used in some
- implementations for diff. (only to give one practical, real world
- example)
-
- Carl Edman
-
-
-
- Theorectial Physicist,N.:A physicist whose | Send mail
- existence is postulated, to make the numbers | to
- balance but who is never actually observed | cedman@golem.ps.uci.edu
- in the laboratory. | edmanc@uciph0.ps.uci.edu
-