- Path: sparky!uunet!uunet.ca!ecicrl!clewis
- From: clewis@ferret.ocunix.on.ca (Chris Lewis)
- Newsgroups: comp.lang.perl
- Subject: Re: SOUNDEX pattern matching
- Keywords: soundex, perl
- Message-ID: <4173@ecicrl.ocunix.on.ca>
- Date: 24 Jan 93 09:10:37 GMT
- References: <1993Jan23.184532.4933@netcom.com> <1jsco2INN4dd@slab.mtholyoke.edu> <1993Jan24.010737.5908@netcom.com>
- Organization: Elegant Communications Inc., Ottawa, Canada
- Lines: 53
-
- In article <1993Jan24.010737.5908@netcom.com> jfh@netcom.com (Jack Hamilton) writes:
- >In article <1jsco2INN4dd@slab.mtholyoke.edu> jbotz@mtholyoke.edu (Jurgen Botz) writes:
- >>In article <1993Jan23.184532.4933@netcom.com> jfh@netcom.com (Jack Hamilton) writes:
- >>>Soundex is very useful. It's used almost exclusively for English proper
- >>>names, I believe.
-
- >>Uh, that makes it pretty useless in the USA and elsewhere in the world
- >>except England.
-
- I know that US English isn't the same as English English, but it's not that
- far off ;-)
-
- As one example of a name that I don't think is English:
-
- butz -> B3200000
- botz -> B3200000
- buts -> B3200000
- bots -> B3200000
- boods -> B3200000
- booze -> B2000000
-
- Not bad.
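The codes above are ordinary Soundex codes zero-padded to eight characters. Below is a minimal Python sketch of one common (American) formulation of Soundex that reproduces them. It is not necessarily the code that produced the post's output. Two assumptions: the name `soundex` is hypothetical, and the eight-character padding is inferred from the examples, whereas the classic code is four characters. Other Soundex variants treat "h", "w", and the first letter slightly differently.

```python
# Letter-to-digit groups of the common American Soundex formulation.
CODES = {}
for digit, letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                       ("4", "l"), ("5", "mn"), ("6", "r")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str, length: int = 8) -> str:
    """Return the Soundex code of `name`, zero-padded/truncated to `length`.

    length=8 is an assumption chosen to match the codes shown in the post.
    """
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's own code counts for collapsing
    for ch in letters[1:]:
        code = CODES.get(ch, "")  # vowels, y, h, w have no code
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":  # h and w do not separate equal codes; vowels do
            prev = code
    return (first.upper() + "".join(digits) + "0" * length)[:length]


if __name__ == "__main__":
    for name in ("butz", "botz", "buts", "bots", "boods", "booze"):
        print(name, "->", soundex(name))
```

Running it prints the same six codes listed above; all the "b-vowel-t/d-z/s" spellings collapse to B32, while "booze" keeps only the "z" and becomes B2.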
-
- >It's not designed to do all things for all people, but a lot of personal
- >names in the USA and England *are* English proper names as well as english
- >proper names (you left out English-speaking Canada, Australia, and New
- >Zealand, along with a host of other places where English is a primary
- >language).
-
- >The algorithm would handle some non-English names correctly (Spanish/
- >Portuguese Jose, for example), but not all of them (French Louis, for
- >example).
-
- It handles "louis" fine: it matches "lewis", as it should. Especially the way
- most Americans pronounce "louis" ;-)
-
- You're being unduly harsh on soundex. Soundex isn't supposed to find
- EXACT sound-alikes; it identifies names that "probably sound reasonably similar".
- Producing exact sound-alikes is not something that can be done in a couple
- of lines of code: good "text-to-phoneme" translators (an equivalent
- task) have hundreds or thousands of rules, and will fail almost immediately
- when presented with foreign words with different pronunciation rules.
-
- In contrast, soundex is a simple mechanism that works reasonably well with
- any language that uses something approaching English pronunciation, i.e.,
- names transliterated by sound into Roman-alphabet spellings, especially
- if you get the first letter right. That said, it's not that hard to add
- some special overrides, like leading "kn" -> "n"; it's bad when you
- handle "knuth" wrong ;-)
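An override like leading "kn" -> "n" can be layered on top of Soundex by rewriting the start of the name before encoding it. The sketch below assumes a hypothetical `PREFIX_OVERRIDES` table holding only the "kn" rule from the post, a hypothetical `soundex_with_overrides` wrapper, and the same eight-character padding as the codes earlier in the post. The underlying `soundex` is one common American formulation, not necessarily the one the author used.

```python
# Letter-to-digit groups of the common American Soundex formulation.
CODES = {}
for digit, letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                       ("4", "l"), ("5", "mn"), ("6", "r")):
    for ch in letters:
        CODES[ch] = digit

# Hypothetical override table: silent-letter prefixes rewritten before encoding.
PREFIX_OVERRIDES = (("kn", "n"),)


def soundex(name: str, length: int = 8) -> str:
    """Plain Soundex, zero-padded to `length` (8 is an assumed length)."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":  # h and w do not separate equal codes
            prev = code
    return (first.upper() + "".join(digits) + "0" * length)[:length]


def soundex_with_overrides(name: str, length: int = 8) -> str:
    """Apply the first matching prefix override, then encode with soundex()."""
    lowered = name.lower()
    for prefix, replacement in PREFIX_OVERRIDES:
        if lowered.startswith(prefix):
            lowered = replacement + lowered[len(prefix):]
            break
    return soundex(lowered, length)


if __name__ == "__main__":
    for name in ("knight", "night", "knuth"):
        print(name, soundex(name), soundex_with_overrides(name))
```

Without the override, "knight" encodes as K5230000 and never matches "night" (N2300000); with it, both encode as N2300000. Because the first letter is kept verbatim, a wrong first letter is the most damaging error in Soundex, which is why prefix rules like this pay off.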
- --
- Chris Lewis; clewis@ferret.ocunix.on.ca; Phone: Canada 613 832-0541
- Psroff 3.0 info: psroff-request@ferret.ocunix.on.ca
- Ferret list: ferret-request@ferret.ocunix.on.ca
-