- Path: sparky!uunet!not-for-mail
- From: avg@rodan.UU.NET (Vadim Antonov)
- Newsgroups: comp.std.internat
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Date: 1 Jan 1993 18:57:07 -0500
- Organization: UUNET Technologies Inc, Falls Church, VA
- Lines: 78
- Message-ID: <1i2lojINN4se@rodan.UU.NET>
- References: <8494@charon.cwi.nl> <1i2durINN2pj@rodan.UU.NET> <8496@charon.cwi.nl>
- NNTP-Posting-Host: rodan.uu.net
- Keywords: Han Kanji Katakana Hiragana ISO10646 Unicode Codepages
-
- On good spellcheckers:
-
- A good spellchecker will say:
-
- "This word contains the letter "a from an incorrect alphabet. Replace?"
-
- Besides, a display can indicate which letters belong to the
- preferred alphabet by color, shape, or brightness.
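A minimal sketch of the check such a spellchecker might run, written in Python. It guesses each letter's script from the first word of its Unicode character name (`LATIN`, `CYRILLIC`, ...), which is a rough heuristic rather than the Unicode Script property, and flags letters whose script differs from the one most of the word uses. The function names `script_of` and `foreign_letters` are hypothetical.

```python
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """Rough script guess: the first word of the character's Unicode name,
    e.g. 'LATIN' or 'CYRILLIC'. A heuristic, not the Unicode Script property."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def foreign_letters(word: str) -> list[tuple[int, str, str]]:
    """Return (index, char, script) for every letter whose script differs
    from the script used by most letters in the word."""
    letters = [(i, ch, script_of(ch)) for i, ch in enumerate(word) if ch.isalpha()]
    if not letters:
        return []
    dominant, _ = Counter(s for _, _, s in letters).most_common(1)[0]
    return [(i, ch, s) for i, ch, s in letters if s != dominant]


# "p\u0430ssword": the second letter is CYRILLIC SMALL LETTER A, not Latin 'a'.
for i, ch, script in foreign_letters("p\u0430ssword"):
    print(f"letter {ch!r} at position {i} is {script}; replace?")
```

A real spellchecker would use the proper Script property and a dictionary lookup; the name-prefix trick only keeps the sketch dependency-free.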
-
- The multiple codes are easy to handle with a proper interface.
- I do it every day and cannot say it has ever bothered me.
- I have a fully functional Unix supporting Russian as a reward
- for that nearly nonexistent inconvenience.
-
- In article <8496@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
- >I do not think you understand. From the AVON (Amtliches Verzeichnis der
- >Ortnetzkennzahlen) edition 1985, which gives area codes for the places in
- >Germany. The next is a selection of places mentioned ("o is o-umlaut):
- > Modautal
- > M"ockm"uhl
- > ...
- > M"ornsheim
- > Moers
- > M"ossingen
- > ...
- > M"otzingen
- > Mogendorf
- >now come up with a coding that allows this (standard German) sorting.
- >Note that it is not allowed to spell Moers as M"ors, that is a severe
- >spelling error! On the other hand, in the absence of umlauts it is
- >allowable to spell "o as oe (because it is in fact a shorthand for it),
- >but to do so when there are umlauts available does not look very good.
-
- Then it's necessary to add exceptions to the sorting algorithm.
- In any case, if you know that THIS PARTICULAR "o is the German one
- (not the Finnish one), you can still use ONE sorting algorithm without
- asking which language it is.
-
- I do not think that there are many exceptions, and the universal
- string comparison algorithm (which could be included in the standard!)
- will be pretty simple.
-
- BTW, how do they deal with things like ["o-z] in regular expressions?
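For comparison, a sketch of how a plain code-point regular expression engine treats such ranges (Python's `re` module, chosen only for illustration). A bracket range denotes a span of code points, so `ö` (U+00F6) lies outside `[o-z]`, and `[ö-z]` is rejected outright as a reversed range.

```python
import re

# [o-z] means code points U+006F..U+007A; 'ö' is U+00F6, so it is not included,
# even though German sorting files it next to 'o'.
print(re.fullmatch(r"[o-z]", "ö"))            # None
print("ö" > "z")                              # True: 0xF6 > 0x7A

# A range starting at 'ö' and ending at 'z' runs backwards in code-point order.
try:
    re.compile(r"[ö-z]")
except re.error as err:
    print("rejected:", err)

# One workaround: list the equivalent letter explicitly in the class.
print(bool(re.fullmatch(r"[oöp-z]", "ö")))    # True
```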
-
- Another solution is to create a generic rule for EQUIVALENT letters,
- which occupy the same position in the sorting order, and to add a
- "letter" oe.
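A minimal sketch of the equivalence idea, assuming the convention that each umlaut letter sorts exactly like its two-letter expansion (ö as oe, and so on; this appears to be the "phone book" variant of DIN 5007). With that single rule, the key reproduces the order of the AVON excerpt quoted above, with Moers falling between Mörnsheim and Mössingen. The table `GERMAN_EXPANSIONS` and the function `german_sort_key` are hypothetical names.

```python
# Hypothetical equivalence table: each umlaut letter sorts as its two-letter
# expansion, so "ö" and "oe" occupy the same position in the order.
GERMAN_EXPANSIONS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def german_sort_key(name: str) -> tuple[str, str]:
    """Primary key: lower-cased text with umlauts expanded.
    Secondary key: the original spelling, used only to break ties."""
    return (name.lower().translate(GERMAN_EXPANSIONS), name)


places = ["Mogendorf", "Moers", "Mötzingen", "Modautal",
          "Mössingen", "Möckmühl", "Mörnsheim"]
for place in sorted(places, key=german_sort_key):
    print(place)
```

Because the original spelling is kept as the secondary key, Moers is never rewritten as Mörs; the two spellings merely compare as equal at the primary level.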
-
- So, even if sorting is not regular, there is always a way around it;
- with Unicode you can't even do that.
-
- The argument about the redundancy of Unicode encoding, given the external
- constraints (i.e., an explicitly specified language), still holds.
- So far, it is the most serious challenge, and I do not think anyone can
- beat it.
-
- (I repeat: to do trivial operations like case-insensitive comparison,
- sorting, and regular expression matching, Unicode requires explicit
- specification of the language, which can be obtained from the user or
- recorded somewhere outside the text itself. The "paradox" is that
- if we have this information, we DO NOT NEED extended Unicode codes,
- because we already know the alphabet and it is small!)
-
- In practice, this means that Unicode is useless (i.e., it simply wastes
- bits duplicating information stored somewhere outside the text).
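To put a number on the "wasted bits" claim, a sketch comparing one Russian word in KOI8-R, an 8-bit Cyrillic code page, and in 16-bit Unicode (encoded here as UTF-16, matching the 16-bit width of Unicode at the time). The example word is arbitrary, and the comparison covers only single-alphabet text, where the language is already known.

```python
text = "привет"  # Russian "hello", six letters

koi8 = text.encode("koi8_r")        # 8-bit Russian code page: 1 byte per letter
ucs2 = text.encode("utf-16-le")     # 16-bit Unicode, no byte-order mark: 2 bytes per letter

print(len(koi8), len(ucs2))         # 6 12

# Both encodings carry the same text; only the storage cost differs.
assert koi8.decode("koi8_r") == ucs2.decode("utf-16-le") == text
```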
-
- > > Forget about "traditions" -- users do not care which code is inside if
- > > it looks like their usual stuff.
- >
- >So users are completely uninterested in the coding and the sorting algorithm
- >used!
-
- Users are interested in being able to do their work without grep
- asking them which language they mean every time they run it.
-
- Beat it!
-
- --vadim
-