NetNews Usenet Archive 1992 #31

Internet Message Format | 1993-01-01 | 3.4 KB

Path: sparky!uunet!not-for-mail
From: avg@rodan.UU.NET (Vadim Antonov)
Newsgroups: comp.std.internat
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Date: 1 Jan 1993 18:57:07 -0500
Organization: UUNET Technologies Inc, Falls Church, VA
Lines: 78
Message-ID: <1i2lojINN4se@rodan.UU.NET>
References: <8494@charon.cwi.nl> <1i2durINN2pj@rodan.UU.NET> <8496@charon.cwi.nl>
NNTP-Posting-Host: rodan.uu.net
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages

On good spellcheckers: a good spellchecker will say: "this word
contains letter "a from an incorrect alphabet, replace?"

Besides, there are such things as indicating letters in the preferred
register with color, shape or brightness.

The multiple codes are easy to handle with a proper interface. I do it
every day and cannot say it ever bothered me. I have a fully functional
Unix supporting Russian as a reward for that nearly non-existent
inconvenience.

In article <8496@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
>I do not think you understand. From the AVON (Amtliches Verzeichnis der
>Ortnetzkennzahlen) edition 1985, which gives area codes for the places in
>Germany. The next is a selection of places mentioned ("o is o-umlaut):
>    Modautal
>    M"ockm"uhl
>    ...
>    M"ornsheim
>    Moers
>    M"ossingen
>    ...
>    M"otzingen
>    Mogendorf
>now come up with a coding that allows this (standard German) sorting.
>Note that it is not allowed to spell Moers as M"ors, that is a severe
>spelling error! On the other hand in the absence of umlauts it is
>allowable to spell "o as oe (because it is in fact a shorthand for it),
>but to do so when there are umlauts available does not look very good.

Then it's necessary to add exceptions to the sorting algorithm.
In any case, if you know that THIS PARTICULAR "o is the German one (not
the Finnish one), you can still use ONE sorting algorithm without asking
which language it is.
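As a rough illustration of what such an exception rule might look like, here is a minimal Python sketch. It is not taken from the AVON or from any standard. It assumes that every "o in the input is the German one, written as the precomposed character ö, and it treats each umlaut as its two-letter spelling when building the sort key. The names EXPANSIONS and sort_key are invented for this example.

```python
import unicodedata

# Hypothetical expansion table: each German umlaut sorts exactly like
# its two-letter spelling ("o sorts as oe, and so on).
EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def sort_key(name):
    """Return a key that sorts German names with umlauts expanded."""
    # Normalize first so that "o + combining diaeresis" becomes one "ö".
    lowered = unicodedata.normalize("NFC", name).lower()
    expanded = "".join(EXPANSIONS.get(ch, ch) for ch in lowered)
    # The original spelling is a tie-breaker, so "Moers" and a
    # (misspelled) "Mörs" would still get a stable relative order.
    return (expanded, name)


# The place names from the quoted list, deliberately shuffled.
places = ["Mogendorf", "Mötzingen", "Moers", "Modautal",
          "Mössingen", "Mörnsheim", "Möckmühl"]
ordered = sorted(places, key=sort_key)
print(ordered)
```

With this key the seven names come out in the order quoted from the AVON, with Moers falling between Mörnsheim and Mössingen. A Finnish ö would need a different rule, which is only possible here if the two letters are distinguishable in the input; this sketch does not attempt that.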
I do not think that there are many exceptions, and the universal string
comparison algorithm (which can be included in the standard!) will be
pretty simple. BTW, how do they deal with things like ["o-z] in regular
expressions?

Another solution is to create a generic rule for EQUIVALENT letters,
which have identical positions in the sorting order, and to add a
"letter" oe. So even if the sorting is not regular there is always a way
around it -- with Unicode you can't do even that.

The argument about the redundancy of the Unicode encoding given the
external constraints (aka an explicitly specified language) still holds.
So far it is the most serious challenge and I do not think anyone can
beat it.

(I repeat: to do trivial operations like case-insensitive comparison,
sorting and regular expression matching, Unicode requires explicit
specification of the language -- it can be obtained from the user or
recorded somewhere outside the text itself. The "paradox" is that if we
have this information we DO NOT NEED extended Unicode codes, because we
already know the alphabet and it is small!)

Practically, it means that Unicode is useless (i.e. it simply wastes bits
duplicating information stored somewhere outside the text).

> > Forget about "traditions" -- users do not care which code is inside if
> > it looks like their usual stuff.
>
>So users are completely uninterested in the coding and the sorting algorithm
>used!

Users are interested in whether they can do their work without grep
asking them which language they mean every time they run it.

Beat it!

--vadim