NetNews Usenet Archive 1992 #31

Text File | 1992-12-31 | 4.2 KB | 78 lines

Newsgroups: comp.std.internat
Path: sparky!uunet!mcsun!news.funet.fi!hydra!klaava!wirzeniu
From: wirzeniu@klaava.Helsinki.FI (Lars Wirzenius)
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Message-ID: <1992Dec31.171450.1513@klaava.Helsinki.FI>
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
Organization: University of Helsinki
References: <1992Dec30.061759.8690@fcom.cc.utah.edu> <1hu9v5INNbp1@rodan.UU.NET> <8490@charon.cwi.nl>
Date: Thu, 31 Dec 1992 17:14:50 GMT
Lines: 66

dik@cwi.nl (Dik T. Winter) writes:
>Wrong Vadim. You cannot even do it in the European languages. You cannot
>even do it in German. How would you assign codes such that the German
>A-umlaut sorts as if it is the letter combination AE, and at the same
>time the umpteenth letter of Swedish (after Z).

(Second letter after Z, actually; the letter in between is A-with-ring-on-top -- don't know its official name, if any. The Finnish alphabet has no A-w-r-o-t, so I guess they have A-umlaut after Z, although they/we seem to usually include the A-w-r-o-t too, thereby effectively using the same alphabet (we) Swedes use. I'm a bit uncertain of the official way, and I only have two dictionaries at the moment and those are both for English (one by Oxford, one by Webster).)

The immediately obvious solution to the problem with the German A-umlaut not being the same as the Swedish A-umlaut (it looks like a duck, but it doesn't walk like a duck, and it doesn't quack like a duck, is it a duck?) is to assign them different codes so that you can differentiate between languages. Hm..., this could get messy though, since then you get confused people: should a spelling checker for Swedish accept the German a-umlaut in a Swedish word? Should a sorting program sort the Swedish a-umlaut differently from the German?
How many people are going to look at the table of 2^32 entries and accidentally pick the wrong code, especially when they are not that happy about foreign languages to begin with?

Personally, I think that giving different codes to two characters that look the same and that are usually thought to be the same (as the German and Swedish a-umlauts are, or the a-umlauts in other languages) is a mistake, even if they are used differently in different languages. A character encoding standard should provide an efficient, simple way of referring to different characters, not confuse things by trying to encode all other possible information as well. Single characters can be encoded simply; what happens when they are used in groups to form words in different languages is another issue and should be dealt with separately.

IMHO as a layman with only a little experience with I18N. (I've written one program that was able to switch output languages -- Swedish, Finnish, and English -- thought that the issue was ugly and unpleasant, if only because of the character set issues, and decided to use English as the only language unless I have to use another, until I can understand the issues better and either hear of or come up with a good, elegant and simple solution. I'm still using English after, what, five years or so.)

I don't know whether Japanese and Chinese have "characters" that are the same in the way German a-umlaut and Swedish a-umlaut (or a through z) are, since I know almost nothing of either language. If the only difference is that they are usually drawn slightly differently (for all I know, the Germans might like to draw a-umlaut differently from Swedes, by using a differently formed umlaut higher up or whatever; it's still considered to be the same character), I see no reason why they shouldn't use the same code. But, since I'm linguistically challenged, I'll have to take somebody else's word for it, though a concise explanation would be nice.
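
To make the "one code, separate sorting rules" idea above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names `german_key` and `swedish_key` and the word list are made up for this example, the German rule covers only the a-umlaut-as-AE convention mentioned in the quoted text, and real collation has many more cases than this handles.

```python
# Sketch: a-umlaut keeps a single code, and each language supplies its
# own sort key. All names here are hypothetical, for illustration only.

def german_key(word):
    """Sort a-umlaut as if it were the letter combination 'ae'."""
    return word.lower().replace("ä", "ae")

# Swedish alphabet order: a-ring, a-umlaut and o-umlaut follow z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_key(word):
    """Sort by position in the Swedish alphabet; unknown characters go last."""
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in word.lower()]

# Illustrative word list; the same strings are sorted under both rules.
words = ["zon", "ärm", "af", "arm"]
german_sorted = sorted(words, key=german_key)
swedish_sorted = sorted(words, key=swedish_key)

print(german_sorted)   # ['ärm', 'af', 'arm', 'zon']
print(swedish_sorted)  # ['af', 'arm', 'zon', 'ärm']
```

The encoded text is identical in both cases; only the key function, chosen by whoever knows the language of the text, differs.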
Disclaimer: all my knowledge of character standards comes from netnews, a notorious source of unreliable information mixed with absolute truth. I don't even have a real stake in this game (like I said, English is good enough for me almost all of the time, and Latin-1 suffices even when it isn't), I just feel like rambling. Take it, leave it, or roast me via mail.

-- 
Lars.Wirzenius@helsinki.fi (finger wirzeniu@klaava.helsinki.fi)
MS-DOS, you can't live with it, you can live without it.