- Newsgroups: comp.std.internat
- Path: sparky!uunet!mcsun!news.funet.fi!hydra!klaava!wirzeniu
- From: wirzeniu@klaava.Helsinki.FI (Lars Wirzenius)
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Message-ID: <1992Dec31.171450.1513@klaava.Helsinki.FI>
- Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
- Organization: University of Helsinki
- References: <1992Dec30.061759.8690@fcom.cc.utah.edu> <1hu9v5INNbp1@rodan.UU.NET> <8490@charon.cwi.nl>
- Date: Thu, 31 Dec 1992 17:14:50 GMT
- Lines: 66
-
- dik@cwi.nl (Dik T. Winter) writes:
- >Wrong Vadim. You cannot even do it in the European languages. You cannot
- >even do it in German. How would you assign codes such that the German
- >A-umlaut sorts as if it is the letter combination AE, and at the same
- >time the umpteenth letter of Swedish (after Z).
-
- (Second letter after Z, actually; the letter in between is
- A-with-ring-on-top -- don't know its official name, if any. The
- Finnish alphabet has no A-w-r-o-t, so I guess they have A-umlaut after
- Z, although they/we seem to usually include the A-w-r-o-t too, thereby
- effectively using the same alphabet (we) Swedes use. I'm a bit
- uncertain of the official way, and I only have two dictionaries at the
- moment and those are both for English (one by Oxford, one by
- Webster).)
-
- The immediately obvious solution to the problem with the German
- A-umlaut not being the same as the Swedish A-umlaut (it looks like a
- duck, but it doesn't walk like a duck, and it doesn't quack like a
- duck, is it a duck?) is to assign them different codes so that you can
- differentiate between languages. Hm..., this could get messy though,
- since then you get confused people: should a spelling checker for
- Swedish accept the German a-umlaut in a Swedish word? Should a
- sorting program sort the Swedish a-umlaut differently from the German?
- How many people are going to look at the table of 2^32 entries and
- accidentally pick the wrong code, especially when they are not that
- happy about foreign languages to begin with?
-
- Personally, I think that two characters that look the same and that
- are usually thought to be the same (as the German and Swedish
- a-umlauts are, or the a-umlauts in other languages) should be given
- the same code, even if they are used differently in different
- languages. A character encoding standard should provide an
- efficient, simple way of referring to different characters, not
- confuse things by trying to encode all other possible information as
- well. Single characters can be encoded simply; what happens when
- they are used in groups to form words in different languages is
- another issue and should be dealt with separately. IMHO as a layman
- with only a little experience with I18N.
- (I've written one program that was able to switch output languages --
- Swedish, Finnish, and English -- thought that the issue was ugly and
- unpleasant, if only because of the character set issues, and decided
- to use English as the only language unless I have to use another until
- I can understand the issues better and either hear of or come up with
- a good, elegant and simple solution. I'm still using English after,
- what, five years or so.)
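-
- A minimal sketch of that separation, in Python: one code for a-umlaut,
- and the language-specific behaviour kept in separate sort-key
- functions. The names german_key and swedish_key and both rule tables
- are hypothetical, hand-rolled for illustration from the orderings
- described above (German a-umlaut sorting as AE, Swedish a-umlaut as
- the second letter after Z); real software would take its rules from
- proper locale data.

```python
# Hypothetical, simplified collation rules; not taken from any standard library.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

# German rule assumed here: umlauts sort as the base vowel followed by "e".
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def german_key(word):
    """Sort key that expands a-umlaut to 'ae' before comparing."""
    return "".join(GERMAN_EXPANSIONS.get(ch, ch) for ch in word.lower())


def swedish_key(word):
    """Sort key that ranks å, ä, ö as separate letters after z."""
    unknown = len(SWEDISH_ALPHABET)  # characters outside the alphabet sort last
    return [SWEDISH_RANK.get(ch, unknown) for ch in word.lower()]


WORDS = ["Zoo", "Äpfel", "Apfel", "Afrika"]

german_order = sorted(WORDS, key=german_key)
swedish_order = sorted(WORDS, key=swedish_key)

print(german_order)
print(swedish_order)
```

- The same stored character ends up near the front of the list under
- the German key and at the very end under the Swedish one, without
- the encoding having to know anything about either language.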
-
- I don't know whether Japanese and Chinese have "characters" that are
- the same in the way German a-umlaut and Swedish a-umlaut (or a through
- z) are, since I know almost nothing of either language. If the only
- difference is that they are usually drawn slightly differently (for
- all I know, the Germans might like to draw a-umlaut differently from
- Swedes, by using a differently formed umlaut higher up or whatever; it's
- still considered to be the same character), I see no reason why they
- shouldn't use the same code. But, since I'm linguistically
- challenged, I'll have to take somebody else's word for it, though a
- concise explanation would be nice.
-
- Disclaimer: all my knowledge of the character standards comes from
- netnews, a notorious source of unreliable information mixed with
- absolute truth. I don't even have a real stake in this game (like I
- said, English is good enough for me almost all of the time, and
- Latin-1 suffices even when it isn't), I just feel like rambling. Take
- it, leave it, or roast me via mail.
-
- --
- Lars.Wirzenius@helsinki.fi (finger wirzeniu@klaava.helsinki.fi)
- MS-DOS, you can't live with it, you can live without it.
-