- Path: sparky!uunet!not-for-mail
- From: avg@rodan.UU.NET (Vadim Antonov)
- Newsgroups: comp.std.internat
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Date: 31 Dec 1992 18:03:05 -0500
- Organization: UUNET Technologies Inc, Falls Church, VA
- Lines: 48
- Message-ID: <1hvu79INN4qf@rodan.UU.NET>
- References: <1992Dec30.061759.8690@fcom.cc.utah.edu> <1hu9v5INNbp1@rodan.UU.NET> <8490@charon.cwi.nl>
- NNTP-Posting-Host: rodan.uu.net
- Keywords: Han Kanji Katakana Hiragana ISO10646 Unicode Codepages
-
- In article <8490@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
- >Wrong Vadim. You cannot even do it in the European languages. You cannot
- >even do it in German. How would you assign codes...
-
- Dik, I never insisted that all European languages belong to
- a single group (how many ISO Latin-X sets are there?).
- My point was that there obviously are identifiable meta-alphabets
- covering several languages.
-
- >A-umlaut sorts as if it is the letter combination AE, and at the same
- >time the umpteenth letter of Swedish (after Z). How would you encode
- >Spanish where the letter combinations CH and LL are regarded as single
- >letters?
- >Or Maltese where the GH-crossbar combination is a single letter
- >that does not sort in the neighbourhood of G or H-crossbar but between
- >P and Q? Or Dutch, where the letter combination ij is sorted either
- >amongst i as a double letter, or amongst y as a single letter, or
- >between y and z as a single letter, depending on who does the sorting?
-
- If a combination of letters is treated as a letter, IT IS A LETTER.
- Then add it to the alphabet and let the keyboard driver (which surely
- knows the language, simply because there are different keyboard
- layouts) handle the matter. FYI, English has some compound letters
- too (though they're used only in typesetting): ff, fff, fi, ffi, fl, ffl.
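Once a combination like CH or LL is added to the alphabet as a letter, sorting reduces to splitting a word into letters and comparing their positions in that alphabet. The sketch below assumes the traditional Spanish ordering described in the quoted text (CH after C, LL after L, plus Ñ after N); the alphabet list and the names `tokenize` and `sort_key` are hypothetical, and only lowercase letters are handled.

```python
# Hypothetical traditional Spanish alphabet: "ch" and "ll" are single letters
# that sort after "c" and "l" respectively; "ñ" sorts after "n".
SPANISH_ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: i for i, letter in enumerate(SPANISH_ALPHABET)}
# Try longer units first so that "ch" is matched before "c".
UNITS = sorted(SPANISH_ALPHABET, key=len, reverse=True)


def tokenize(word: str) -> list[str]:
    """Split a lowercase word into alphabet letters, treating digraphs as one letter."""
    letters = []
    i = 0
    while i < len(word):
        for unit in UNITS:
            if word.startswith(unit, i):
                letters.append(unit)
                i += len(unit)
                break
        else:
            raise ValueError(f"character {word[i]!r} is not in the alphabet")
    return letters


def sort_key(word: str) -> list[int]:
    """Map a word to the alphabet positions of its letters."""
    return [RANK[letter] for letter in tokenize(word.lower())]


words = ["chico", "cuna", "llave", "luna", "nube", "ñame"]
print(sorted(words, key=sort_key))
# ['cuna', 'chico', 'luna', 'llave', 'nube', 'ñame']
```

A plain code-point sort would put "chico" before "cuna" and "llave" before "luna"; treating the digraphs as letters reverses both pairs.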
-
- The idea of visual encoding (and one-letter-one-glyph is nothing more
- than a compressed image of the text) is simply wrong because it
- drops valuable information that is readily available at the point of the
- CREATION of the text but not later. Sure, the information can (*must*, if you're
- going to do trivial things like sorting or case-insensitive comparisons)
- be preserved off-text (in mail headers or in file attributes, for
- example), but that effectively defeats the very purpose of ISO10646:
- why on Earth do I need to spend extra bits on encoding glyphs if
- I already know the language, and 8 bits (or 16 for oriental languages)
- are quite enough to map the alphabet? Don't you see this gap in
- the logic, nullifying all the benefits of 10646?
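The point about off-text information can be made concrete with the German and Swedish rules quoted earlier: the same code point for "ä" sorts differently depending on the language, so the sorting routine needs a language tag supplied from outside the text. The sketch below uses deliberately simplified rules (German: ä, ö, ü treated as ae, oe, ue; Swedish: å, ä, ö as separate letters after z); the function names and the "de"/"sv" tags are assumptions for illustration.

```python
# Simplified, hypothetical per-language collation keys. The text itself
# carries only code points; the language must be supplied separately.


def german_key(word: str) -> str:
    # German convention: "ä" sorts as if it were "ae" (likewise ö, ü).
    return (word.lower()
            .replace("ä", "ae").replace("ö", "oe").replace("ü", "ue"))


SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyzåäö"


def swedish_key(word: str) -> list[int]:
    # Swedish: å, ä, ö are separate letters placed after z.
    # Characters outside this alphabet raise ValueError.
    return [SWEDISH_ORDER.index(ch) for ch in word.lower()]


COLLATORS = {"de": german_key, "sv": swedish_key}


def sort_words(words: list[str], language: str) -> list[str]:
    """Sort words using the collation rules for the given language tag."""
    return sorted(words, key=COLLATORS[language])


words = ["zebra", "ärger", "adler"]
print(sort_words(words, "de"))  # ['adler', 'ärger', 'zebra']
print(sort_words(words, "sv"))  # ['adler', 'zebra', 'ärger']
```

Nothing in the string "ärger" says which result is correct; only the tag passed alongside it decides.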
-
- 10646 was meant as an encoding eliminating the necessity to carry off-text
- information (which is not a piece of cake, especially in multi-lingual
- texts). However, the "single glyph" approach ruined the very intent
- because you need the off-text information to do trivial tasks anyway!
- What's the gain? More wasted bits, yeah?
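The "wasted bits" arithmetic can be checked with Python's standard codecs: a Cyrillic word stored in an 8-bit code page (KOI8-R) versus fixed-width 16- and 32-bit encodings. Here `utf-16-le` and `utf-32-le` are modern stand-ins for the 16- and 32-bit forms of ISO 10646 (for characters like these, UTF-16 matches the 16-bit form); the sample word is arbitrary. The 8-bit saving only holds if the reader knows, off-text, that the bytes are KOI8-R.

```python
# Storage cost of one Cyrillic word in an 8-bit code page versus
# fixed-width 16- and 32-bit encodings (the "-le" codecs add no BOM).
word = "привет"  # 6 Cyrillic letters

sizes = {
    codec: len(word.encode(codec))
    for codec in ("koi8_r", "utf-16-le", "utf-32-le")
}
print(sizes)  # {'koi8_r': 6, 'utf-16-le': 12, 'utf-32-le': 24}

# The KOI8-R bytes decode correctly only when the code page is known.
assert word.encode("koi8_r").decode("koi8_r") == word
```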
-
- Take a life, guys. We in Russia made that mistake (DKOI and "GOST" encodings)
- many years ago and came to realize that this solution is too simple to
- be correct.
-
- --vadim
-