NetNews Usenet Archive 1992 #31

Internet Message Format | 1992-12-31 | 3.0 KB

Path: sparky!uunet!not-for-mail
From: avg@rodan.UU.NET (Vadim Antonov)
Newsgroups: comp.std.internat
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Date: 31 Dec 1992 18:03:05 -0500
Organization: UUNET Technologies Inc, Falls Church, VA
Lines: 48
Message-ID: <1hvu79INN4qf@rodan.UU.NET>
References: <1992Dec30.061759.8690@fcom.cc.utah.edu> <1hu9v5INNbp1@rodan.UU.NET> <8490@charon.cwi.nl>
NNTP-Posting-Host: rodan.uu.net
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages

In article <8490@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
>Wrong Vadim.  You cannot even do it in the European languages.  You cannot
>even do it in German.  How would you assign codes...

Dik, I never insisted that all European languages belong to a single
group -- how many ISO Latin-X sets are there?  My point was that there
obviously are identifiable meta-alphabets covering several languages.

>A-umlaut sorts as if it is the letter combination AE, and at the same
>time the umpteenth letter of Swedish (after Z).  How would you encode
>spanish where the letter combinations CH and LL are regarded as single
>letters?
>Or Maltese where the GH-crossbar combination is a single letter
>that does not sort in the neighbourhood of G or H-crossbar but between
>P and Q?  Or dutch, where the letter combination ij is sorted either
>amongst i as a double letter, or amongst y as a single letter, or
>between y and z as a single letter, depending on who does the sorting?

If a combination of letters is treated as a letter IT IS A LETTER.
Then add it to the alphabet and let the keyboard driver (which surely
knows the language -- simply because there are different keyboard
layouts) handle the matter.  FYI, English has some compound letters
too (though they're used only in typesetting) -- ff, fff, fi, ffi,
fl, ffl...

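To make the "add it to the alphabet" point concrete, here is a minimal
Python sketch.  It assumes the language is known when the text is
sorted, and it treats traditional Spanish CH and LL as single letters.
The table ALPHABETS and the functions letters(), sort_key() and
sort_words() are hypothetical names made up for this illustration;
they are not part of any standard or library.

```python
# Hypothetical per-language alphabets, written out by hand for illustration.
ALPHABETS = {
    # Traditional Spanish order: CH follows C, LL follows L, N-tilde follows N.
    "es-traditional": [
        "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
        "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
        "x", "y", "z",
    ],
    # Plain 26-letter order with no multi-character letters.
    "en": list("abcdefghijklmnopqrstuvwxyz"),
}


def letters(word, language):
    """Split a word into letters of the language's alphabet.

    The longest match wins, so 'ch' is one letter in traditional Spanish
    but two letters ('c', 'h') in the plain alphabet.
    """
    alphabet = ALPHABETS[language]
    longest = max(len(letter) for letter in alphabet)
    word = word.lower()
    result = []
    i = 0
    while i < len(word):
        for size in range(longest, 0, -1):
            chunk = word[i:i + size]
            if chunk in alphabet:
                result.append(chunk)
                i += len(chunk)
                break
        else:
            raise ValueError(f"{word[i]!r} is not in the {language} alphabet")
    return result


def sort_key(word, language):
    """Map a word to the list of positions of its letters in the alphabet."""
    alphabet = ALPHABETS[language]
    return [alphabet.index(letter) for letter in letters(word, language)]


def sort_words(words, language):
    """Sort words in the alphabetical order of the given language."""
    return sorted(words, key=lambda word: sort_key(word, language))


words = ["cuna", "chico", "llama", "luna", "cosa"]
print(sort_words(words, "en"))
print(sort_words(words, "es-traditional"))
```

The same five strings come out in a different order depending on the
language passed in, which is exactly the information a glyph-only
encoding does not carry.
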
The idea of visual encoding (and "one letter, one glyph" is nothing
more than a compressed image of the text) is simply wrong because it
drops valuable information that is readily available at the point of
the CREATION of the text but not later.  Sure, the information can
(*must*, if you're going to do trivial things like sorting or
case-insensitive comparisons) be preserved off-text (in mail headers
or in file attributes, for example), but that effectively defeats the
very purpose of ISO10646 -- why on Earth do I need to spend bits on
encoding glyphs if I already know the language, and 8 bits (or 16 for
oriental languages) is quite enough to map the alphabet?

Don't you see this gap in the logic nullifying all the benefits of
10646?  10646 was meant as an encoding that eliminates the necessity
to carry off-text information (which is not a piece of cake,
especially in multi-lingual texts).  However, the "single glyph"
approach ruined that very intent because you need the off-text
information to do trivial tasks anyway!  What's the gain?  More wasted
bits, yeah?

Get a life, guys.  We in Russia made that mistake (DKOI and "GOST"
encodings) many years ago and came to realize that this solution is
too simple to be correct.

--vadim