- Path: sparky!uunet!not-for-mail
- From: avg@rodan.UU.NET (Vadim Antonov)
- Newsgroups: comp.std.internat
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Date: 31 Dec 1992 03:11:17 -0500
- Organization: UUNET Technologies Inc, Falls Church, VA
- Lines: 97
- Message-ID: <1hu9v5INNbp1@rodan.UU.NET>
- References: <1992Dec30.010216.2550@nobeltech.se> <1992Dec30.061759.8690@fcom.cc.utah.edu>
- NNTP-Posting-Host: rodan.uu.net
- Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
-
- In article <1992Dec30.061759.8690@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
- >The "ugly thing Unicode does with asiatic languages" is exactly what it
- >does with all other languages: There is a single lexical assignment for
- >for each possible glyph.
- >....
- >ADMITTED DRAWBACKS IN UNICODE:
- >
- >The fact that lexical order is not maintained for all existing character
- >sets (NOTE: NO CURRENT OR PROPOSED STANDARD SUPPORTS THIS IDEA!) means that
- >a direct arithmatic translation is not possible for...
-
- It means that:
-
- 1) "mechanistic" conversion between upper and lower case
- is impossible (as well as case-insensitive comparisons)
-
- Example: Latin T -> t
- Cyrillic T -> m
- Greek T -> ?
-
- This property alone renders Unicode useless for any business
- applications.
-
- 2) there is no trivial way to sort anything.
- An elementary sort program will require access to enormous
- tables for all possible languages.
-
- English: A B C D E ... T ...
- Russian: A .. B ... E ... C T ...
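-
- The ordering problem can be sketched the same way. The two order
- strings below are hypothetical and cover only the five shared shapes
- from the example above; a real program would need a full table per
- language, which is exactly the complaint.

```python
# Hypothetical alphabet orders over five shared letter shapes
# (assumption for this sketch; real alphabets are far larger).
ALPHABET_ORDER = {
    "english": "ABCET",
    "russian": "ABECT",  # relative order of the look-alike shapes in Russian
}

def sort_words(words, language):
    """Sort words using the alphabet order of the given language."""
    rank = {letter: pos for pos, letter in enumerate(ALPHABET_ORDER[language])}
    # Each word is compared as a list of per-language letter positions.
    return sorted(words, key=lambda word: [rank[letter] for letter in word])

words = ["EC", "CE"]
print(sort_words(words, "english"))
print(sort_words(words, "russian"))
```

- The same two code sequences come out in opposite order depending on
- which table is used, so the codes alone do not define a sort order.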
-
- 3) there is no reasonable way to do hyphenation.
- Since there is no way to tell the language from the text, there
- is no way to make any reasonable attempt at hyphenation.
- [OX -- which language is this word from?]
-
- Good-bye wordprocessors and formatters?
-
- 4) "the similar glyphs" in Unicode are often SLIGHTLY different
- typographical glyphs -- everybody who has ever dealt with international
- publishing knows that fonts are designed as a WHOLE and every
- letter is designed with all others in mind -- i.e. X in Cyrillic
- is NOT the same X as in Latin even if the fonts are variations of
- the same style. I wish you could see how ugly Russian
- texts printed on American desktop publishing systems with
- "few characters added" are.
-
- In reality it means that Unicode is not a solution for
- typesetting.
-
- Having unique glyphs works ONLY WITHIN a group of languages
- which are based on variations of a single alphabet with
- non-conflicting alphabetical ordering and sets of
- vowels. You can do that for European languages.
- An attempt to do it across different groups (like Cyrillic and Latin)
- is disastrous at best -- we already tried it and finally came to
- encodings with two absolutely separate alphabets.
-
- I think that there are not many such groups, though, and it is possible
- to identify several "meta-alphabets". The meta-alphabets have no
- defined rules for cross-sorting (unlike letters WITHIN one
- meta-alphabet; you CAN sort English and German words together
- and it will still make sense; sorting Russian and English together
- is at best useless). It increases the number of codes but not
- as drastically as codifying languages would; there are hundreds of
- languages based on a dozen meta-alphabets.
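-
- One way such a scheme could be laid out, as a rough sketch only:
- pack a meta-alphabet number and a letter index into one code, so the
- group can be recovered from the ordinal value. The field width, the
- list of meta-alphabets and the encode()/decode() helpers are all
- assumptions made for illustration, not a proposed standard.

```python
LETTER_BITS = 10  # arbitrary field width chosen for this sketch
META_ALPHABETS = ["latin", "cyrillic", "greek"]  # illustrative subset only

def encode(alphabet, index):
    """Pack (meta-alphabet, letter index) into a single integer code."""
    if not 0 <= index < (1 << LETTER_BITS):
        raise ValueError("letter index out of range")
    return (META_ALPHABETS.index(alphabet) << LETTER_BITS) | index

def decode(code):
    """Recover (meta-alphabet, letter index) from a code."""
    return META_ALPHABETS[code >> LETTER_BITS], code & ((1 << LETTER_BITS) - 1)

# Look-alike letters in different meta-alphabets get different codes,
# and the meta-alphabet is visible from the code itself.
latin_t = encode("latin", 19)
cyrillic_t = encode("cyrillic", 19)
print(decode(latin_t), decode(cyrillic_t))
```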
-
- >The fact that all character sets do not occur in their local lexical order
- >means that a particular character can not be identified as to language by
- >its ordinal value. This is a small penalty to pay for the vast reduction
- >in storage requirements between a 32-bit and a 16-bit character set that
- >contains all required glyphs.
-
- Not true. First of all, nothing forces anyone to use a 32-bit
- representation where only 10 bits are necessary.
-
- So, as you see, Unicode is more of a problem than a solution.
- The fundamental idea is simply wrong -- it is inadequate for
- anything except Latin-based languages. No wonder we're
- hearing that Unicode is US-centric.
-
- Unfortunately Unicode looks like a cool solution to people who
- have never done any real localization work, and I fear that this
- particular mistake will be promoted as a standard, presenting
- us with a new round of headaches. It does not remove the necessity to
- carry off-text information (like "X-Language: english") and
- that makes it no better than the existing ISO 8-bit encodings
- (if I know the language I already know its alphabet --
- all extra bits are simply wasted; and programs handling Unicode
- text have to know the language for the reasons stated before).
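-
- Reading such an off-text label is simple enough; the sketch below
- uses Python's standard email parser on a made-up message. The
- "X-Language" header is the hypothetical one named above, not a
- registered header, and language_of() is a helper invented here.

```python
from email import message_from_string

# Made-up message carrying the hypothetical X-Language header.
RAW_MESSAGE = "X-Language: english\nSubject: test\n\nOX\n"

def language_of(raw, default=None):
    """Return the declared language of a message, or default if absent."""
    message = message_from_string(raw)
    return message.get("X-Language", default)

# The body "OX" alone does not say which language it is in; the header does.
print(language_of(RAW_MESSAGE))
print(language_of("Subject: test\n\nOX\n"))
```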
-
- UNICODE IS A *BIG* MISTAKE.
-
- (Don't get me wrong -- I'm for a universal encoding; it's
- just the particular idea of unique glyphs that I strongly
- oppose).
-
- --vadim
-