- Path: sparky!uunet!spool.mu.edu!uwm.edu!ogicse!mintaka.lcs.mit.edu!ai-lab!muesli!glenn
- From: glenn@muesli.ai.mit.edu (Glenn A. Adams)
- Newsgroups: comp.std.internat
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
- Message-ID: <1i3pf7INNcri@life.ai.mit.edu>
- Date: 2 Jan 93 10:06:31 GMT
- Article-I.D.: life.1i3pf7INNcri
- References: <1992Dec30.010216.2550@nobeltech.se> <1992Dec30.061759.8690@fcom.cc.utah.edu> <1hu9v5INNbp1@rodan.UU.NET>
- Organization: MIT Artificial Intelligence Laboratory
- Lines: 67
- NNTP-Posting-Host: muesli.ai.mit.edu
-
- In article <1hu9v5INNbp1@rodan.UU.NET> avg@rodan.UU.NET (Vadim Antonov) writes:
- >In article <1992Dec30.061759.8690@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
- >>The "ugly thing Unicode does with asiatic languages" is exactly what it
- >>does with all other languages: There is a single lexical assignment
- >>for each possible glyph.
- >It means that:
- >
- >1) "mechanistic" conversion between upper and lower case
- > is impossible (as well as case-insensitive comparisons)
- >
- > Example: Latin T -> t
- > Cyrillic T -> m
- > Greek T -> ?
- >
- > This property alone renders Unicode useless for any business
- > applications.
- >
-
-
- After reading this yet again, I now believe that this entire conversation
- may be based on a misunderstanding. Unicode does not unify Latin T,
- Cyrillic T, and Greek T! They are separate characters, as are Latin A,
- Cyrillic A, and Greek A. Nor does Unicode unify LATIN A WITH RING and
- ANGSTROM SYMBOL.
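
The point can be checked against any current Unicode character database. Below is a minimal sketch using Python's standard `unicodedata` module; the code points are my own pick of the look-alike capitals under discussion (the helper name `describe` is just for illustration), and the character names printed are those of current Unicode, which may differ from the names used in this post.

```python
import unicodedata

# Three look-alike capitals, one per script (code points chosen for illustration).
LOOKALIKES = ["\u0054", "\u0422", "\u03A4"]  # Latin T, Cyrillic Te, Greek Tau

def describe(ch):
    """Return (code point, character name, name of its lowercase form)."""
    lower = ch.lower()
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.name(lower))

for ch in LOOKALIKES:
    print(describe(ch))

# A WITH RING and the angstrom unit symbol are likewise separate code points.
A_RING, ANGSTROM = "\u00C5", "\u212B"
print(unicodedata.name(A_RING), "|", unicodedata.name(ANGSTROM))
```

Because each capital has its own code point, each has its own lowercase mapping, so case conversion is a plain per-character table lookup.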
-
- Unicode only unifies according to abstract form within the context
- of a particular script, i.e., Unicode encodes the elements of scripts.
- Furthermore, where there is a clear difference in functional use,
- e.g., MINUS vs. HYPHEN vs. FIGURE DASH, Unicode maintains
- separate encodings, even though the shapes may be depicted by a
- single form (glyph). More examples include EXCLAMATION POINT vs.
- LATIN LETTER EXCLAMATION POINT (used as a letter in African alphabets
- based on Latin script) and GREEK EPSILON vs. LATIN LETTER EPSILON (used
- with a variety of Latin script based alphabets).
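
As a rough illustration of form versus function, the sketch below prints the name and general category that a current Unicode database assigns to characters of this kind. The code points are the ones I assume the examples above refer to, and the names shown are those of current Unicode rather than the ones used in this post; `SAMPLES` and `profile` are illustrative names only.

```python
import unicodedata

# Code points assumed to correspond to the examples in the text.
SAMPLES = {
    "minus": "\u2212",
    "hyphen": "\u2010",
    "figure dash": "\u2012",
    "exclamation as punctuation": "\u0021",
    "exclamation as letter": "\u01C3",
    "Latin epsilon": "\u025B",
    "Greek epsilon": "\u03B5",
}

def profile(ch):
    """Return (character name, general category) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch))

for label, ch in SAMPLES.items():
    name, category = profile(ch)
    print(f"{label:28} U+{ord(ch):04X} {category} {name}")
```

The two exclamation shapes land in different categories (punctuation versus letter), which is the functional distinction a purely glyph-based encoding would lose.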
-
- I apologize for not recognizing earlier where this argument went
- astray. I assumed that you had at least seen a copy of Unicode,
- thus I didn't expect this particular misunderstanding could arise.
-
- As for Asian writing systems based on the Han script, the historical
- relation among these uses is much stronger than that between Greek, Latin,
- and Cyrillic. The differences that have developed are more along
- aesthetic dimensions, although differences in functional value have
- developed; but then again, the Latin script is nowhere near exact in
- its form to function mapping, at least in some important writing systems,
- e.g., English & French. It would be as ridiculous to encode two <c>s
- for /k/ and /s/ in English as it would be to encode two Han characters
- with the same form which have developed specialized or slightly
- different meanings in the writing system in which they were used.
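
To make the unification concrete, here is a small sketch that takes one Han character and encodes it in three legacy national character sets through Python's standard codecs. The choice of character (U+4E2D) and of codecs is mine, picked only because the character appears in all three sets; `legacy_bytes` and `decode_all` are illustrative helper names.

```python
# Legacy encodings (Python codec names) that all contain the sample character.
LEGACY_CODECS = ["gb2312", "big5", "shift_jis"]

HAN_SAMPLE = "\u4E2D"  # a character shared by Chinese and Japanese writing

def legacy_bytes(ch):
    """Encode one character in every legacy codec."""
    return {codec: ch.encode(codec) for codec in LEGACY_CODECS}

def decode_all(encoded):
    """Decode each legacy byte sequence back to a Unicode string."""
    return {codec: raw.decode(codec) for codec, raw in encoded.items()}

encoded = legacy_bytes(HAN_SAMPLE)
for codec, raw in encoded.items():
    print(codec, raw.hex())

# Every legacy form maps back to the same single Unicode character.
print(set(decode_all(encoded).values()) == {HAN_SAMPLE})
```

Each national set assigns its own bytes, yet all of them convert to one Unicode code element, much as one <c> serves both English and French.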
-
- Unlike a glyphic encoding, in which forms may be willy-nilly unified
- regardless of function, Unicode takes both form and function into
- account in the determination of what constitutes a separate character
- code element. In some instances, form is given priority; in others,
- function is given priority; in most cases, both have an input.
-
- [N.B. In addition to form and function, Unicode maintains distinctions
- which existed in character sets whose characters were incorporated into
- Unicode. This ensures that one can have round-trip conversion between
- Unicode and existing data. This "compatibility rule" resulted in the
- inclusion of many characters which would not have been included otherwise,
- e.g., FULLWIDTH LATIN LETTER A-Z, a-z, etc. (needed for compatibility with
- most Asian
- character sets). Many Han characters which are stroke variants were
- encoded for this reason, and would have been otherwise unified.]
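
The round-trip property can be sketched with Shift JIS, one Asian character set that carries both an ordinary and a fullwidth Latin A. This uses Python's standard `shift_jis` codec and `unicodedata` module; the `round_trip` helper is an illustrative name, and the NFKC check at the end relies on normalization forms defined in later versions of Unicode, so treat it as a modern aside rather than part of the rule described above.

```python
import unicodedata

ASCII_A = "\u0041"      # ordinary Latin capital A
FULLWIDTH_A = "\uFF21"  # fullwidth form kept for compatibility

def round_trip(text, encoding="shift_jis"):
    """Convert text to a legacy encoding and back again."""
    return text.encode(encoding).decode(encoding)

mixed = ASCII_A + FULLWIDTH_A
print(mixed.encode("shift_jis"))   # one byte for A, two for the fullwidth A
print(round_trip(mixed) == mixed)  # the distinction survives conversion

# The database records the fullwidth form as a compatibility variant of A.
print(unicodedata.decomposition(FULLWIDTH_A))
print(unicodedata.normalize("NFKC", FULLWIDTH_A) == ASCII_A)
```

Had the two forms been unified, data converted from Shift JIS to Unicode and back could not have restored the original bytes.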
-
- Regards,
- Glenn Adams
-