- Path: sparky!uunet!cs.utexas.edu!qt.cs.utexas.edu!yale.edu!yale!mintaka.lcs.mit.edu!ai-lab!wheat-chex!glenn
- From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
- Newsgroups: comp.std.internat
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Date: 1 Jan 1993 06:33:06 GMT
- Organization: MIT Artificial Intelligence Laboratory
- Lines: 91
- Message-ID: <1i0oj2INNp4v@life.ai.mit.edu>
- References: <1hu9v5INNbp1@rodan.UU.NET> <8490@charon.cwi.nl> <1hvu79INN4qf@rodan.UU.NET>
- NNTP-Posting-Host: wheat-chex.ai.mit.edu
- Keywords: ISO10646 Unicode
-
- In article <1hvu79INN4qf@rodan.UU.NET> avg@rodan.UU.NET (Vadim Antonov) writes:
- >If a combination of letters is treated as a letter IT IS A LETTER.
-
- So, are <qu> and <ch> letters in English? Is <a-e> a discontiguous
- letter in English 'take'? You are grossly oversimplifying the process
- of determining what the graphemes are in a writing system, the way that
- its users perceive it, and the best way to encode it as information.
-
- >The idea of visual encoding (and one letter-onr glyph is nothing more
- >than a compressed image of the text) is simply wrong...
-
- Where did you get the idea that 10646 is a "visual encoding"? Sure,
- 10646 contains some glyph-like encodings, but they are there only
- to satisfy compatibility goals. Nobody is recommending their
- use.
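-
- To make the compatibility point concrete, here is a minimal sketch using
- Python's standard unicodedata module (a much later tool, used here only as
- an illustration). The "fi" ligature is one such glyph-like character, and
- the character database marks it as a compatibility equivalent of the plain
- letters "f" and "i".

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI is a glyph-like compatibility character.
ligature = "\ufb01"

# Its character-database entry tags it as a compatibility equivalent
# of the two ordinary letters "f" (U+0066) and "i" (U+0069).
decomposition = unicodedata.decomposition(ligature)

# Compatibility normalization (NFKC) replaces it with those letters.
plain = unicodedata.normalize("NFKC", ligature)

print(decomposition)
print(plain)
```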
-
- >10646 was meant as an encoding eliminating the necessity to carry off-text
- >information (which is not a piece of cake, especially in multi-lingual
- >texts).
-
- This is just complete nonsense. I don't know who you've been talking
- with, but whoever it was, they certainly don't know much about 10646
- or Unicode. Unicode (and the 10646 BMP by extension) is oriented around
- encoding the minimum content that allows for minimally legible display.
- This is already a well-known and well-understood model for text: ASCII, EBCDIC,
- etc. I don't know a single ASCII-only encoding that tells me what
- language it is, what correct sort order to use, what font to display it
- with, etc. Why should Unicode change this?
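-
- A small sketch of the sort-order point, in Python. The only order the
- encoding itself provides is code-point order; any language-specific order
- has to come from outside the text. The helper traditional_spanish_key is
- hypothetical and deliberately simplified: it assumes only that traditional
- Spanish alphabetization treats "ch" as a unit sorting after every other
- "c" entry, and it ignores "ll" and accents.

```python
# The same three words, encoded with the same code points.
words = ["cuna", "chico", "dama"]

# 1. Code-point order: all that the encoding alone can offer.
by_code_point = sorted(words)

# 2. Simplified traditional Spanish order (an assumption for illustration):
#    "ch" is one unit that sorts after all other words starting with "c".
def traditional_spanish_key(word):
    # U+FFFF compares greater than any letter, pushing "ch" past "cz".
    return word.replace("ch", "c\uffff")

by_spanish_rule = sorted(words, key=traditional_spanish_key)

print(by_code_point)
print(by_spanish_rule)
```

- Nothing in the encoded words says which of the two orders is wanted; the
- caller has to supply that knowledge, exactly as with ASCII.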
-
- Nobody (who understands Unicode) ever claimed that it could solve all
- text processing problems without extra information. Indeed, Unicode
- explicitly does not specify this additional information for good
- engineering reasons.
-
- Consider for a moment why this was a good idea: (1) Unicode fits the
- ASCII model extremely well; (2) Unicode explicitly supports higher
- level protocols just like ASCII, e.g., escape sequences; (3) the
- designers of Unicode recognized that an essentially unbounded amount
- of additional information may be useful for various text processes,
- various system platforms, etc.; (4) obtaining a consensus in ISO on
- a universal set of characters is an enormous problem -- expanding the
- goals to solve all the problems of multi-lingual text processing
- would have doomed the effort from the beginning.
-
- >Take a life, guys. We in Russia did that mistake (DKOI and "GOST" encodings)
- >many years ago and came to realize that this solution is too simple to
- >be correct.
-
- Good engineers know that building solutions to complex problems requires
- dividing them into simpler sub-problems. Good engineers also recognize
- that many complex problems do not have a single optimal solution, but many
- sub-optimal solutions. The designers of Unicode and 10646 recognized
- from considerable experience in the field that the biggest problem was
- the proliferation of incomplete, inadequate character sets. Creating
- a single character set that could correct this increasingly problematic
- state of affairs was the single most important goal of Unicode's
- design. Doing so efficiently, and with some measure of compatibility
- with both existing data and existing software, was also an important
- goal of its design.
-
- What Unicode was not designed to do is extremely important for you and
- others to know. It was not designed to solve the "multi-lingual text
- processing" problem. Indeed, I would challenge you to create a single
- character set, which, in and of itself, solves this problem, and which
- could pass through existing standards bodies to become an international
- standard. Your radical optimism about the possibility of doing so
- leads me to believe that you really have little experience in this field.
- [Not to say that others, including myself, didn't start out with a similar
- ungrounded optimism.]
-
- If you want to truly bring forward the state of the art in multi-lingual
- text processing, you would be much better off to consider how to begin
- using Unicode (10646) with all of its intentional, designed-in limitations,
- rather than incorrectly attributing to 10646 the goal of being a panacea, then
- using the reality of its limitations to shoot down your misattribution. If you
- take the time to look at the facts, you will find that (1) Unicode was
- designed by a truly global community, and not a US-centric one as has been
- wrongly claimed; (2) Unicode and 10646 continue to solicit ideas
- and aid from persons who have useful contributions to make; and (3)
- Unicode (10646) provides an adequate foundation on which complete
- solutions to the problems of multi-lingual text processing can be
- constructed.
-
- If you have a genuine interest in learning about the facts surrounding
- Unicode and 10646, I would recommend a good reading of the Unicode Standard
- and the Proceedings of the Unicode Implementor's Workshops.
-
- Glenn Adams
- Cambridge, Massachusetts
-
-
-