- Path: sparky!uunet!usc!cs.utexas.edu!qt.cs.utexas.edu!yale.edu!yale!mintaka.lcs.mit.edu!ai-lab!wheat-chex!glenn
- From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
- Newsgroups: comp.std.internat
- Subject: Re: An alternative I18N paradigm
- Date: 31 Dec 1992 08:11:21 GMT
- Organization: MIT Artificial Intelligence Laboratory
- Lines: 63
- Message-ID: <1hu9v9INN923@life.ai.mit.edu>
- References: <1hkff3EINN5uv@uni-erlangen.de> <1hncs1INN1qq@corax.udac.uu.se> <DAN.92Dec29102634@dan.watson.ibm.com>
- NNTP-Posting-Host: wheat-chex.ai.mit.edu
-
- The real problem, at least as I see it, is that the locale model
- doesn't distinguish between the consumer of information and the
- producer of information. It naively assumes that an individual
- end user must choose the manner in which (linguistically and
- culturally sensitive) information is to be presented, and that
- this choice can be determined by one fixed-value parameter (i.e.,
- the locale setting).
-
- The first assumption quite poorly models the real world of text;
- for here the author or editor is usually responsible for the
- presentation of the information, including the written form
- of that information. [We are not yet at the stage where a
- system can automatically translate arbitrary text to the written form
- preferred by the consumer of text.] The second assumption fails
- miserably in the real world of text, where many languages are intermixed,
- sometimes in a single writing system, sometimes in a single document
- employing multiple writing systems.
-
- What should first be done is an analysis of the producer and potential
- consumers of information. If the producer is the OS or a local utility,
- e.g., /bin/date, and the consumer is a single individual who prefers
- reading dates and times in a particular way, and this way can be
- characterized by a single (possibly complex) parameter, then the locale
- model will work. However, if the producer is another individual, then
- the locale probably should be ignored, and information contained in the
- text itself should be consulted about matters such as character set encoding,
- font(s), language tag(s), or other presentation information. In this case,
- the locale may be of help only when such explicit information
- (encoding, font, etc.) is absent, and even then it may produce complete
- garbage if it makes the wrong assumptions.
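-
- The fallback order described above can be sketched roughly as follows.
- This is only an illustration under stated assumptions: the TaggedText
- type and the resolve_encoding and decode helpers are hypothetical names,
- not part of any locale API, and EUC-JP and Latin-1 are arbitrary choices.
-
```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TaggedText:
    """Hypothetical container: raw bytes plus optional producer-supplied tags."""
    data: bytes
    encoding: Optional[str] = None  # explicit encoding tag, if the producer gave one
    language: Optional[str] = None  # explicit language tag, if the producer gave one


def resolve_encoding(text: TaggedText, locale_encoding: str) -> Tuple[str, str]:
    """Prefer the producer's tag; use the consumer's locale only as a guess."""
    if text.encoding is not None:
        return text.encoding, "tag"
    return locale_encoding, "locale-guess"


def decode(text: TaggedText, locale_encoding: str) -> Tuple[str, str]:
    """Decode the bytes and report where the encoding decision came from."""
    encoding, source = resolve_encoding(text, locale_encoding)
    return text.data.decode(encoding, errors="replace"), source


raw = "日本語".encode("euc_jp")

# Producer tagged the text: the consumer's Latin-1 locale is ignored.
tagged_result = decode(TaggedText(raw, encoding="euc_jp", language="ja"), "latin-1")

# No tag: the locale is only a guess, and here it guesses wrongly.
untagged_result = decode(TaggedText(raw), "latin-1")
```
-
- In this sketch the tagged text decodes correctly regardless of the
- consumer's locale, while the untagged text is decoded as Latin-1 and
- comes out as garbage, which is the failure mode described above.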
-
- One may ask where 10646/Unicode fits into all of this. It provides only
- a small part of the solution; namely, a single universal character set
- encoding rather than many non-universal encodings. Of course, one might
- propose to solve a small part of the problem by using 10646: use 10646
- for all encoded text. However, as has been pointed out by Ohta-san and
- others, this doesn't solve many other problems, e.g., whether to use a
- Chinese or a Japanese font to display a given Han character. So more
- is needed still.
-
- I, for one, do not believe that 10646 will become universally used
- in a fortnight. Local and other standard encodings will continue to
- exist, probably forever. So we need to start doing one thing very
- quickly: tagging character data as to its encoding. Another thing we
- need to do is add language (or writing system) tags to texts that
- mix multiple languages. Alternatively, this could be done by tagging
- font runs and then associating languages with those runs [I do not
- advocate this method - I prefer explicit language or writing system
- tags]. Other kinds of tags might be necessary for certain types
- of processing, e.g., yomi (phonetic reading) tags for allowing the
- display of furigana, sort keys for allowing producer-specified sorting
- behavior, and so forth.
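-
- As a rough sketch of what such tagging might look like, a mixed document
- can be modeled as a list of runs, each carrying its own encoding tag,
- language tag, and any further tags. Everything here is an assumption made
- for illustration: the Run type, the "yomi" key, and the font names in
- FONT_BY_LANGUAGE are hypothetical, not drawn from any standard.
-
```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    """Hypothetical tagged run: bytes plus explicit encoding and language tags."""
    data: bytes
    encoding: str
    language: str
    extra: Dict[str, str] = field(default_factory=dict)  # e.g. yomi, sort key


# Hypothetical font table: the language tag, not the character code,
# decides which font renders a Han character.
FONT_BY_LANGUAGE = {"ja": "hypothetical-japanese-font",
                    "zh": "hypothetical-chinese-font"}


def to_text(runs: List[Run]) -> str:
    """Decode each run with its own encoding tag and concatenate the results."""
    return "".join(run.data.decode(run.encoding) for run in runs)


def font_for(run: Run, default: str = "hypothetical-default-font") -> str:
    """Pick a display font from the run's language tag."""
    return FONT_BY_LANGUAGE.get(run.language, default)


document = [
    Run(b"The word ", "ascii", "en"),
    Run("漢字".encode("euc_jp"), "euc_jp", "ja", {"yomi": "かんじ"}),
    Run(b" is written with Han characters.", "ascii", "en"),
]

text = to_text(document)
fonts = [font_for(run) for run in document]
```
-
- Because each run names its own encoding, the runs need not share one, and
- the language tag on the middle run is what would let a renderer choose a
- Japanese rather than a Chinese font for the same Han characters.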
-
- 10646 will not even address any of these matters. However, Unicode may
- do so in the form of implementation guidelines or further work on I18N.
- Nonetheless, I think that it is quite important for many parties to begin
- implementing such systems so that development of standard tags and tagging
- systems can proceed. We need prior art and experience in these areas
- before effective standards can be developed with a reasonable hope of
- success.
-
- Glenn Adams
- Cambridge, Massachusetts
-