- Path: sparky!uunet!gatech!destroyer!gumby!yale!mintaka.lcs.mit.edu!ai-lab!wheat-chex!glenn
- From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
- Newsgroups: comp.std.internat
- Subject: Re: Script Unification [was: Re: Cleanicode]
- Date: 23 Jan 1993 17:37:01 GMT
- Organization: MIT Artificial Intelligence Laboratory
- Lines: 58
- Message-ID: <1jrvntINN3a0@life.ai.mit.edu>
- References: <2179@blue.cis.pitt.edu> <1jlojhINNqv3@life.ai.mit.edu> <ISHIKAWA.93Jan22203618@ds5200.personal-media.co.jp>
- NNTP-Posting-Host: wheat-chex.ai.mit.edu
-
- In article <ISHIKAWA.93Jan22203618@ds5200.personal-media.co.jp> ishikawa@personal-media.co.jp writes:
-
- >>Of course it is true that CJK unification does have certain costs,
- >>e.g., different implicit sort orders cannot be maintained without
- >>language tags, minor distinctions in the glyphic representation of
- >>CJK character data cannot be made without language tags, and so
- >>forth. However, and this is important to consider, such distinctions
- >>are not maintained by character set standards practices for other
- >>scripts either: the English, German, French, and Spanish alphabets,
- >>all distinct in their ordering rules, all potentially requiring slightly
- >>different glyphic displays,
-
- >But, here is the dumb question. Are 'a', 'b', 'c' in English and, say,
- >the similar looking characters in French given slightly different
- >glyphic display under similar circumstances?!
-
- My point in this paragraph is that existing character sets like ISO8859-1
- (IsoLatin1), or the Windows ANSI set, or the standard Apple set, do
- not distinguish among the symbols which are shared by different alphabets
- which are derived from the Latin script. Unifying these alphabets as
- a single alphabet-independent script makes a lot of sense for many kinds
- of text processes, e.g., searching, yet makes other processes difficult,
- e.g., culturally correct sorting. As for display, simple display systems
- will probably never distinguish among the forms used to display these
- alphabets; however, high-quality typography may very well abide by
- different standards as to which font to use to display these different
- alphabets' uses of a single script. This is similar to the situation
- in CJK: different alphabets' use of the Han script (here I am thinking
- of Traditional Chinese, Simplified Chinese, Japanese, and Korean as
- four distinct alphabets) requires different fonts for quality display;
- yet for simple, legible display, one font will suffice.
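A minimal sketch of the searching/sorting trade-off described above. It assumes toy, hypothetical collation tables (real collation, e.g. the Unicode Collation Algorithm, is far more involved). Because German and Swedish share the single code point U+00E4 for "ä", a search needs no language information. Sorting does need it: Swedish places ä after z, while German dictionary order treats it as a variant of a.

```python
# Toy collation tables, hypothetical and simplified for illustration only.
# Swedish places å, ä, ö after z.
SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyzåäö"
# German dictionary order treats ä, ö, ü as variants of a, o, u.
GERMAN_ORDER = "abcdefghijklmnopqrstuvwxyz"
GERMAN_FOLD = {"ä": "a", "ö": "o", "ü": "u"}


def swedish_key(word: str) -> list[int]:
    # Raises ValueError for letters outside the toy table.
    return [SWEDISH_ORDER.index(c) for c in word.lower()]


def german_key(word: str) -> list[int]:
    # Fold umlauts onto their base letters, then rank by position.
    folded = "".join(GERMAN_FOLD.get(c, c) for c in word.lower())
    return [GERMAN_ORDER.index(c) for c in folded]


def contains(text: str, query: str) -> bool:
    # Searching is language-independent: "ä" is the same code point
    # (U+00E4) whether the text is German or Swedish.
    return query in text


words = ["zebra", "äpfel", "affe"]
print(sorted(words, key=german_key))         # ['affe', 'äpfel', 'zebra']
print(sorted(words, key=swedish_key))        # ['affe', 'zebra', 'äpfel']
print(contains("Äpfel und Birnen", "pfel"))  # True
```

The same unified string sorts two ways. The language has to come from somewhere outside the character data itself.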
-
- One argument that has been made against Han unification is that these
- different uses require different display forms. But the differences
- in form are minor and do not affect the meaning of the text. This
- is identical to what holds in unifying different alphabets which use
- the Latin script. Admittedly, there are many more forms in the Han
- script, and, given the complexity of these forms, there is much more
- opportunity for variation. However, these variations do not in general
- cause a change in the meaning (basic content) of the text. The goal
- of Unicode was to define a "plain text format" which captured only
- the basic content and no more; any further distinctions, such as font
- attributes or language attributes, are expected to be subsumed in
- some rich text form which is layered on top of (or interleaved with)
- the basic Unicode plain text string.
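The layering of a rich text form over plain text can be sketched as a plain string plus a separate list of language-attribute spans. In this minimal sketch, the class names, the font table, and the language tags ("ja", "zh-Hans") are all hypothetical and purely illustrative. U+76F4 (直) is a single unified Han code point that Japanese and Chinese typography conventionally draw with slightly different glyphs. Only a display that consults the language layer needs to tell the two uses apart.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ZHI = "\u76f4"  # 直: one unified code point shared by Japanese and Chinese


@dataclass
class LangSpan:
    start: int  # inclusive index into the plain text
    end: int    # exclusive index
    lang: str   # illustrative language tag


@dataclass
class RichText:
    plain: str  # the Unicode plain text: basic content only
    spans: List[LangSpan] = field(default_factory=list)  # layered attributes

    def lang_at(self, index: int) -> Optional[str]:
        for span in self.spans:
            if span.start <= index < span.end:
                return span.lang
        return None


# Hypothetical font table; the names are placeholders, not real fonts.
FONTS = {"ja": "Japanese Mincho face", "zh-Hans": "Simplified Chinese Song face"}
DEFAULT_FONT = "generic Han face"


def font_for(doc: RichText, index: int) -> str:
    # Quality display consults the language layer; without a tag it
    # falls back to one font, which is still legible.
    return FONTS.get(doc.lang_at(index), DEFAULT_FONT)


doc = RichText(ZHI + ZHI, [LangSpan(0, 1, "ja"), LangSpan(1, 2, "zh-Hans")])
print(doc.plain[0] == doc.plain[1])  # True: same character in both languages
print(font_for(doc, 0))              # Japanese Mincho face
print(font_for(doc, 1))              # Simplified Chinese Song face
print(doc.plain.count(ZHI))          # 2: plain-text search ignores language
```

Dropping the `spans` list leaves ordinary plain text that any simple display can still render with one font.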
-
- Your basic Unix terminal emulator or text editor can deal with
- Unicode plain text just like ASCII or JIS plain text (with the
- appropriate modifications for 16-bit characters). Legibility is
- ensured by the criteria of Unicode plain text. On the other hand,
- a desktop publishing system or a more advanced word processor will
- most certainly support font attribution, and, in a multilingual
- environment, language attribution. If you look at programs like
- Interleaf and Slate (a multimedia editor from BB&N), they have
- supported language attributes in their rich text format for a
- long time now.
-
- Glenn Adams
-