NetNews Usenet Archive 1993 #3

Internet Message Format | 1993-01-24 | 3.8 KB

Path: sparky!uunet!gatech!destroyer!gumby!yale!mintaka.lcs.mit.edu!ai-lab!wheat-chex!glenn
From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
Newsgroups: comp.std.internat
Subject: Re: Script Unification [was: Re: Cleanicode]
Date: 23 Jan 1993 17:37:01 GMT
Organization: MIT Artificial Intelligence Laboratory
Lines: 58
Message-ID: <1jrvntINN3a0@life.ai.mit.edu>
References: <2179@blue.cis.pitt.edu> <1jlojhINNqv3@life.ai.mit.edu> <ISHIKAWA.93Jan22203618@ds5200.personal-media.co.jp>
NNTP-Posting-Host: wheat-chex.ai.mit.edu

In article <ISHIKAWA.93Jan22203618@ds5200.personal-media.co.jp> ishikawa@personal-media.co.jp writes:
>>Of course it is true that CJK unification does have certain costs,
>>e.g., different implicit sort orders cannot be maintained without
>>language tags, minor distinctions in the glyphic representation of
>>CJK character data cannot be made without language tags, and so
>>forth. However, and this is important to consider, such distinctions
>>are not maintained by character set standards practices for other
>>scripts either: the English, German, French, and Spanish alphabets,
>>all distinct in their ordering rules, all potentially requiring slightly
>>different glyphic displays,

>But, here is the dumb question. Are 'a', 'b', 'c' in English and, say,
>the similar looking characters in French given slightly different
>glyphic display under similar circumstances?!

My point in this paragraph is that existing character sets like ISO8859-1 (IsoLatin1), or the Windows ANSI set, or the standard Apple set, do not distinguish among the symbols which are shared by different alphabets which are derived from the Latin script. Unifying these alphabets as a single alphabet-independent script makes a lot of sense for many kinds of text processes, e.g., searching, yet makes other processes difficult, e.g., culturally correct sorting.
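
To make the searching/sorting contrast concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a real collation implementation: the names SWEDISH_ORDER, GERMAN_FOLD, sort_words and search are hypothetical, only two languages are covered, and the rules are reduced to the single well-known difference that German dictionary order files "ä" with "a" while Swedish places "ä" after "z". The language tag is passed in from outside the strings, since the unified character codes themselves do not carry it.

```python
# Illustrative only: simplified ordering rules for two alphabets that
# share the same Latin-script character codes.

# Swedish alphabet order (assumption: lowercase letters only).
SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyzåäö"

# German dictionary-style folding: umlauts file with their base vowels.
GERMAN_FOLD = str.maketrans("äöü", "aou")


def sort_key(word, lang):
    """Build a sort key from the word plus an out-of-band language tag."""
    w = word.lower()
    if lang == "sv":
        # Rank each letter by its position in the Swedish alphabet.
        return [SWEDISH_ORDER.index(c) for c in w]
    if lang == "de":
        # Compare on the folded form first, original form as tie-break.
        return (w.translate(GERMAN_FOLD), w)
    raise ValueError("no ordering rules for language tag: " + lang)


def sort_words(words, lang):
    """Sorting needs the language tag; the character codes are not enough."""
    return sorted(words, key=lambda w: sort_key(w, lang))


def search(words, needle):
    """Searching works on the unified codes with no language tag at all."""
    return [w for w in words if needle in w]


if __name__ == "__main__":
    words = ["zon", "ära", "arm"]
    print(search(words, "ä"))        # same result whatever the language
    print(sort_words(words, "de"))   # "ära" files next to "arm"
    print(sort_words(words, "sv"))   # "ära" files after "zon"
```

The same three strings come back in two different orders depending only on the tag, while the search function never needs it.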

As for display, simple display systems will probably never distinguish among the forms used to display these alphabets; however, high quality typography may very well abide by different standards as to which font to use to display these different alphabets' use of a single script. This is similar to the situation in CJK: different alphabets' use of the Han script (here I am thinking of Traditional Chinese, Simplified Chinese, Japanese, and Korean as four distinct alphabets) requires different fonts for quality display; yet for simple, legible display, one font will suffice.

One argument that has been made against Han unification is that these different uses require different display forms. But the differences in form are minor and do not affect the meaning of the text. This is identical to what holds in unifying different alphabets which use the Latin script. Admittedly, there are many more forms in the Han script, and, given the complexity of these forms, there is much more opportunity for variation. However, these variations do not in general cause a change in the meaning (basic content) of the text.

The goal of Unicode was to define a "plain text format" which captured only the basic content and no more; any further distinctions, such as font attributes or language attributes, are expected to be subsumed in some rich text form which is layered on top of (or interleaved with) the basic Unicode plain text string. Your basic Unix terminal emulator or text editor can deal with Unicode plain text just like ASCII or JIS plain text (with the appropriate modifications for 16-bit characters). Legibility is ensured by the criteria of Unicode plain text.

On the other hand, a desktop publishing system or a more advanced word processor will most certainly support font attribution, and, in a multilingual environment, language attribution.
If you look at programs like Interleaf and Slate (a multimedia editor from BB&N), they have supported language attributes in their rich text format for a long time now.

Glenn Adams