NetNews Usenet Archive 1992 #31

Internet Message Format | 1992-12-31 | 2.6 KB

Path: sparky!uunet!gatech!emory!wupost!spool.mu.edu!yale.edu!yale!mintaka.lcs.mit.edu!ai-lab!wheat-chex!glenn
From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
Newsgroups: comp.std.internat
Subject: Re: Data tagging (was: 8-bit representation, plus an X problem)
Date: 31 Dec 1992 07:15:08 GMT
Organization: MIT Artificial Intelligence Laboratory
Lines: 39
Message-ID: <1hu6lsINN773@life.ai.mit.edu>
References: <ISHIKAWA.92Dec22180817@ds5200.personal-media.co.jp> <24479@alice.att.com> <2563@titccy.cc.titech.ac.jp>
NNTP-Posting-Host: wheat-chex.ai.mit.edu

In article <2563@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
>
>I heard that, microsoft's NT will have a locale mechanism so that
>it can print Japanese Han as Japanese and Chinese Han as Chinese,
>which is impossible with bare 10646/Unicode.
>
>Then, how can we have a file containing both Japanese and Chinese?
>

Ohta-san is making far too much of this issue.  First, there is much
less difference between a Chinese font's rendition of a given Han
character and a Japanese font's rendition of the same character than he
seems to imply here.  Indeed, in practice, it is merely a matter of a
font difference.  Furthermore, should a Japanese reader see a 10646 Han
character displayed with a Chinese font, or should a Chinese reader see
it with a Japanese font, it will still be legible in general.

If one wants a 10646 encoded text containing a mixture of Chinese and
Japanese to be displayed using different fonts for the Chinese and
Japanese parts, then a higher level protocol (rich text) must supply the
information needed to determine which text is Chinese and which is
Japanese.  This can be accomplished indirectly by using font runs, or
directly by using some form of language tagging.

This is no different from wanting to display English text with a mixture
of regular and italic faces.  10646 can't tell you which characters
should be displayed with an italic font either.

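A minimal sketch of the plain text / rich text split described above,
assuming a hypothetical run-based language tagging scheme: the code
points themselves carry no language, and a separate list of runs
supplies it.  The names LanguageRun, FONT_FOR_LANGUAGE, DEFAULT_FONT,
select_fonts and the font names are illustrative assumptions, not part
of 10646/Unicode or of any real rendering API.  Runs are assumed to be
non-overlapping and within the bounds of the text.

```python
from dataclasses import dataclass

# Hypothetical mapping from a language tag to a font name (illustrative only).
FONT_FOR_LANGUAGE = {"ja": "JapaneseFont", "zh": "ChineseFont"}
# Fallback used for untagged text: still legible, just not language-specific.
DEFAULT_FONT = "AnyHanFont"


@dataclass
class LanguageRun:
    start: int     # index of the first character in the run
    end: int       # index one past the last character in the run
    language: str  # language tag supplied by the higher level protocol


def select_fonts(text, runs):
    """Return (substring, font) pairs that cover the whole text in order.

    Assumes the runs do not overlap and lie within the text.
    """
    segments = []
    pos = 0
    for run in sorted(runs, key=lambda r: r.start):
        if run.start > pos:
            # Untagged gap between runs falls back to the default font.
            segments.append((text[pos:run.start], DEFAULT_FONT))
        font = FONT_FOR_LANGUAGE.get(run.language, DEFAULT_FONT)
        segments.append((text[run.start:run.end], font))
        pos = run.end
    if pos < len(text):
        segments.append((text[pos:], DEFAULT_FONT))
    return segments


# The same two Han code points appear twice; only the tags differ.
plain = "\u76f4\u9aa8 \u76f4\u9aa8"
runs = [LanguageRun(0, 2, "ja"), LanguageRun(3, 5, "zh")]
segments = select_fonts(plain, runs)
```

With no runs at all, select_fonts(plain, []) returns the whole text with
the default font, which corresponds to the untagged plain text case: the
display is less refined but the text can still be read.
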
The key point is that, in the absence of font shift information, one can
still read the text.  Since this is also true in the mixed
Chinese/Japanese case, it is an equivalent problem.

10646/Unicode properly encodes only the "plain text" information which
allows for legible display.  Applications which desire more
sophisticated display will have to add other "rich text" information
necessary for the control of such advanced display.  The Unicode plain
text model considers display with multiple fonts to be a sophisticated
display requiring appropriate font tagging or other data (e.g., language
tags) which allows the proper font to be selected.

Glenn Adams
Cambridge, Massachusetts