- Path: sparky!uunet!gatech!emory!wupost!spool.mu.edu!yale.edu!yale!mintaka.lcs.mit.edu!ai-lab!wheat-chex!glenn
- From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
- Newsgroups: comp.std.internat
- Subject: Re: Data tagging (was: 8-bit representation, plus an X problem)
- Date: 31 Dec 1992 07:15:08 GMT
- Organization: MIT Artificial Intelligence Laboratory
- Lines: 39
- Message-ID: <1hu6lsINN773@life.ai.mit.edu>
- References: <ISHIKAWA.92Dec22180817@ds5200.personal-media.co.jp> <24479@alice.att.com> <2563@titccy.cc.titech.ac.jp>
- NNTP-Posting-Host: wheat-chex.ai.mit.edu
-
- In article <2563@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
- >
- >I heard that Microsoft's NT will have a locale mechanism so that
- >it can print Japanese Han as Japanese and Chinese Han as Chinese,
- >which is impossible with bare 10646/Unicode.
- >
- >Then, how can we have a file containing both Japanese and Chinese?
- >
-
- Ohta-san is making too much of this issue. First, there is
- much less difference between a Chinese font's rendition of a given Han
- character and a Japanese font's rendition of the same character than
- he seems to imply here. Indeed, in practice, it is merely a matter of
- a font difference. Furthermore, should a Japanese reader see a 10646
- Han character displayed with a Chinese font, or should a Chinese reader
- see it with a Japanese font, it will generally still be legible.
-
- If one wants a 10646 encoded text containing a mixture of Chinese and
- Japanese to be displayed using different fonts for the Chinese and Japanese
- parts, then a higher level protocol (rich text) must supply the information
- needed to determine which text is Chinese and which is Japanese. This can
- be accomplished indirectly by using font runs, or directly by using some
- form of language tagging.
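
To make the idea of language tagging concrete, here is a minimal sketch in Python. It is only an illustration of the layering, not any real rich text protocol: the `Run` type, the font names, and the language codes are all hypothetical. The plain text carries only the code points; the language tag sits alongside it and is used solely to pick a font, with a default font for untagged text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical font names and language codes, for illustration only.
FONT_BY_LANGUAGE = {"ja": "ExampleMinchoJP", "zh": "ExampleSongCN"}
DEFAULT_FONT = "ExampleDefaultHan"


@dataclass
class Run:
    """A stretch of plain text plus an optional language tag (rich text layer)."""
    text: str
    language: Optional[str] = None  # None means untagged plain text


def select_fonts(runs: List[Run]) -> List[Tuple[str, str]]:
    """Pair each run's text with a font chosen from its language tag.

    Untagged or unknown-language runs fall back to the default font,
    so the text is still displayed, just without the preferred glyphs.
    """
    return [(run.text, FONT_BY_LANGUAGE.get(run.language, DEFAULT_FONT))
            for run in runs]


def plain_text(runs: List[Run]) -> str:
    """Strip the tagging layer, leaving only the encoded characters."""
    return "".join(run.text for run in runs)


# The same code point (U+76F4) tagged as Japanese, as Chinese, and untagged.
runs = [Run("\u76f4", "ja"), Run("\u76f4", "zh"), Run("\u76f4")]
fonts = select_fonts(runs)
```

The three runs hold identical characters; only the tag, which lives outside the character encoding, changes the font selection. Dropping the tags with `plain_text` loses the font choice but not the text.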
-
- This is no different from wanting to display English text with a mixture
- of regular and italic faces. 10646 can't tell you which characters should
- be displayed with an italic font either. The key point is that, in the
- absence of font shift information, one can still read the text. Since this
- is also true in the mixed Chinese/Japanese case, it is an equivalent problem.
- 10646/Unicode properly encodes only the "plain text" information which
- allows for legible display. Applications which desire more sophisticated
- display will have to add other "rich text" information necessary for the
- control of such advanced display. The Unicode plain text model considers
- display with multiple fonts to be a sophisticated display requiring
- appropriate font tagging or other data (e.g., language tags) which allows
- the proper font to be selected.
-
- Glenn Adams
- Cambridge, Massachusetts
-