NetNews Usenet Archive 1992 #31

Internet Message Format | 1992-12-22 | 3.4 KB

Path: sparky!uunet!zaphod.mps.ohio-state.edu!saimiri.primate.wisc.edu!ames!sun-barr!sh.wide!wnoc-tyo-news!sranha!anprda!pmcgw!personal-media.co.jp
From: ishikawa@personal-media.co.jp (Chiaki Ishikawa)
Newsgroups: comp.std.internat
Subject: Re: Data tagging (was: 8-bit representation, plus an X problem)
Message-ID: <ISHIKAWA.92Dec22180817@ds5200.personal-media.co.jp>
Date: 22 Dec 92 09:08:03 GMT
References: <24426@alice.att.com> <1gpruaINNhfm@frigate.doc.ic.ac.uk> <1gtrpdINN6c4@corax.udac.uu.se> <24455@alice.att.com>
Sender: news@pmcgw.personal-media.co.jp
Reply-To: ishikawa@personal-media.co.jp
Organization: Personal Media Corp., Tokyo Japan
Lines: 53
Nntp-Posting-Host: ds5200
In-reply-to: andrew@alice.att.com's message of 20 Dec 92 06:37:41 GMT
X-Md4-Signature: d014e4083e841a53c4f8ad47ee0edd19

Hello. I am Japanese and work at a Japanese software company in
Tokyo, Japan.

In article <24455@alice.att.com> andrew@alice.att.com (Andrew Hume) writes:

[long text deleted]

> for mostly these reasons, Plan 9 chose a byte-stream encoding
> (initially UTF-1 and then UTF-2) and applied it uniformly according
> to a single rule: all byte streams interpreted as characters shall be
> interpreted as a sequence of 10646 characters encoded as UTF-2.
> this applies everywhere: it applies to the kernel and file server,
> it applies to the window system and the user's display, it applies to
> names in archives and tar files. and best of all, the existing system
> and its text is, because we were an ascii site, already correctly
> encoded. (actually, we were a Latin-1 system, but we were willing to
> make users convert latin-1 text to the new format.)
>
> normally, such a solution requires everything entering/leaving
> the plan 9 universe be converted. however as the encoding we use
> is backward compatible with ASCII, no conversion needs be done for
> the only important case (text files on networked filesystems).
> it also has the advantage that all programs can display text
> uniformly; users don't have to write S-JIS editors because the
> regular editor (sam or ed) edits kana/kanji just fine. all the
> conversion effort can be, and is, confined to one place (a program
> called tcs [translate character sets]). the hope is that in most
> cases, this conversion can happen automatically (which is how this
> stream arose originally; the case of mail and news should be easy
> to make happen).

The work done for Plan 9 seems to be very well done in terms of I18N
character support. I think I read an article about Plan 9 itself in a
Usenix publication, but is there a technical paper available that was
written specifically about the I18N aspects of Plan 9?

(BTW, is Plan 9 named after "Plan 9 from Outer Space"? I have found
out that there is now a computer game based on this movie.)

> i believe these system (design and migration) issues have been
> essentially ignored in all the work and fuss on unicode/10646.
> i know that deep within unicode and in places like X/Open, there are
> efforts to develop support libraries for wide characters but this
> simply ignores the system issues.
>
> andrew hume

I agree. Characters alone don't make a system I18N.
With all the hoopla in POSIX standardization, the meaning of locale
still leaves so many loose ends.

However, I can say we are defining the problems clearly now.
The solutions are not in sight, though.

ishikawa@personal-media.co.jp
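
The two properties described in the quoted text (ASCII text needing no conversion, and all other conversion being confined to one place) can be illustrated with a rough sketch. This is not Plan 9 code. It assumes modern UTF-8, as provided by Python's standard codecs, as a stand-in for the UTF-2 encoding mentioned above, and `is_ascii_compatible` and `convert` are hypothetical helper names that only loosely mirror the role the quoted text gives to tcs.

```python
def is_ascii_compatible(text: str) -> bool:
    """Return True if the UTF-8 bytes of `text` are identical to its ASCII bytes.

    Assumption: UTF-8 stands in for the UTF-2 encoding named in the post.
    """
    try:
        return text.encode("utf-8") == text.encode("ascii")
    except UnicodeEncodeError:
        # The text contains a non-ASCII character, so no ASCII form exists.
        return False


def convert(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Hypothetical tcs-like helper: re-encode bytes from one charset to another."""
    return data.decode(source).encode(target)


# Pure ASCII text is already correctly encoded; nothing needs converting.
ascii_text = "plain ASCII text"

# "caf\u00e9" is "cafe" with an acute accent, stored here as Latin-1 bytes.
latin1_bytes = "caf\u00e9".encode("latin-1")

# Two kana and two kanji characters, stored here as Shift-JIS bytes.
sjis_bytes = "\u304b\u306a\u6f22\u5b57".encode("shift_jis")

print(is_ascii_compatible(ascii_text))
print(convert(latin1_bytes, "latin-1"))
print(convert(sjis_bytes, "shift_jis").decode("utf-8"))
```

In this sketch only the Latin-1 and Shift-JIS inputs pass through `convert`; a program that reads the converted bytes needs to know about one encoding only, which is the point the quoted text makes about the regular editor handling kana/kanji.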