- Path: sparky!uunet!zaphod.mps.ohio-state.edu!saimiri.primate.wisc.edu!ames!sun-barr!sh.wide!wnoc-tyo-news!sranha!anprda!pmcgw!personal-media.co.jp
- From: ishikawa@personal-media.co.jp (Chiaki Ishikawa)
- Newsgroups: comp.std.internat
- Subject: Re: Data tagging (was: 8-bit representation, plus an X problem)
- Message-ID: <ISHIKAWA.92Dec22180817@ds5200.personal-media.co.jp>
- Date: 22 Dec 92 09:08:03 GMT
- References: <24426@alice.att.com> <1gpruaINNhfm@frigate.doc.ic.ac.uk>
- <1gtrpdINN6c4@corax.udac.uu.se> <24455@alice.att.com>
- Sender: news@pmcgw.personal-media.co.jp
- Reply-To: ishikawa@personal-media.co.jp
- Organization: Personal Media Corp., Tokyo Japan
- Lines: 53
- Nntp-Posting-Host: ds5200
- In-reply-to: andrew@alice.att.com's message of 20 Dec 92 06:37:41 GMT
- X-Md4-Signature: d014e4083e841a53c4f8ad47ee0edd19
-
-
- Hello. I am Japanese and work at a software company in
- Tokyo, Japan.
-
- In article <24455@alice.att.com> andrew@alice.att.com (Andrew Hume) writes:
-
- [long text deleted]
-
- for mostly these reasons, Plan 9 chose a byte-stream encoding
- (initially UTF-1 and then UTF-2) and applied it uniformly according
- to a single rule: all byte streams interpreted as characters shall
- be interpreted as a sequence of 10646 characters encoded as UTF-2.
- this applies everywhere: it applies to the kernel and file server,
- it applies to the window system and the user's display, it applies
- to names in archives and tar files. and best of all, the existing
- system and its text is, because we were an ascii site, already
- correctly encoded. (actually, we were a Latin-1 system, but we were
- willing to make users convert latin-1 text to the new format.)
-
- normally, such a solution
- requires everything entering/leaving the plan 9 universe be converted.
- however as the encoding we use is backward compatible with ASCII,
- no conversion needs be done for the only important case (text files on
- networked filesystems). it also has the advantage that all programs
- can display text uniformly; users don't have to write S-JIS editors
- because the regular editor (sam or ed) edits kana/kanji just fine.
- all the conversion effort can be, and is, confined to one place
- (a program called tcs [translate character sets]). the hope is
- that in most cases, this conversion can happen automatically
- (which is how this stream arose originally; the case of mail
- and news should be easy to make happen).
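The ASCII compatibility described above can be illustrated with a minimal Python sketch. It assumes that the encoding called UTF-2 here behaves like the encoding later standardized as UTF-8, which the post itself does not state. The function name `ascii_bytes_unchanged` and the sample strings are hypothetical, chosen only for illustration.

```python
def ascii_bytes_unchanged(text: str) -> bool:
    """Check that ASCII characters survive encoding as their own single bytes.

    Returns True when the bytes below 0x80 in the encoded stream are exactly
    the ASCII characters of `text`, in the same order.
    """
    # Assumption: UTF-8 stands in for the UTF-2 encoding named in the post.
    encoded = text.encode("utf-8")
    # The ASCII characters of the input, as raw byte values.
    ascii_chars = bytes(ord(ch) for ch in text if ord(ch) < 0x80)
    # Every byte in the encoded stream with the high bit clear.
    low_bytes = bytes(b for b in encoded if b < 0x80)
    return ascii_chars == low_bytes


plain = "text files on networked filesystems"
mixed = "ed edits かな and 漢字"

# Pure ASCII text needs no conversion: both encodings give identical bytes.
assert plain.encode("utf-8") == plain.encode("ascii")

# In mixed text, non-ASCII characters never produce bytes in the ASCII range.
assert ascii_bytes_unchanged(mixed)
```

Under that assumption, a program that only looks for ASCII bytes such as `/` or newline keeps working on encoded text, which is one reading of why existing tools needed no conversion step.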
-
- The Plan 9 work seems to be very well done in terms of I18N
- character support. I think I read an article about Plan 9 itself in a
- Usenix publication, but is there a technical paper available that was
- written specifically about the I18N aspects of Plan 9? (BTW, is Plan 9
- named after "Plan 9 from Outer Space"? I have found out that there is
- now a computer game based on this movie.)
-
- i believe these system (design and migration) issues have been
- essentially ignored in all the work and fuss on unicode/10646.
- i know that deep within unicode and in places like X/Open, there are
- efforts to develop support libraries for wide characters but this simply
- ignores the system issues.
-
- andrew hume
-
- I agree. Character support alone doesn't make a system I18N. Despite
- all the hoopla in POSIX standardization, the meaning of locale still
- leaves many loose ends. However, I can say we are defining the
- problems clearly now. The solutions are not in sight, though.
-
- ishikawa@personal-media.co.jp
-