NetNews Usenet Archive 1992 #31

Internet Message Format | 1993-01-01 | 1.6 KB

Path: sparky!uunet!not-for-mail
From: avg@rodan.UU.NET (Vadim Antonov)
Newsgroups: comp.std.internat
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Date: 1 Jan 1993 02:31:17 -0500
Organization: UUNET Technologies Inc, Falls Church, VA
Lines: 29
Message-ID: <1i0s05INNnfn@rodan.UU.NET>
References: <8490@charon.cwi.nl> <1992Dec31.171450.1513@klaava.Helsinki.FI> <1992Dec31.203101.5447@prl.dec.com>
NNTP-Posting-Host: rodan.uu.net
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages

In article <1992Dec31.203101.5447@prl.dec.com> boyd@prl.dec.com (Boyd Roberts) writes:
>There are two problems:
>
>    1. Getting an encoding of the characters.
>    2. Getting local conventions right.
>
>Problem 2 is hard.  Problem 1 should not address problem 2.

Oops. Nice try. Come again.

The ONLY reason people invent character encoding standards is
to "get local conventions right". If you've got your own machine
which does not communicate with others you can choose your own
arbitrary encoding.

A good encoding should support easy (I'd say natural) localization.
It should provide simple algorithms for simple functions like
getting string length, searching for a character, case-insensitive
comparison, lexicographical comparison. Unicode (and for that matter
Plan 9 UTF) does not support the last two mentioned functions.

I have yet to see a Plan 9 _sort_ which will sort Russian strings
without being told explicitly that it is Russian.

>Plan 9 utf solves Problem 1.

UTF does not solve Problem 1 -- it is merely a way to encode
16-bit unsigned integers in a way which (supposedly) will not
aggravate the ASCII world.

--vadim
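
The sorting complaint above can be illustrated with a minimal Python sketch. It is not from the post: the names `RUSSIAN_ALPHABET`, `RANK` and `russian_key` are hypothetical, and ranking each letter by its alphabet position is a simplification of real Russian collation. It only shows that comparing raw Unicode code points puts "ё" (U+0451) after "я" (U+044F), so a sort has to be told about the alphabet to get dictionary order.

```python
# Russian alphabet in dictionary order; note that "ё" follows "е".
RUSSIAN_ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
RANK = {ch: i for i, ch in enumerate(RUSSIAN_ALPHABET)}


def russian_key(word):
    """Hypothetical sort key: rank each letter by its alphabet position."""
    # Characters outside the table sort after all Russian letters,
    # ordered among themselves by code point (an assumption of this sketch).
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word.lower()]


words = ["яблоко", "ёлка", "дом"]

# Default string comparison uses raw code points.
by_code_point = sorted(words)

# Language-aware comparison uses positions in the Russian alphabet.
by_alphabet = sorted(words, key=russian_key)

print(by_code_point)
print(by_alphabet)
```

With code-point order "ёлка" lands last, after "яблоко"; with the alphabet-based key it lands between "дом" and "яблоко", as a Russian reader would expect.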