NetNews Usenet Archive 1992 #31


Internet Message Format | 1993-01-01 | 2.4 KB

Path: sparky!uunet!not-for-mail
From: avg@rodan.UU.NET (Vadim Antonov)
Newsgroups: comp.std.internat
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Date: 1 Jan 1993 16:56:34 -0500
Organization: UUNET Technologies Inc, Falls Church, VA
Lines: 44
Message-ID: <1i2emiINN2td@rodan.UU.NET>
References: <1992Dec31.203101.5447@prl.dec.com> <1i0s05INNnfn@rodan.UU.NET> <1993Jan1.114158.17149@prl.dec.com>
NNTP-Posting-Host: rodan.uu.net
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages

In article <1993Jan1.114158.17149@prl.dec.com> boyd@prl.dec.com (Boyd Roberts) writes:
>In article <1i0s05INNnfn@rodan.UU.NET>, avg@rodan.UU.NET (Vadim Antonov) writes:
>>
>> A good encoding should support easy (i'd say natural) localization.
>> It should provide simple algorithms for simple functions
>> like getting string length, searching a character, case-insensitive
>> comparison, lexicographical comparison.
>>
>
>Well that's where you're wrong.  The characters and how they are used
>are distinct problems.

Don't you realize that forcing trivial programs to ask which language
they are operating in effectively defeats the entire purpose of
Unicode?  Should my shell ask me about the language of every [a-z] in
my commands?  If it shouldn't, then it has to get the information
somewhere, right?  If the information is kept outside the text (file
names in this case) then why do i need all those extra bits -- my
program *already* knows the exact (small) alphabet.

"Unicode -- a code for texts which will never be sorted!"  Great.

>UNICODE is
>a good example of this: not only does it specify the code -> glyph
>mapping (ie the encoding) it has support for left -> right, right -> left
>writing styles and a bunch of other stuff,

and this part of UNICODE is a mess.  Yuck.  Right->left is nothing more
than a character with negative width.

>Problem 2 (localisation) is damn hard.

Tell me.  I've spent ten years doing *real* localization and i know the
price of ill-thought-out solutions at the ground level (aka character
set ordering).

>Should Problem 1 cater for the fact I type `localisation' whereas
>you type `localization'?  We're both using English, typed on American
>keyboards (I guess, oops mine's made in West Germany) so where are you
>going to draw the line.  Is this Problem 1?  I say it's Problem 2.

The example is artificial and has nothing to do with character sets.
As you are well aware, these are different words in the same alphabet.

--vadim