NetNews Usenet Archive 1992 #31

Text File | 1993-01-02 | 2.4 KB | 47 lines

Sender: Postmaster@iecc.cambridge.ma.us
Newsgroups: comp.std.internat
Path: sparky!uunet!wupost!usc!elroy.jpl.nasa.gov!decwrl!world!iecc!mailgateway
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
References: <KIRAVUO.93Jan1164705@lesti.hut.fi>
Organization: I.E.C.C.
Date: 1 Jan 93 17:57:35 EST (Fri)
From: johnl@iecc.cambridge.ma.us (John R. Levine)
Message-ID: <9301011757.AA04714@iecc.cambridge.ma.us>
Lines: 35

>it is my opinion that there is no way to make a simple character code that
>will perform sorting and character conversion automatically.

Having done my share of i18n, I thoroughly agree.

When I was writing the international scaffolding for Javelin, a PC
time-series modelling package, we came up with a locale-like thing that
let you load a country configuration file.  The config file set the
collating sequence, including which characters sort together, and whether
there are pairs of characters that sort as one, like Spanish ch and ll, or
single characters that sort as two, like the German umlauted vowels.  It
also loaded the strings that were inserted automatically into graphs and
printouts, e.g. month names and words like "Millions."

What did not change was the message strings in the program or the table of
function and macro names, all of which were version-specific.  That is, if
you bought a French version of Javelin, it always spoke French, but you
could load in a country driver to produce reports in German or Spanish or
Dutch.  The separation between the "locale" for the program and the
"locale" for the reports was quite useful, and the users liked it.  The
program message strings were linked in when a particular version of Javelin
was built, so that the distributor for a particular country could build the
version for that country.

Specifically referring to sorting, it became quite clear that we could not
depend on there being a canonical printable version of a sortable string.
In some languages, there are lower case characters without upper case
equivalents or vice versa.  The canonical form was a list of collating
sequence positions so that in English all versions of the letter "A" might
turn into 12, all versions of "B" into 13, and so on.  The canonical form
was easy to sort and useful for determining whether two strings were
equivalent, important in the symbol table, but you couldn't turn it back
into something printable.

Regards,
John Levine, johnl@iecc.cambridge.ma.us, {spdcc|ima|world}!iecc!johnl
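
The collating scheme described in the post can be sketched roughly as
follows in Python.  This is not Javelin's code, and nothing here comes from
its actual config file format: build_table, collation_key, letter_groups,
and the Spanish and German tables are hypothetical names and data made up
for illustration.  Positions start at 0 rather than at 12 as in the post's
example; the specific numbers are arbitrary.

```python
import string


def build_table(groups):
    """Give every spelling in a group that group's collating position."""
    table = {}
    for position, group in enumerate(groups):
        for spelling in group:
            table[spelling] = position
    return table


def collation_key(text, table, expansions=None):
    """Turn text into a list of collating positions (not reversible)."""
    expansions = expansions or {}
    # Single characters that sort as two: rewrite them before the lookup.
    text = "".join(expansions.get(ch, ch) for ch in text)
    longest = max(len(spelling) for spelling in table)
    unknown_base = max(table.values()) + 1
    key = []
    i = 0
    while i < len(text):
        # Try the longest spelling first so "ch" wins over "c" then "h".
        for width in range(longest, 0, -1):
            chunk = text[i:i + width]
            if chunk in table:
                key.append(table[chunk])
                i += len(chunk)
                break
        else:
            # Characters missing from the table sort after everything else.
            key.append(unknown_base + ord(text[i]))
            i += 1
    return key


def letter_groups(extra_after=None):
    """Hypothetical table: a-z with cases merged, plus optional extras."""
    extra_after = extra_after or {}
    groups = []
    for letter in string.ascii_lowercase:
        groups.append((letter, letter.upper()))
        if letter in extra_after:
            groups.append(extra_after[letter])
    return groups


# Illustrative data only: traditional Spanish order with ch and ll as
# single collating elements, and German umlauts expanded to two letters.
SPANISH_EXTRA = {
    "c": ("ch", "Ch", "CH"),
    "l": ("ll", "Ll", "LL"),
    "n": ("ñ", "Ñ"),
}
GERMAN_EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}

spanish = build_table(letter_groups(SPANISH_EXTRA))
german = build_table(letter_groups())

words = ["chico", "dama", "cuna", "cosa"]
spanish_sorted = sorted(words, key=lambda w: collation_key(w, spanish))

# Two different spellings that produce the same canonical form.
same_symbol = (
    collation_key("Müller", german, GERMAN_EXPANSIONS)
    == collation_key("mueller", german, GERMAN_EXPANSIONS)
)
```

With these tables, "chico" sorts after "cuna" because "ch" occupies its own
position after "c", and "Müller" and "mueller" yield identical keys.  That
last case also shows why the key cannot be turned back into something
printable: once the positions are computed, nothing records which spelling
produced them.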