NetNews Usenet Archive 1992 #31

Internet Message Format | 1993-01-01 | 5.2 KB

Path: sparky!uunet!not-for-mail
From: avg@rodan.UU.NET (Vadim Antonov)
Newsgroups: comp.std.internat
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Date: 1 Jan 1993 17:32:31 -0500
Organization: UUNET Technologies Inc, Falls Church, VA
Lines: 113
Message-ID: <1i2gpvINN3lm@rodan.UU.NET>
References: <8490@charon.cwi.nl> <1hvu79INN4qf@rodan.UU.NET> <1993Jan1.115424.27258@enea.se>
NNTP-Posting-Host: rodan.uu.net

In article <1993Jan1.115424.27258@enea.se> sommar@enea.se (Erland Sommarskog) writes:
>So if I type a C then a million key presses later changes puts in
>an H after the C how can the keyboard driver handle that? It might
>not even be the same driver who are seeing the two!

Aw, don't be silly. It's trivial.

>>FYI, English has some compound letters too (though they're used only
>>in typesetting) -- ff, fff, fi, ffi, fl, ffl..
>
>Which is not the same as Spanish CH or LL. Saying that ff is one
>letter is like saying Russian "bI" is two...

Sure not, they're not "letters" for sorting and case-conversion purposes.

>>why on the Earth do i need to spare bits for encoding glyphs if
>>i already know the language and 8 (or 16 for oriental languages) bits
>>is quite enough to map the alphabet. Don't you see this gap in
>>the logic nullifying all benefits of 10646?
>
>What the hell has the number of bits to do with anything? Do computers
>exist for the programmers or the users?

Look, you've missed the logic completely. Read it please again.
I also explained it several times in other postings.

>>With a trivial trick of including several codes for identical glyphs
>>for letters from different languages you can put all of them in ONE
>>meta-alphabet.
>
>Well that is already done in 10646 for letters which are the same in
>Latin, Cyrillic and Greek scripts. Hopefully, that will not cause too
>much of a mess.

It was the only solution -- the problem is the same but it's a worse case.
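
The point about giving identical glyphs separate codes can be checked against Unicode as it exists today. What follows is only a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the three sample letters and the helper name `describe` are illustrative choices, not something taken from the thread.

```python
import unicodedata

# Three capital letters that are normally drawn with the same glyph,
# but sit at different code points in Latin, Cyrillic and Greek.
LOOKALIKES = ["\u0042", "\u0412", "\u0392"]

def describe(ch):
    """Return (code point, Unicode name, lowercase form) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.lower())

for ch in LOOKALIKES:
    print(describe(ch))

# Because the codes differ, case conversion stays inside each script:
# the three lowercase results are three different letters.
lowered = [ch.lower() for ch in LOOKALIKES]
```

If the Cyrillic code were folded onto the Latin one because the capitals look alike, the lowercase step would produce a Latin "b" where a Cyrillic letter was meant, which is the kind of damage the separate codes avoid.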

>But what Vadim Antonov was discussing was including identical glyphs
>for languages like Swedish, German etc. I guess people are in for real
>surprises because things don't end up where they expect them because
>they happen to use the wrong type of dotted A. Not talking about the
>confusion they get when they are searching the text. Possibly this
>arrangement is friendly for the lazy programmer Vadim Antonov,
>but not for the poor user.

What do you tell the poor user when he has a database with English and
Russian company names (a case from my practice, to be real) -- in both
upper and lower case and the smart guys (apparently Erland's pupils)
made a terminal which converts cyrillic codes for the letters of the
same shape as latin to the latin codes? Go get a rope?

As for lazy i bet i wrote ten times more than you. Send you my resume?

>>ASCII is for English, period.
>In what way is ASCII, which is - as you state yourself - for English,
>useful for data processing in German or French?

Because:

1) the programs working with German ASCII and French ASCII aren't
   the same programs as those working with English ASCII -- they
   have language-specific translation tables in comparison routines
   which effectively reorder ASCII making it a somewhat different code.

2) since relatively few programs were designed this way there is
   a lot of programs with erroneous behaviour, for example:

        tr ? SS
           ^-eszett here

3) it is not ASCII anyway (where's my {?)

>Or even its
>semantics useful for these languages?

The basic ASCII principles (after reordering and replacing several
characters) remained the same -- there is a way to convert upper<->lower
case and there is a way to sort without asking which language every
word came from (it's known a priori). That does not work with Unicode.

>In the poor variety of
>English you can render with ASCII, sorting can be based simply
>on the letter ordering, because accents, digraphs and diaeresis
>which only occurs occasionally were left out.
>But German and
>French cannot be simplified in this way because umlauts and
>accents appear much more often. For these languages the sorting
>algorithm must be more complex than simple sorting on collation
>order, so what's the use of a hard-coded semantics a la ASCII?

There always is a reasonable approximation people use daily --
technically speaking ANY sorting can be made arithmetic by trivial
character conversion rules. Since invention of lexicographic sorting
those rules came to be pretty simple.

>You are seeing the solution, simple bit-order comparisons. But
>unfortunately there are not many problems which have this solution.

I do not claim to provide a panacea. I simply warn about the known
problem which can easily outweigh all benefits of the unified code.
The solution may seem weird -- until you bump on those holes yourself.

As i already said there is no easy way around -- you have to deal with
those issues somewhere and it's better to have it solved on the elementary
level -- otherwise EVERY program will be forced to keep track of the
language which is not easy and sometimes ruins the whole logic of the
program (see shell globbing example in my previous posting or tr example
before).

I'm pretty sure Unicode is dead-born exactly because it requires
non-trivial changes in existing programs for no reason.

--vadim
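
The claim that any ordering can be made "arithmetic" by character conversion rules can be sketched as a per-language translation table applied before an ordinary comparison. This is only an illustration, assuming Python 3; the names `GERMAN`, `SWEDISH` and `sort_key` are hypothetical, and the two tables are deliberately tiny simplifications (German dictionary-style folding of umlauts onto the base vowel and of ß onto ss, Swedish å, ä, ö placed after z), not complete collation rules.

```python
# Per-language conversion tables: each special letter is rewritten into
# a string whose plain code-point order gives the intended position.
GERMAN = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}
# "{", "|", "}" come right after "z" in ASCII, so they push these
# letters past z while keeping the order å < ä < ö.
SWEDISH = {"å": "{", "ä": "|", "ö": "}"}

def sort_key(word, table):
    """Convert a word into a key that sorts by simple comparison."""
    return "".join(table.get(ch, ch) for ch in word.lower())

words = ["zebra", "ärm", "apa", "ost", "öl"]
german_order = sorted(words, key=lambda w: sort_key(w, GERMAN))
swedish_order = sorted(words, key=lambda w: sort_key(w, SWEDISH))
print(german_order)   # ['apa', 'ärm', 'öl', 'ost', 'zebra']
print(swedish_order)  # ['apa', 'ost', 'zebra', 'ärm', 'öl']
```

The same word list comes out in two different orders, and the program has to be told which table applies. With one language per code set that choice is fixed in advance; with a unified code it has to be carried alongside the text.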