NetNews Usenet Archive 1992 #31

Internet Message Format | 1993-01-01 | 3.4 KB

Path: sparky!uunet!noc.near.net!hri.com!enterpoop.mit.edu!eru.mt.luth.se!lunic!sunic!seunet!enea!sommar
From: sommar@enea.se (Erland Sommarskog)
Newsgroups: comp.std.internat
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Message-ID: <1993Jan1.115424.27258@enea.se>
Date: 1 Jan 93 11:54:24 GMT
References: <1hu9v5INNbp1@rodan.UU.NET> <8490@charon.cwi.nl> <1hvu79INN4qf@rodan.UU.NET>
Organization: Enea Data AB
Lines: 64

Vadim Antonov (avg@rodan.UU.NET) writes:
>If a combination of letters is treated as a letter IT IS A LETTER.
>Then add it to the alphabet and let the keyboard driver (which surely
>knows the language -- simply because there are different keyboard
>layouts) to handle the matter.

So if I type a C, and a million key presses later someone puts in
an H after the C, how can the keyboard driver handle that? It might
not even be the same driver that sees the two!

>FYI, English has some compound letters too (though they're used only
>in typesetting) -- ff, fff, fi, ffi, fl, ffl..

Which is not the same as Spanish CH or LL. Saying that ff is one
letter is like saying Russian "bI" is two...

>why on the Earth do i need to spare bits for encoding glyphs if
>i already know the language and 8 (or 16 for oriental languages) bits
>is quite enough to map the alphabet. Don't you see this gap in
>the logic nullifying all benefits of 10646?

What the hell has the number of bits to do with anything? Do
computers exist for the programmers or the users?

>With a trivial trick of including several codes for identical glyphs
>for letters from different languages you can put all of them in ONE
>meta-alphabet.

Well, that is already done in 10646 for letters which are the same
in the Latin, Cyrillic and Greek scripts. Hopefully, that will not
cause too much of a mess. But what Vadim Antonov was discussing was
including identical glyphs for languages like Swedish, German etc.

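To make the point about identical glyphs with separate codes concrete,
here is a minimal sketch in Python, chosen only for illustration. It
assumes the code points that Python's unicodedata tables give for the
capital A in the Latin, Cyrillic and Greek scripts; the names
LOOKALIKES, describe and count_matches are made up for this example.

```python
import unicodedata

# Three capitals that usually print identically but carry separate codes.
LOOKALIKES = {
    "Latin": "\u0041",
    "Cyrillic": "\u0410",
    "Greek": "\u0391",
}


def describe(ch: str) -> str:
    """Return the code point and the standard character name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def count_matches(text: str, needle: str) -> int:
    """Count exact code-for-code matches of needle in text (a plain search)."""
    return text.count(needle)


if __name__ == "__main__":
    for script, ch in LOOKALIKES.items():
        print(script, describe(ch))

    # A word typed with a Cyrillic first letter looks like "ABBA" on screen,
    # but a search for the Latin letter only finds the final one.
    mixed = LOOKALIKES["Cyrillic"] + "BBA"
    print(count_matches("ABBA", "A"), count_matches(mixed, "A"))
```

A plain search compares codes, not shapes, so the look-alike first
letter is simply not found. A separate code for each language's dotted
A would presumably produce the same kind of miss.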
I guess people are in for real surprises when things don't end up
where they expect them, because they happened to use the wrong type
of dotted A. Not to mention the confusion they get when they are
searching the text. Possibly this arrangement is friendly for the
lazy programmer Vadim Antonov, but not for the poor user.

>ASCII is for English, period.
>...
>The semantic in ASCII is hard-coded -- it is the order of letters
>and the trivial upper-case to lower-case convertion.
>Unfortunately the move to abolish the last traces of semantic and
>make it PURELY graphical format destroyed the usefulness of such
>encoding for data processing.

In what way is ASCII, which is - as you state yourself - for English,
useful for data processing in German or French? Or are even its
semantics useful for these languages?

In the poor variety of English you can render with ASCII, sorting
can be based simply on the letter ordering, because accents, digraphs
and diaereses, which only occur occasionally, were left out. But
German and French cannot be simplified in this way, because umlauts
and accents appear much more often. For these languages the sorting
algorithm must be more complex than simple sorting on code order,
so what's the use of hard-coded semantics a la ASCII?

You are seeing the solution: simple bit-order comparisons. But
unfortunately there are not many problems which have this solution.
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
Jag gav en k{ck tjeck en check.
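
The sorting argument in the post can be sketched in Python, used here
only for illustration. The word list and the helpers base_letters and
dictionary_key are assumptions made up for this example, and the key is
a deliberately simplified rule (an umlaut sorts with its base letter),
not a full collation algorithm for any language.

```python
import unicodedata


def base_letters(word: str) -> str:
    """Strip accents and umlauts: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def dictionary_key(word: str):
    # Hypothetical simplified rule: compare base letters first, ignoring
    # case, and use the original spelling only as a tie-breaker.
    return (base_letters(word).lower(), word)


words = ["Zebra", "Ärger", "Apfel", "Öl", "Ofen"]

# Plain comparison of code values: every umlaut lands after Z.
by_code = sorted(words)

# Comparison through the key: umlauts sort next to their base letters.
by_dictionary = sorted(words, key=dictionary_key)

if __name__ == "__main__":
    print(by_code)
    print(by_dictionary)
```

Even this small key needs knowledge that is not in the code order, and
it is still tied to one convention: Swedish, for instance, puts its
dotted A and O at the end of the alphabet, after Z, so the same words
would need a different key there.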