- Path: sparky!uunet!noc.near.net!hri.com!enterpoop.mit.edu!eru.mt.luth.se!lunic!sunic!seunet!enea!sommar
- From: sommar@enea.se (Erland Sommarskog)
- Newsgroups: comp.std.internat
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Message-ID: <1993Jan1.115424.27258@enea.se>
- Date: 1 Jan 93 11:54:24 GMT
- References: <1hu9v5INNbp1@rodan.UU.NET> <8490@charon.cwi.nl> <1hvu79INN4qf@rodan.UU.NET>
- Organization: Enea Data AB
- Lines: 64
-
- Vadim Antonov (avg@rodan.UU.NET) writes:
- >If a combination of letters is treated as a letter IT IS A LETTER.
- >Then add it to the alphabet and let the keyboard driver (which surely
- >knows the language -- simply because there are different keyboard
- >layouts) to handle the matter.
-
- So if I type a C and then, a million key presses later, put in
- an H after the C, how can the keyboard driver handle that? It might
- not even be the same driver that sees the two!
-
- >FYI, English has some compound letters too (though they're used only
- >in typesetting) -- ff, fff, fi, ffi, fl, ffl..
-
- Which is not the same as Spanish CH or LL. Saying that ff is one
- letter is like saying Russian "bI" is two...
-
- >why on the Earth do i need to spare bits for encoding glyphs if
- >i already know the language and 8 (or 16 for oriental languages) bits
- >is quite enough to map the alphabet. Don't you see this gap in
- >the logic nullifying all benefits of 10646?
-
- What the hell has the number of bits to do with anything? Do computers
- exist for the programmers or the users?
-
- >With a trivial trick of including several codes for identical glyphs
- >for letters from different languages you can put all of them in ONE
- >meta-alphabet.
-
- Well, that is already done in 10646 for letters which are the same in
- Latin, Cyrillic and Greek scripts. Hopefully, that will not cause too
- much of a mess.
-
- But what Vadim Antonov was discussing was including identical glyphs
- for languages like Swedish, German etc. I guess people are in for real
- surprises when things don't end up where they expect them, because
- they happen to have used the wrong type of dotted A. Not to mention the
- confusion they get when they are searching the text. Possibly this
- arrangement is friendly for the lazy programmer Vadim Antonov,
- but not for the poor user.
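
As an illustration of the search problem described above, here is a minimal Python sketch. It uses the capital A that exists separately in the Latin, Cyrillic and Greek scripts; the sample string and the variable names are made up for the example and are not from the original discussion.

```python
import unicodedata

# Three capitals that usually render identically but have distinct codes.
latin_a = "\u0041"      # LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"   # CYRILLIC CAPITAL LETTER A
greek_alpha = "\u0391"  # GREEK CAPITAL LETTER ALPHA

# The official character names make the difference visible.
names = [unicodedata.name(ch) for ch in (latin_a, cyrillic_a, greek_alpha)]

# A hypothetical text typed with the "wrong" A: it looks like "BAR" on screen.
text = "B" + cyrillic_a + "R"

# A plain code-by-code search for the Latin spelling does not find it.
position = text.find("BAR")

print(names)
print(position)
```

The same effect would apply to any pair of look-alike characters given separate codes, which is why duplicating a dotted A per language would surprise users who search or sort.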
-
- >ASCII is for English, period.
- >...
- >The semantic in ASCII is hard-coded -- it is the order of letters
- >and the trivial upper-case to lower-case convertion.
- >Unfortunately the move to abolish the last traces of semantic and
- >make it PURELY graphical format destroyed the usefulness of such
- >encoding for data processing.
-
- In what way is ASCII, which is - as you state yourself - for English,
- useful for data processing in German or French? Or are even its
- semantics useful for these languages? In the poor variety of
- English you can render with ASCII, sorting can be based simply
- on the letter ordering, because accents, digraphs and diaereses,
- which occur only occasionally, were left out. But German and
- French cannot be simplified in this way, because umlauts and
- accents appear much more often. For these languages the sorting
- algorithm must be more complex than simple sorting on collation
- order, so what's the use of hard-coded semantics a la ASCII?
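
To make the sorting point concrete, here is a small Python sketch comparing plain code-order sorting with a deliberately simplified key that ignores diacritics on the first pass. The word list and the helper name `simple_key` are assumptions for the example; this is not a real locale collation, and the right order is language-dependent anyway (Swedish, for instance, places Ä and Ö after Z).

```python
import unicodedata

# Sample German words; escapes are used so the file encoding does not matter.
words = ["Zebra", "\u00c4pfel", "Ofen", "Apfel", "\u00d6l"]  # Äpfel, Öl

# Sorting on raw code values: every umlaut lands after Z.
by_code = sorted(words)

def simple_key(word):
    """Hypothetical two-level key: base letters first, exact spelling second."""
    decomposed = unicodedata.normalize("NFD", word)
    # Drop combining marks (the diaeresis) to get the base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)

# Sorting with the simplified key: umlauts sort next to their base letters.
by_key = sorted(words, key=simple_key)

print(by_code)
print(by_key)
```

Even this toy key needs a decomposition step and a tie-breaker, which is already more than a comparison of code values.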
-
- You are seeing the solution: simple bit-order comparisons. But
- unfortunately there are not many problems which have this solution.
- --
- Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
- Jag gav en k{ck tjeck en check.
-