- Newsgroups: comp.std.internat
- Path: sparky!uunet!mcsun!news.funet.fi!hydra!klaava!wirzeniu
- From: wirzeniu@klaava.Helsinki.FI (Lars Wirzenius)
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Message-ID: <1993Jan2.020512.3287@klaava.Helsinki.FI>
- Keywords: ISO10646 Unicode
- Organization: University of Helsinki
- References: <1i13rrINNars@rodan.UU.NET> <id.68CW.A16@ferranti.com> <1i2m57INN4vr@rodan.UU.NET>
- Date: Sat, 2 Jan 1993 02:05:12 GMT
- Lines: 60
-
- avg@rodan.UU.NET (Vadim Antonov) writes:
- >The problem is not in searching -- the problem is in presenting
- >the information and in regular expressions ([a-z] - does it include "o?)
-
- Wrong question. The correct question is: Which "o letters does [a-z]
- include, if any?
-
- Germans have "a after a and "o after o (or so I remember from when I
- knew some German, about five years ago), Swedes and Finns have them
- after z. So, should [a-z] include the German letters or not?
-
- The answer is that it depends. If I'm using the German alphabet, they
- should be included, but if I'm using the Swedish one, they should not
- be included.
-
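A minimal sketch of why the answer depends on the alphabet. The two orderings below are simplifications that follow the description above (German with the umlauts right after their base letters, Swedish with the extra letters after z); they are not a full collation standard, and ß is left out. The names `GERMAN`, `SWEDISH`, `in_range` and `expand_range` are hypothetical, introduced only for this illustration.

```python
# Simplified alphabet orderings, as described in the post (assumption:
# German puts the umlauts right after the base letter, Swedish puts the
# extra letters after z). Real collation rules are more involved.
GERMAN = list("aäbcdefghijklmnoöpqrstuüvwxyz")
SWEDISH = list("abcdefghijklmnopqrstuvwxyzåäö")


def in_range(ch, lo, hi, alphabet):
    """Return True if ch lies between lo and hi in the given alphabet."""
    if ch not in alphabet:
        return False
    pos = alphabet.index
    return pos(lo) <= pos(ch) <= pos(hi)


def expand_range(lo, hi, alphabet):
    """Return every letter that the range lo-hi covers in the alphabet."""
    return "".join(alphabet[alphabet.index(lo): alphabet.index(hi) + 1])


if __name__ == "__main__":
    for name, alphabet in (("German", GERMAN), ("Swedish", SWEDISH)):
        print(name, "[a-z] =", expand_range("a", "z", alphabet))
        print(name, "includes ö:", in_range("ö", "a", "z", alphabet))
```

With these orderings the same range covers ö under the German alphabet and excludes it under the Swedish one, so the range alone does not carry enough information.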
- So, how does the stupid thing know which alphabet we're using? Either
- we tell it explicitly (which Vadim seems to abhor), or we use
- completely separate codes for each and every letter of each and every
- language, so that the Swedish a and z are different from the German a
- and z. The latter solution is ugly (IMHO), since it splits things
- that are universally considered to be one into many, and that only
- makes trouble. The latter solution is also not going to happen, I
- think.
-
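A rough sketch of what the "separate codes for every language" approach would do to a plain search. This is a toy model, not any real character set: each character is a (language, letter) pair, and `LangChar`, `encode` and `find` are hypothetical names made up for the illustration.

```python
from typing import List, NamedTuple


class LangChar(NamedTuple):
    """Toy character code: the same letter differs from language to language."""
    lang: str
    letter: str


def encode(text: str, lang: str) -> List[LangChar]:
    """Tag every letter of text with the language it was typed in."""
    return [LangChar(lang, ch) for ch in text]


def find(haystack: List[LangChar], needle: List[LangChar]) -> int:
    """Naive substring search; returns the first index, or -1 if absent."""
    n = len(needle)
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            return i
    return -1


if __name__ == "__main__":
    document = encode("hamburg", "sv")   # typed in Swedish mode
    print(find(document, encode("burg", "sv")))  # same language tag
    print(find(document, encode("burg", "de")))  # same letters, German tag
```

In this model the second search fails even though the letters on the screen are identical, because the Swedish and German codes for b, u, r and g are different values.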
- >Let's solve simplier problems first. I merely want a character set
- >which allows me to use my screen editor without fussing around
- >every search pattern i use.
-
- If I understand your proposed solution correctly, it is the "each
- language has a completely different set of character codes" solution
- which I have disagreed with above.
-
- Now, I can easily understand that it works when the two languages are
- like English and Russian, i.e. they use completely different alphabets
- that only happen to have a couple of letters that _look_ the same (but
- are still quite different characters, from what I understand; I have
- no objection to the Unicode designers being flogged if they mix
- Cyrillic and Latin letters :-). It does not work very well when using
- languages that have almost the same alphabets, with only a few letters
- differing, such as Swedish and German.
-
- Just the input problem is enough to kill this idea, I think: The user
- would have to get the language of every character he types correct,
- and as somebody who routinely (as in usually large amounts every day)
- enters text in several languages (English, Swedish, and Finnish, with
- a few quoted words every now and then from a number of other
- languages, including German, Latin, and French) and often enough mixes
- several languages in one sentence, let me tell you that it is _not_
- going to succeed. Having to switch language mode every few words, or
- for every different document, is not going to be a workable solution.
-
- The user is also going to have a hard time differentiating between
- identical-looking letters in different languages. On fancy graphics
- terminals one might use fonts or something to aid this, but that does
- not exactly make for WYSIWYG, or even minimally good-looking screens.
- There is no such option for those of us using text-only terminals.
-
- --
- Lars.Wirzenius@helsinki.fi (finger wirzeniu@klaava.helsinki.fi)
- MS-DOS, you can't live with it, you can live without it.
-