NetNews Usenet Archive 1992 #31

Text File | 1993-01-02 | 3.5 KB | 72 lines

Newsgroups: comp.std.internat
Path: sparky!uunet!mcsun!news.funet.fi!hydra!klaava!wirzeniu
From: wirzeniu@klaava.Helsinki.FI (Lars Wirzenius)
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Message-ID: <1993Jan2.020512.3287@klaava.Helsinki.FI>
Keywords: ISO10646 Unicode
Organization: University of Helsinki
References: <1i13rrINNars@rodan.UU.NET> <id.68CW.A16@ferranti.com> <1i2m57INN4vr@rodan.UU.NET>
Date: Sat, 2 Jan 1993 02:05:12 GMT
Lines: 60

avg@rodan.UU.NET (Vadim Antonov) writes:
>The problem is not in searching -- the problem is in presenting
>the information and in regular expressions ([a-z] - does it include "o?)

Wrong question. The correct question is: Which "o letters does [a-z]
include, if any? Germans have "a after a and "o after o (or so I
remember from when I knew some German, about five years ago); Swedes
and Finns have them after z. So, should [a-z] include the German
letters or not?

The answer is that it depends. If I'm using the German alphabet, they
should be included, but if I'm using the Swedish one, they should not
be included.

So, how does the stupid thing know which alphabet we're using? Either
we tell it explicitly (which Vadim seems to abhor), or we use
completely separate codes for each and every letter of each and every
language, so that the Swedish a and z are different from the German a
and z. The latter solution is ugly (IMHO), since it causes things that
are universally considered to be one to become many, and that only
makes trouble. The latter solution is also not going to happen, I
think.

>Let's solve simplier problems first. I merely want a character set
>which allows me to use my screen editor without fussing around
>every search pattern i use.

If I understand your proposed solution correctly, it is the "each
language has a completely different set of character codes" solution,
which I have disagreed with above.
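
To illustrate the [a-z] question above, here is a minimal Python sketch.
The two alphabet strings are simplified assumptions that follow the
orderings described above (German with "a after a and "o after o,
Swedish with them after z, plus a-ring); in_range is a hypothetical
helper, not the behaviour of any real regex engine.

```python
A_UML = "\u00e4"   # a-umlaut, written "a in the post
O_UML = "\u00f6"   # o-umlaut, written "o in the post
A_RING = "\u00e5"  # a-ring, used in Swedish and Finnish

# Assumed, simplified collation orders (not taken from any standard).
GERMAN = "a" + A_UML + "bcdefghijklmno" + O_UML + "pqrstuvwxyz"
SWEDISH = "abcdefghijklmnopqrstuvwxyz" + A_RING + A_UML + O_UML


def in_range(ch, lo, hi, alphabet):
    """Return True if ch collates between lo and hi (inclusive) in alphabet."""
    if ch not in alphabet:
        return False
    return alphabet.index(lo) <= alphabet.index(ch) <= alphabet.index(hi)


# The same range [a-z] gives different answers for the same letter.
for name, alphabet in (("German", GERMAN), ("Swedish", SWEDISH)):
    print(name, in_range(O_UML, "a", "z", alphabet))
```

The letter and the range are identical in both calls; only the
alphabet passed in changes the answer, which is why the program has
to be told which one is in use.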

Now, I can easily understand that it works when the two languages are
like English and Russian, i.e. they use completely different alphabets
that only happen to have a couple of letters that _look_ the same (but
are still quite different characters, from what I understand; I have
no objection to the Unicode designers being flogged if they mix
Cyrillic and Latin letters :-). It does not work very well for
languages that have almost the same alphabets, with only a few letters
differing, such as Swedish and German.

The input problem alone is enough to kill this idea, I think. The user
would have to get the language of every character he types correct.
As somebody who routinely (usually in large amounts every day) enters
text in several languages (English, Swedish, and Finnish, with a few
quoted words every now and then from a number of other languages,
including German, Latin, and French) and often enough mixes several
languages in one sentence, let me tell you that it is _not_ going to
succeed. Having to switch language mode every few words, or for every
different document, is not going to be a workable solution.

The user is also going to have a hard time differentiating between
identical-looking letters in different languages. On fancy graphics
terminals one might use fonts or something to aid this, but that does
not exactly make for WYSIWYG, or even minimally good-looking screens.
There is no such option for those of us using text-only terminals.

--
Lars.Wirzenius@helsinki.fi (finger wirzeniu@klaava.helsinki.fi)
MS-DOS, you can't live with it, you can live without it.