NetNews Usenet Archive 1992 #31

Internet Message Format | 1993-01-03 | 4.2 KB

Path: sparky!uunet!spool.mu.edu!uwm.edu!linac!att!att!allegra!alice!andrew
From: andrew@alice.att.com (Andrew Hume)
Newsgroups: comp.std.internat
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Message-ID: <24540@alice.att.com>
Date: 3 Jan 93 06:26:22 GMT
Article-I.D.: alice.24540
References: <1993Jan1.115424.27258@enea.se> <1i2gpvINN3lm@rodan.UU.NET> <8504@charon.cwi.nl>
Organization: AT&T Bell Laboratories, Murray Hill NJ
Lines: 69

In article <8504@charon.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes:
~ In article <1993Jan2.230101.20871@enea.se> sommar@enea.se (Erland Sommarskog) writes:
~ > The problem with your idea is that you believe that everything is
~ > known at input time. It isn't. If you have a list of names which
~ > is to be used in Sweden, Norway, Denmark and Finland, the list will
~ > sort differently depending on the reader, not on who is entering the
~ > text. The Swedish and Finnish alphabets end with A-ring, A-dots,
~ > O-dots. The Danish and Norwegian end with AE-ligature, O-slash,
~ > A-ring. Looks trivial for a simple bit-order sort? Nope. Because
~ > the dotted A is equivalent to the AE ligature and so is dotted O and
~ > O-slash. Thus Danish and Norwegian names with slashed O should
~ > appear together with Swedish and Finnish names with dotted O. So
~ > the sort algorithm must make no distinction between the two, except
~ > when everything else is the same. And the sort algorithm must know
~ > in which order the user wants the text to be presented.
~ Not to mention that the Danish/Norwegian 'aa' is sometimes equivalent
~ to 'a-ring'.
~ >
~ > This is a simple end-user requirement which your proposal is not
~ > incapable to handle.
~ I think you intended to say that Vadim's proposal is not capable to
~ handle it (I agree with that).

vadim is capable of defending himself, but this last statement is
simply wrong. in vadim's scheme, you simply have extra information
over a normal unicode stream.
in the worst case, or at least a complicated one, you fall back and
use a complicated compare function (like unicode has to do all the
time). vadim's point, i think, is that for rudimentary sorting, using
nonoverlapping encodings based on languages gives you a good answer
for (almost) free. he is not saying this rudimentary sort meets every
need (although he did argue for dropping the fancy sorting-together of
names, etc.).

~ > >which is not easy and sometimes ruins the whole logic of the
~ > >program (see shell globbing example in my previous posting or
~ > >tr example before).
~ >
~ > You've talked a lot about regular expressions etc. Frankly I
~ > don't give a damn about those. The main bulk of computer users
~ > are not programmers and don't know what a regular expression
~ > is, so why focus on such specific issues?
~
~ I agree completely here. Shell globbing, regular expressions and
~ sorting by 'ls' are not relevant. (There are still systems around
~ that do not sort the list of files at all. Can you say IBM VM/CS?)
~ Those things are indeed used by programmers only, end-users search
~ on fixed strings and would be surprised if their dotted-a shows no
~ match because of language differences.

i think this is wrong, too. almost all systems offer some facilities
and applications that present sorted data, mainly filenames. many
applications also offer some form of pattern matching, even if it is
just some simple wildcard matching. it is important that the sorting
rules be comprehensible and (echoing vadim here) SIMPLE. this is
relevant. whether or not vadim's scheme adequately addresses this is
a factual and technical question, and calls for a technical (not
hysterical or name-calling) reply.

just to show my impartiality, i think a weakness of vadim's scheme is
the difficulty of distinguishing the different characters to the user.
for vadim's example of cyrillic and english, that is not a problem, as
the cyrillic and latin fonts are easily distinguishable.
but what about text which has french/german/italian/english mixed
together? or, as erland pointed out, text with a bunch of scandinavian
languages mixed together? having several almost identical but
distinguishable fonts on the screen at once is a fairly difficult task
(although not an extremely difficult one).

	andrew hume
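
A minimal sketch of the two-level, reader-dependent sort that Erland
describes in the quoted text, for illustration only. The tables and
the names CLASS_ORDER, EQUIV, sort_key and sort_names are hypothetical;
they are built solely from the letter orders given in this thread and
are not taken from any standard or from vadim's scheme. The 'aa' =
'a-ring' case that Dik mentions is not handled.

```python
import unicodedata

# Hypothetical tables, built only from the letter orders quoted in the thread.
# Dotted A and the AE ligature form one class; dotted O and slashed O another.
EQUIV = {"æ": "ä", "ø": "ö"}          # fold each class to one representative
BASE = "abcdefghijklmnopqrstuvwxyz"
CLASS_ORDER = {
    "sv_fi": ["å", "ä", "ö"],         # Swedish/Finnish: A-ring, A-dots, O-dots
    "da_no": ["ä", "ö", "å"],         # Danish/Norwegian: AE, O-slash, A-ring
}


def sort_key(name, reader):
    """Return a (primary, secondary) key for the given reader convention."""
    alphabet = BASE + "".join(CLASS_ORDER[reader])
    rank = {ch: i for i, ch in enumerate(alphabet)}
    # Use precomposed characters so the table lookups match.
    lowered = unicodedata.normalize("NFC", name).lower()
    # Primary level: equivalent letters get the same rank, so names written
    # with slashed O and with dotted O sort together. Unknown characters
    # sort after the whole alphabet.
    primary = tuple(rank.get(EQUIV.get(ch, ch), len(alphabet)) for ch in lowered)
    # Secondary level: raw code points, consulted only when the primary
    # keys are identical ("when everything else is the same").
    secondary = tuple(ord(ch) for ch in lowered)
    return (primary, secondary)


def sort_names(names, reader):
    """Sort names in the order the reader expects, not the writer."""
    return sorted(names, key=lambda n: sort_key(n, reader))


names = ["Öberg", "Østby", "Ågren", "Ärlig", "Zetterlund", "Ødegård"]
print(sort_names(names, "sv_fi"))
print(sort_names(names, "da_no"))
```

The same list comes out with Ågren before Ärlig for a Swedish or
Finnish reader and with Ågren last for a Danish or Norwegian reader,
while the O-dots and O-slash names stay interleaved in both cases.
Whether a scheme like this counts as SIMPLE is exactly the question
under debate.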