- Path: sparky!uunet!spool.mu.edu!uwm.edu!linac!att!att!allegra!alice!andrew
- From: andrew@alice.att.com (Andrew Hume)
- Newsgroups: comp.std.internat
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Message-ID: <24540@alice.att.com>
- Date: 3 Jan 93 06:26:22 GMT
- Article-I.D.: alice.24540
- References: <1993Jan1.115424.27258@enea.se> <1i2gpvINN3lm@rodan.UU.NET> <8504@charon.cwi.nl>
- Organization: AT&T Bell Laboratories, Murray Hill NJ
- Lines: 69
-
- In article <8504@charon.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes:
- ~ In article <1993Jan2.230101.20871@enea.se> sommar@enea.se (Erland Sommarskog) writes:
- ~ > The problem with your idea is that you believe that everything is
- ~ > known at input time. It isn't. If you have a list of names which
- ~ > is to be used in Sweden, Norway, Denmark and Finland, the list will
- ~ > sort differently depending on the reader, not on who is entering the
- ~ > text. The Swedish and Finnish alphabets end with A-ring, A-dots,
- ~ > O-dots. The Danish and Norwegian end with AE-ligature, O-slash,
- ~ > A-ring. Looks trivial for a simple bit-order sort? Nope. Because
- ~ > the dotted A is equivalent to the AE ligature and so is dotted O to
- ~ > O-slash. Thus Danish and Norwegian names with slashed O should
- ~ > appear together with Swedish and Finnish names with dotted O. So
- ~ > the sort algorithm must make no distinction between the two, except
- ~ > when everything else is the same. And the sort algorithm must know
- ~ > in which order the user wants the text to be presented.
- ~ Not to mention that the Danish/Norwegian 'aa' is sometimes equivalent
- ~ to 'a-ring'.
- ~ >
- ~ > This is a simple end-user requirement which your proposal is not
- ~ > incapable of handling.
- ~ I think you intended to say that Vadim's proposal is not capable of
- ~ handling it (I agree with that).
-
- vadim is capable of defending himself, but this last statement
- is simply wrong. in vadim's scheme, you simply have extra information
- over a normal unicode stream. in the worst case, or at least
- a complicated one, you fall back to a complicated compare function
- (like unicode has to use all the time). vadim's point is, i think,
- that for rudimentary sorting, using nonoverlapping encodings based on
- languages gives you a good answer for (almost) free. he is not saying
- this rudimentary sort meets every need (although he did argue for
- dropping fancy sorting together of names etc).
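-
- to make the "complicated compare function" concrete, here is a minimal
- sketch in python of the reader-dependent sort erland describes. the
- names FOLD, TAIL, sort_key and sort_for are made up for illustration,
- the two-letter reader codes are an assumption, and the tables cover only
- the letters named in the quoted text. it does not handle dik's 'aa' case.

```python
import string
import unicodedata

# equivalence classes from the quoted description: dotted a ~ ae ligature,
# dotted o ~ o-slash. the class labels are hypothetical.
FOLD = {"ä": "AE", "æ": "AE", "ö": "OE", "ø": "OE", "å": "AA"}

# order of the extra letters after z, chosen by the reader, not the writer
TAIL = {
    "sv": ["AA", "AE", "OE"],  # swedish/finnish: a-ring, a-dots, o-dots
    "da": ["AE", "OE", "AA"],  # danish/norwegian: ae, o-slash, a-ring
}

def sort_key(name, reader):
    """Return (primary, tiebreak) for one name under one reader's alphabet."""
    ranks = {c: i for i, c in enumerate(string.ascii_lowercase)}
    for i, cls in enumerate(TAIL[reader]):
        ranks[cls] = 26 + i
    lowered = unicodedata.normalize("NFC", name).lower()
    # equivalent letters get the same primary rank; unknown characters
    # (spaces, hyphens, ...) sort before 'a'
    primary = [ranks.get(FOLD.get(c, c), -1) for c in lowered]
    # the raw lowercased string only matters when everything else is equal
    return (primary, lowered)

def sort_for(names, reader):
    return sorted(names, key=lambda n: sort_key(n, reader))
```

- with this key, "Möller" and "Møller" sort next to each other for either
- reader, and only the position of the a-ring block changes between "sv"
- and "da".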
-
- ~ > >which is not easy and sometimes ruins the whole logic of the
- ~ > >program (see shell globbing example in my previous posting or
- ~ > >tr example before).
- ~ >
- ~ > You've talked a lot about regular expressions etc. Frankly I
- ~ > don't give a damn about those. The main bulk of computer users
- ~ > are not programmers and don't know what a regular expression
- ~ > is, so why focus such specific issues?
- ~
- ~ I agree completely here. Shell globbing, regular expressions and
- ~ sorting by 'ls' are not relevant. (There are still systems around
- ~ that do not sort the list of files at all. Can you say IBM VM/CS?)
- ~ Those things are indeed used by programmers only, end-users search
- ~ on fixed strings and would be surprised if their dotted-a shows no
- ~ match because of language differences.
-
- i think this is wrong, too. almost all systems offer some
- facilities and applications that present sorted data, mainly
- filenames. many applications also offer some form of pattern matching,
- even if it is just some simple wildcard matching. it is important
- that the sorting rules be comprehensible and (echoing vadim here)
- SIMPLE. this is relevant. whether or not vadim's scheme adequately addresses
- this is a factual and technical question, and deserves a technical
- (neither hysterical nor name-calling) reply.
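-
- as an illustration of what simple wildcard matching could look like when
- dotted and slashed letters are treated as equivalent, here is a small
- python sketch. the folding table and the names fold and glob_match are
- assumptions made up for this example; only fnmatch.fnmatchcase and
- str.maketrans come from the standard library.

```python
import fnmatch

# hypothetical folding table: map the danish/norwegian letters onto the
# swedish/finnish ones that the quoted text calls equivalent
FOLD_TABLE = str.maketrans({"æ": "ä", "Æ": "Ä", "ø": "ö", "Ø": "Ö"})

def fold(text):
    """Replace each letter by the representative of its equivalence class."""
    return text.translate(FOLD_TABLE)

def glob_match(filename, pattern):
    """Shell-style wildcard match after folding both sides the same way."""
    return fnmatch.fnmatchcase(fold(filename), fold(pattern))
```

- so a user who types the pattern "Sör*" also finds a file called
- "Søren.txt", while plain "Soren.txt" still does not match.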
-
- just to show my impartiality, i think a weakness of vadim's
- scheme is the difficulty of distinguishing the different characters to
- the user. for vadim's example of cyrillic and english, that is not a problem
- as the cyrillic and latin fonts are easily distinguishable. but what about
- text which has french/german/italian/english mixed together? or as erland
- pointed out, text with a bunch of scandinavian languages mixed together?
- being able to have several almost identical but distinguishable fonts
- on the screen at once is a fairly difficult task (although not
- very difficult).
-
- andrew hume
-