NetNews Usenet Archive 1992 #31

Internet Message Format | 1993-01-02 | 2.6 KB

Path: sparky!uunet!pipex!bnr.co.uk!uknet!mcsun!sunic!seunet!enea!sommar
From: sommar@enea.se (Erland Sommarskog)
Newsgroups: comp.std.internat
Subject: Re: Language tagging
Message-ID: <1993Jan2.231703.21201@enea.se>
Date: 2 Jan 93 23:17:03 GMT
References: <1i2m57INN4vr@rodan.UU.NET> <1993Jan2.020512.3287@klaava.Helsinki.FI> <1321@blue.cis.pitt.edu>
Organization: Enea Data AB
Lines: 40

David J Birnbaum (djbpitt+@pitt.edu) writes:
>I think the objections to the input-related problems of Vadim's proposal
>are misdirected, since both Vadim's proposal and Unicode require the
>user to input language identifying information during data entry if this
>information will be needed for later processing. Under Vadim's
>proposal, what would be input would be an instruction (not part of the
>stored text stream) to shift to the appropriate subset of characters.
>Under a system built on a Unicode character set, what would be input
>would be some sort of language or locale tagging that would be entered
>into the text at a higher level than character set.
>
>In both cases, if you want language-specific data in your text stream,
>you have to say so during input. If I need to insert Bulgarian words
>into a Russian text stream I can do so without indicating a change,
>as long as I understand that the consequence will be that the Bulgarian
>data will be treated like Russian.

Then I have to ask you to explain something about Unicode I don't know.
It is true that if you are using language-dependent features such as
spell-checking and hyphenation while inputting the text, then you
have to know what you are doing with Unicode. But once you're done
with it, it doesn't matter any longer, until the next time you want
to process the text in some way. With Unicode you can decide whether
to treat the text as Bulgarian or Russian; with Vadim's system you're
stuck unless you convert it. (Well, maybe you'll get away with Bulgarian
and Russian, but not with Swedish and German. Or, as I demonstrated in
another article, Swedish and Danish.)

But of course, if I have CCCP in a Swedish text and copy the word
to a Russian text, I will see funny things both with Vadim Antonov's
system and with 10646. (And I am sure this confusion will cost someone
a couple of wasted work hours in finding the error.) But at least
shifting scripts is more obvious than changing languages. If I
switch from Swedish to Russian I would probably change the keyboard
set-up, but not if I switch from Swedish to German - there is no
reason to.
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
Jag gav en k{ck tjeck en check.
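
The two points made in the post, that a Latin CCCP and a Cyrillic one only look alike, and that under Unicode the language of a text can be decided after the fact, can be illustrated with a small sketch. It relies on Python's standard `unicodedata` module and on present-day Unicode code points. The `Run` type and the `retag` helper are hypothetical names made up for this illustration; they are not part of Unicode, ISO 10646, or Vadim's proposal, and the language codes are just labels.

```python
import unicodedata
from dataclasses import dataclass, replace

# The "same-looking" word in two scripts.
latin_cccp = "CCCP"                          # Latin C, C, C, P
cyrillic_sssr = "\u0421\u0421\u0421\u0420"   # Cyrillic ES, ES, ES, ER


def code_points(text):
    """Return 'U+XXXX NAME' for every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


@dataclass(frozen=True)
class Run:
    """Hypothetical tagged run: the language sits beside the text, not in it."""
    text: str
    lang: str


def retag(run, lang):
    """Change only the language label; the stored characters are untouched."""
    return replace(run, lang=lang)


if __name__ == "__main__":
    # The strings render almost identically but do not compare equal.
    print(latin_cccp == cyrillic_sssr)
    for lat, cyr in zip(code_points(latin_cccp), code_points(cyrillic_sssr)):
        print(f"{lat:32} {cyr}")

    # A Bulgarian word (escaped Cyrillic letters) first tagged "bg".
    bulgarian = Run("\u0437\u0434\u0440\u0430\u0432\u0435\u0439", "bg")
    as_russian = retag(bulgarian, "ru")
    print(as_russian.text == bulgarian.text, as_russian.lang)
```

In this sketch, treating the Bulgarian word as Russian needs no conversion of the stored characters, whereas turning Latin CCCP into its Cyrillic look-alike would require replacing every character.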