NetNews Usenet Archive 1992 #31

Path: sparky!uunet!europa.asd.contel.com!howland.reston.ans.net!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!news.sei.cmu.edu!drycas.club.cc.cmu.edu!pitt.edu!djbpitt
Newsgroups: comp.std.internat
Subject: Language tagging
Message-ID: <1321@blue.cis.pitt.edu>
From: djbpitt+@pitt.edu (David J Birnbaum)
Date: 2 Jan 93 17:29:57 GMT
Sender: news+@pitt.edu
References: <id.68CW.A16@ferranti.com> <1i2m57INN4vr@rodan.UU.NET> <1993Jan2.020512.3287@klaava.Helsinki.FI>
Organization: University of Pittsburgh
Keywords: ISO10646 Unicode
Lines: 47

>Just the input problem is enough to kill this idea, I think: The user
>would have to get the language of every character he types correct,
>and as somebody who routinely (as in usually large amounts every day)
>enters text in several languages (English, Swedish, and Finnish, with
>a few quoted words every now and then from a number of other
>languages, including German, Latin, and French) and often enough mixes
>several languages in one sentence, let me tell you that it is _not_
>going to succeed. Having to switch language mode every few words, or
>for every different document, is not going a workable solution.

As is readily acknowledged by the UTC, Unicode encodes scripts, not
languages, which means that language-dependent information must be
added separately if language-dependent operations are to be performed.
As has been noted in this forum, language tagging does not solve all
locale problems, since (for example) sorting may follow different rules
in different locales within a single language.

I think the objections to the input-related problems of Vadim's
proposal are misdirected, since both Vadim's proposal and Unicode
require the user to input language identifying information during data
entry if this information will be needed for later processing. Under
Vadim's proposal, what would be input would be an instruction (not part
of the stored text stream) to shift to the appropriate subset of
characters. Under a system built on a Unicode character set, what would
be input would be some sort of language or locale tagging that would be
entered into the text at a higher level than character set. In both
cases, if you want language-specific data in your text stream, you have
to say so during input. If I need to insert Bulgarian words into a
Russian text stream I can do so without indicating a change, as long as
I understand that the consequence will be that the Bulgarian data will
be treated like Russian.

The difference between Vadim's proposal and Unicode, then, is not one
of input, but one of encoding levels; Vadim wants the language
information to be part of the character set, while Unicode puts it at a
different level. Regardless of which side one takes in this
disagreement, the user at the keyboard has to input "Finnish" or
"German" or whatever if he wants Finnish data to be treated differently
from German data.

--David
--
--
Professor David J. Birnbaum        djbpitt+@pitt.edu  [Internet]
The Royal York Apartments, #802    djbpitt@pittvms    [Bitnet]
3955 Bigelow Boulevard             voice: 1-412-687-4653
Pittsburgh, PA  15213  USA         fax:   1-412-624-9714
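
Appended illustration (not part of the original posting): a minimal
Python sketch of what "tagging at a higher level than character set"
could look like. Everything here is an assumption made for the sake of
the example: the Run type, the tag_runs helper, and the two-letter
codes "ru" and "bg" are hypothetical and are not taken from Unicode,
ISO 10646, or Vadim's proposal. The point it tries to show is the one
argued above: the stored characters are the same either way, and text
is only treated as Bulgarian if the person typing says so.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    text: str  # plain character data; carries no language information
    lang: str  # hypothetical tag stored alongside the text, not inside it


def tag_runs(pieces, default_lang):
    """Group typed pieces into language-tagged runs.

    pieces is a list of (text, lang) pairs. A lang of None means the
    typist gave no language signal, so the piece inherits whatever
    language is currently in force.
    """
    runs = []
    current = default_lang
    for text, lang in pieces:
        if lang is not None:
            current = lang  # explicit instruction from the typist
        if runs and runs[-1].lang == current:
            # Same language as the previous run: extend it.
            runs[-1] = Run(runs[-1].text + text, current)
        else:
            runs.append(Run(text, current))
    return runs


# Bulgarian word typed into Russian text with no signal: it is Russian
# as far as any later language-dependent processing can tell.
untagged = tag_runs(
    [("Это слово ", None), ("българска", None), (" из книги.", None)],
    default_lang="ru",
)

# Same characters, but the typist marks the switch and the switch back.
tagged = tag_runs(
    [("Это слово ", None), ("българска", "bg"), (" из книги.", "ru")],
    default_lang="ru",
)
```

In this sketch the untagged input collapses into a single "ru" run,
while the tagged input yields three runs ("ru", "bg", "ru") over
identical character data. A shift-based scheme would need the same two
signals from the typist; only the place where the result is recorded
differs.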