- Path: sparky!uunet!europa.asd.contel.com!howland.reston.ans.net!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!news.sei.cmu.edu!drycas.club.cc.cmu.edu!pitt.edu!djbpitt
- Newsgroups: comp.std.internat
- Subject: Language tagging
- Message-ID: <1321@blue.cis.pitt.edu>
- From: djbpitt+@pitt.edu (David J Birnbaum)
- Date: 2 Jan 93 17:29:57 GMT
- Sender: news+@pitt.edu
- References: <id.68CW.A16@ferranti.com> <1i2m57INN4vr@rodan.UU.NET> <1993Jan2.020512.3287@klaava.Helsinki.FI>
- Organization: University of Pittsburgh
- Keywords: ISO10646 Unicode
- Lines: 47
-
- >Just the input problem is enough to kill this idea, I think: The user
- >would have to get the language of every character he types correct,
- >and as somebody who routinely (as in usually large amounts every day)
- >enters text in several languages (English, Swedish, and Finnish, with
- >a few quoted words every now and then from a number of other
- >languages, including German, Latin, and French) and often enough mixes
- >several languages in one sentence, let me tell you that it is _not_
- >going to succeed. Having to switch language mode every few words, or
- >for every different document, is not going to be a workable solution.
-
- As is readily acknowledged by the UTC, Unicode encodes scripts, not
- languages, which means that language-dependent information must be added
- separately if language-dependent operations are to be performed. As has
- been noted in this forum, language tagging does not solve all locale
- problems, since (for example) sorting may follow different rules in
- different locales within a single language.
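To illustrate why a language tag alone does not settle sorting, here is a minimal sketch of two ordering conventions for German: one folds umlauted vowels to their base vowels, the other expands them to vowel plus "e". The function names and the small folding tables are assumptions made for this example, not part of any standard library or of the proposals under discussion.

```python
# Two hypothetical sort keys for the same language (German).
# Function names and folding tables are illustrative only.

def dictionary_key(word: str) -> str:
    """Fold umlauted vowels to their base vowels (ä -> a, ö -> o, ü -> u)."""
    table = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
    return word.lower().translate(table)


def phonebook_key(word: str) -> str:
    """Expand umlauted vowels to vowel + e (ä -> ae, ö -> oe, ü -> ue)."""
    table = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})
    return word.lower().translate(table)


words = ["Mutter", "Müller", "Mufti"]
by_dictionary = sorted(words, key=dictionary_key)
by_phonebook = sorted(words, key=phonebook_key)

print(by_dictionary)
print(by_phonebook)
```

Both lists are tagged "German", yet the two keys place "Müller" differently, so the sort convention has to be recorded in addition to the language.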
-
- I think the objections to the input-related problems of Vadim's proposal
- are misdirected, since both Vadim's proposal and Unicode require the
- user to supply language-identifying information during data entry if
- that information will be needed for later processing. Under Vadim's
- proposal, the user would enter an instruction (not part of the stored
- text stream) to shift to the appropriate subset of characters. Under a
- system built on a Unicode character set, the user would enter some sort
- of language or locale tagging, which would be entered into the text at
- a higher level than the character set.
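The two storage models can be sketched side by side. This is only an illustration under stated assumptions: the event format, the function names, and the `(start, end, language)` span layout are all hypothetical, and the first encoder is a loose reading of the proposal as summarized here, not its actual design.

```python
# Hypothetical input events: ("lang", code) is a mode switch typed by the
# user; ("char", c) is an ordinary keystroke.

def typed(lang, word):
    """Build the events for one language switch followed by a word."""
    return [("lang", lang)] + [("char", c) for c in word]


def encode_in_charset(events):
    """Language is part of the character set: store (subset, char) units."""
    current = None
    stored = []
    for kind, value in events:
        if kind == "lang":
            current = value  # the shift instruction itself is not stored
        else:
            stored.append((current, value))
    return stored


def encode_with_tags(events):
    """Language sits above the character set: plain text plus tagged spans."""
    text = []
    spans = []
    current = None
    start = 0
    for kind, value in events:
        if kind == "lang":
            if current is not None and len(text) > start:
                spans.append((start, len(text), current))
            current = value
            start = len(text)
        else:
            text.append(value)
    if current is not None and len(text) > start:
        spans.append((start, len(text), current))
    return "".join(text), spans


events = typed("fi", "yö") + typed("de", "Tag")
in_charset = encode_in_charset(events)
plain_text, tags = encode_with_tags(events)
switches = sum(1 for kind, _ in events if kind == "lang")
```

Both encoders consume the same event stream, including the same two language switches; they differ only in where the language information ends up.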
-
- In both cases, if you want language-specific data in your text stream,
- you have to say so during input. If I need to insert Bulgarian words
- into a Russian text stream I can do so without indicating a change,
- as long as I understand that the consequence will be that the Bulgarian
- data will be treated like Russian.
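A minimal sketch of that consequence, assuming a hypothetical span-based tagging scheme in which any untagged position inherits the document's default language (the function name and span layout are invented for this example):

```python
def effective_language(spans, position, default):
    """Return the language tagged at `position`, or the default if untagged."""
    for start, end, lang in spans:
        if start <= position < end:
            return lang
    return default


# A Russian stream containing one Bulgarian word ("дума").
text = "слово дума"
pos = text.index("дума")

# No tag supplied at input: the Bulgarian word is processed as Russian.
untagged = effective_language([], pos, default="ru")

# Tag supplied at input: the same word is now processed as Bulgarian.
spans = [(pos, pos + len("дума"), "bg")]
tagged = effective_language(spans, pos, default="ru")
```

The stored characters are identical in both cases; only the presence of the tag changes how later processing treats them.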
-
- The difference between Vadim's proposal and Unicode, then, is not one of
- input, but one of encoding levels; Vadim wants the language information
- to be part of the character set, while Unicode puts it at a different
- level. Regardless of which side one takes in this disagreement, the
- user at the keyboard has to input "Finnish" or "German" or whatever if
- he wants Finnish data to be treated differently from German data.
-
- --David
- --
- Professor David J. Birnbaum         djbpitt+@pitt.edu [Internet]
- The Royal York Apartments, #802     djbpitt@pittvms [Bitnet]
- 3955 Bigelow Boulevard              voice: 1-412-687-4653
- Pittsburgh, PA 15213 USA            fax: 1-412-624-9714
-