- Path: sparky!uunet!zaphod.mps.ohio-state.edu!wupost!spool.mu.edu!enterpoop.mit.edu!eru.mt.luth.se!lunic!sunic!seunet!enea!sommar
- From: sommar@enea.se (Erland Sommarskog)
- Newsgroups: comp.std.internat
- Subject: Re: Language tagging
- Message-ID: <1993Jan3.203017.232@enea.se>
- Date: 3 Jan 93 20:30:17 GMT
- References: <1321@blue.cis.pitt.edu> <1993Jan2.231703.21201@enea.se> <1336@blue.cis.pitt.edu>
- Organization: Enea Data AB
- Lines: 32
-
- David J Birnbaum (djbpitt+@pitt.edu) writes:
- >Concerning the former, I would normally require that my texts include
- >language identification, so that the same "this is Bulgarian" or "this
- >is Russian" information would be present in both a Unicode-based system
- >and Vadim's system, although it would not be an inalienable part of the
- >character set in the former. Thus, I would be "stuck with" language
- >information under both systems. While Unicode is capable of
- >representing text without language information (by eschewing the use of
- >tags), I can't think of a situation where I would want to do so.
-
- This I don't understand. If you dash off a short e-mail message
- to a friend, why would it be necessary to tag the text with its
- language?
-
- And why would "this is Bulgarian" be different from "this is a text
- about motorcycles"?
-
- I don't doubt that there are many situations in which you want to keep
- track of which language a text is written in, but what I question is
- whether language is the only such important property. Peter da
- Silva discussed SGML tags in another article, and intuitively it seems
- to me that SGML tags are precisely what you are looking for.
-
- It is also worth noting that while with plain Unicode you can send
- an untagged e-mail without problems, as far as I understand you cannot
- do so with what Vadim Antonov suggests. If all you have is a number
- of small 8-bit sets, then you must tag the text with the set being
- used whenever you expect the recipient's default set to differ from
- the one you use; otherwise the text may come out as garbage.
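- To make the last point concrete, here is a minimal sketch. It does not
- model Vadim Antonov's actual proposal; as an assumption, two existing
- 8-bit Cyrillic sets (KOI8-R and ISO 8859-5) stand in for "a number of
- small 8-bit sets", and all function names are hypothetical. Untagged
- bytes decoded under the wrong default set come out as garbage, a
- charset tag fixes that, and a single universal encoding (UTF-8 here)
- needs no tag at all.

```python
from typing import Tuple

# Hypothetical sample text: "Hello, world" in Russian.
TEXT = "Привет, мир"


def send_untagged(text: str, sender_charset: str) -> bytes:
    """Encode in the sender's 8-bit set; no charset label travels with it."""
    return text.encode(sender_charset)


def receive(data: bytes, receiver_charset: str) -> str:
    """Decode with whatever set the receiver assumes by default."""
    return data.decode(receiver_charset)


def send_tagged(text: str, charset: str) -> Tuple[str, bytes]:
    """Hypothetical tagged message: (charset label, encoded bytes)."""
    return charset, text.encode(charset)


def receive_tagged(message: Tuple[str, bytes]) -> str:
    """Decode using the label carried with the message, not a local default."""
    charset, data = message
    return data.decode(charset)


if __name__ == "__main__":
    # Sender uses KOI8-R, receiver assumes ISO 8859-5: the letters change.
    print(receive(send_untagged(TEXT, "koi8_r"), "iso8859_5"))
    # With a tag, the receiver picks the right set.
    print(receive_tagged(send_tagged(TEXT, "koi8_r")))
    # One universal set on both ends: no tag required.
    print(receive(send_untagged(TEXT, "utf-8"), "utf-8"))
```

- Every byte the sender produces here is also a valid character in
- ISO 8859-5, so nothing fails loudly: the recipient simply sees the
- wrong letters, which is exactly the "garbage" case.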
- --
- Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
- Jag gav en käck tjeck en check. (I gave a dashing Czech a check.)
-