- Newsgroups: comp.unix.bsd
- Path: sparky!uunet!spool.mu.edu!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
- From: terry@cs.weber.edu (A Wizard of Earth C)
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Message-ID: <1992Dec28.062554.24144@fcom.cc.utah.edu>
- Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
- Sender: news@fcom.cc.utah.edu
- Organization: University of Utah Computer Center
- References: <id.M2XV.VTA@ferranti.com> <1992Dec18.043033.14254@midway.uchicago.edu> <1992Dec18.212323.26882@netcom.com> <1992Dec19.083137.4400@fcom.cc.utah.edu> <2564@titccy.cc.titech.ac.jp>
- Date: Mon, 28 Dec 92 06:25:54 GMT
- Lines: 164
-
- In article <2564@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
- |> In article <1992Dec19.083137.4400@fcom.cc.utah.edu>
- |> terry@cs.weber.edu (A Wizard of Earth C) writes:
- |>
- |> >US Engineers produce software for the available market; because of the
- |> >input difficulties involved in 6000+ glyph sets of symbols, there has been
- |> >a marked lack of standardization in Japanese hardware and software. This
- |> >means that the market in Japan consists of mostly "niche" markets, rather
- |> >than being a commodity market.
- |>
- |> Do you know what Shift JIS is? It's a de facto standard for character
- |> encoding established by Microsoft, NEC, ASCII etc. and common in the
- |> Japanese PC market.
-
- I am aware of JIS; however, even you must agree that the Japanese hardware
- and software markets have not reached the level of "commodity hardware"
- found elsewhere in the world (i.e., the US and Europe). There are multiple
- conflicting platforms, and thus multiple conflicting code sets for
- implementation. If we had to pick one platform to support (I am loath to
- do this, as it means support for other platforms may be ignored until
- something incompatible has fossilized), it would probably be the NEC 98,
- which is not even PC compatible.
-
- I think other mechanisms, such as ATOK, Wnn, and KanjiHand, deserve to be
- examined. One method would be to adopt exactly the input mechanism of
- "Ichi-Taro" (the most popular NEC 98 word processor).
-
- |> Now, DOS/V from IBM strongly supports Shift JIS.
- |>
- |> In the workstation market in Japan, some supports Shift JIS, some
- |> supports EUC and some supports both. Of course, many US companies
- |> sell Japanized UNIX on their workstations.
-
- I think this is precisely what we want to avoid -- localization. The basic
- difference, to my mind, is that localization involves the maintenance of
- multiple code sets, whereas internationalization requires maintenance of
- multiple data sets, a much smaller job.
-
- |> >This has changed somewhat with the Nintendo
- |> >corporations recent successes in Japan, where standardized hardware is
- |>
- |> I'm sure you are just joking here.
-
- Yes, this was intended to be a jab at localization of a system as opposed
- to internationalization. The sets of Nintendo games in the US and Japan
- are largely non-intersecting sets of software... games sold in the US are
- not sold in Japan and vice versa. I feel that "localization" is the
- "Nintendo" solution. I also feel that we need to be striving for a level
- of complexity well above that of a toy.
-
- |> >Microsoft has adopted Unicode as a standard. It will probably be the
- |> >prevalent standard because of this -- the software world is too wrapped
- |> >up in commodity (read "DOS") hardware for it to be otherwise. Unicode
- |> >has also done something that XPG4 has not: unified the Far Eastern and
- |> >all other written character sets in a single font, with room for some
- |> >expansion (to the full 16 bits) and a discussion of moving to a full
- |> >32 bit mechanism.
- |>
- |> Do you know that Japan voted AGAINST ISO10646/Unicode, because it's not
- |> good for Japanese?
- |>
- |> >So even if the Unicode standard ignores backward compatability
- |> >with Japanese standards (and specific American and European standards),
- |> >it better supports true internationalization.
- |>
- |> The reason of disapproval is not backward compatibility.
- |>
- |> The reason is that, with Unicode, we can't achieve internationalization.
-
- This I don't understand. The maximum translation table from one 16-bit
- value to another is 16K. This means two 16K tables for translation into/out
- of Unicode for input/output devices, and one 16K table plus one 512-byte
- table if a compact storage method is used to remove the normal 2x storage
- penalty for 256-character languages, like most European languages.
-
- I don't see why the storage mechanism in any way affects the validity of
- the data -- and thus I don't understand *why* you say "with Unicode, we
- can't achieve internationalization."
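- Just to make the shape of these tables concrete, here is a minimal sketch
- (in Python, with invented code-point values -- not a real local code set)
- of what a pair of locale-to-Unicode translation tables amounts to:

```python
# Sketch of locale <-> Unicode code-point translation tables.
# The table contents below are illustrative stand-ins, not a
# real local character set mapping.

# A tiny stand-in for a local encoding's map into Unicode:
# three hypothetical local code points mapped to hiragana.
LOCAL_TO_UNICODE = {0x01: 0x3042, 0x02: 0x3044, 0x03: 0x3046}

# The reverse table is mechanically derived from the forward one.
UNICODE_TO_LOCAL = {u: l for l, u in LOCAL_TO_UNICODE.items()}

def to_unicode(local_codes):
    """Translate a sequence of local code points into Unicode."""
    return [LOCAL_TO_UNICODE[c] for c in local_codes]

def from_unicode(unicode_codes):
    """Translate a sequence of Unicode code points back to local."""
    return [UNICODE_TO_LOCAL[c] for c in unicode_codes]
```

- A real table would simply be larger, not structurally different; the
- per-character cost is one lookup in each direction, which is the whole
- point of the size argument above.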
-
- |> >XPG4, by adopting the JIS standard, appears to be
- |> >ignoring HAN (Chinese) and many other languages covered by the Unicode
- |> >standard.
- |>
- |> Unicode can not cover both Japanese and Chinese at the same time, because
- |> the same code points are shared between similar characters in Japan
- |> and in China.
-
- I don't understand this, either. This is like saying PC ASCII cannot cover
- both the US and the UK because the American number sign and the English
- pound sign share a code point, or that it cannot cover German or Dutch
- because of the seven-character difference needed to support those
- languages.
-
- |> Of course, it is possible to LOCALIZE Unicode so that it produces
- |> Japanese characters only or Chinese characters only. But don't we
- |> need internationalization?
-
- The point of an internationalization effort (as *opposed* to a localization
- effort) is the coexistence of languages within the same processing
- framework. The point is not to produce something which is capable of "only
- English" or "only French" or "only Japanese" at the flick of an environment
- variable; the point is to produce something which is *data driven* and
- localized by a change of data rather than by a change of code. To do
- otherwise would require a separate code tree for each language, which was
- the entire impetus for an internationalization effort in the first place.
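- A minimal sketch of what "data driven" means here (catalog contents are
- invented): one code path, with the locale selecting a data set rather
- than a code set:

```python
# Localization by a change of data, not a change of code:
# one lookup function, per-locale message catalogs.
# All catalog entries below are invented for illustration.
CATALOGS = {
    "en": {"greeting": "Hello"},
    "fr": {"greeting": "Bonjour"},
    "ja": {"greeting": "\u3053\u3093\u306b\u3061\u306f"},  # konnichiwa
}

def message(locale, key):
    """Look up a message in the data set for the given locale."""
    return CATALOGS[locale][key]
```

- Adding a language here means adding a dictionary, not forking the code
- tree -- which is the distinction being drawn above.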
-
- |> Or, how can I process a text containing both Japanese and Chinese?
-
- Obviously, the input mechanisms will require localization for the set of
- characters out of the Unicode set which will be used for a particular
- language; there is no reason JIS input cannot be used to produce Unicode
- as well as any other encoding. Your argument that the lexical order of the
- target language affects the usability of a storage standard is invalid.
- Sure, the translation mechanisms may be *easier* to code given localization
- of lexical ordering, but that doesn't mean they *can't* be coded otherwise;
- if it were easy, we'd do it in hardware. ;-).
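- As a sketch of the point that the input mechanism is separable from the
- storage standard (the table below is a tiny, invented fragment of a
- romaji-to-kana map, not a real input method):

```python
# A toy input method emitting Unicode: a conversion table drives
# the translation, and the storage encoding of the result is
# entirely independent of how the text was typed.
# Table is a tiny invented fragment, not a real IME dictionary.
ROMAJI_TO_KANA = {"ka": "\u304b", "na": "\u306a", "a": "\u3042"}

def convert(romaji):
    """Greedy longest-match conversion of romaji input into kana."""
    out, i = [], 0
    while i < len(romaji):
        for length in (2, 1):          # try the longer match first
            chunk = romaji[i:i + length]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += length
                break
        else:
            out.append(romaji[i])      # pass unmapped input through
            i += 1
    return "".join(out)
```

- Swapping in a different table localizes the input method; the Unicode
- text it emits is stored and processed the same way regardless.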
-
- |> >I think that Japanese
- |> >users (and European and American users, if nothing is done about storage
- |> >encoding to 8 bit sets) are going to have to live with the drawbacks of
- |> >the standard for a very long time (the primary one being two 16K tables
- |> >for input and output for each language representable in 8 bits, and two
- |> >16k tables for runic mapping for languages, like Japanese, which don't
- |> >fit on keyboards without postprocessing).
- |>
- |> What? 16K? Do you think 16K is LARGE?
- |>
- |> Then, you know nothing about how Japanese are input. We are happily using
- |> several hundreds kilo bytes or even several mega bytes of electrical
- |> dictionary, even on PCs.
-
- No, I don't think 16K is large; however, the drawback is not in the size
- of the tables, but in their use on every character coming in from an input
- device or going out to an output device. In addition, an optimization of
- the file system to allow for "lexically compact storage" (my term) is
- necessary to make Americans and Europeans accept the mechanism. This
- involves yet another set of localization-specific storage tables to
- translate from an ISO or other local font to Unicode and back on attributed
- file storage. To do otherwise would require 16-bit storage of files, or
- worse, runic encoding of any non-US-ASCII characters in a file. This
- either doubles the file size for all text files (something the West
- _will_not_accept_), or "pollutes" the files (all files except those stored
- in US ASCII have file sizes which no longer reflect true character counts
- for the file).
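- The two storage penalties described above can be sketched as follows
- (a toy model for illustration, not any real encoding scheme):

```python
# Toy model of the two storage choices: flat 16-bit storage
# doubles every 8-bit text file, while escape ("runic") encoding
# keeps ASCII at one byte per character but makes the byte count
# diverge from the character count once non-ASCII appears.

def store_16bit(codepoints):
    """Store every character as a flat two-byte (big-endian) value."""
    out = bytearray()
    for cp in codepoints:
        out += cp.to_bytes(2, "big")
    return bytes(out)

def store_escaped(codepoints, esc=0xFF):
    """Keep values < 128 as one byte; escape anything larger."""
    out = bytearray()
    for cp in codepoints:
        if cp < 128:
            out.append(cp)
        else:
            out.append(esc)            # escape byte "pollutes" the count
            out += cp.to_bytes(2, "big")
    return bytes(out)

ascii_text = [ord(c) for c in "plain ASCII text"]
```

- For pure US-ASCII text the escaped form costs nothing; the price is that
- byte counts and character counts diverge as soon as a single non-ASCII
- character appears -- exactly the "pollution" complained of above.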
-
- Admittedly, these mechanisms are adaptable for XPG4 (not widely available)
- and XPG3 (does not support eastern languages), but the Microsoft adoption
- of Unicode tells us that at least 90% of the market is now committed to
- Unicode, if not now, then in the near future.
-
-
- I would like to hear any arguments anyone has regarding *why* Unicode is
- "bad" and should not be adopted in the remaining 10% of the market (thus
- ensuring incompatibility and a lack of interoperability which is guaranteed
- to prevent penetration of the existing 90%).
-
-
- Terry Lambert
- terry@icarus.weber.edu
- terry_lambert@novell.com
- ---
- Any opinions in this posting are my own and not those of my present
- or previous employers.
- -------------------------------------------------------------------------------
- "I have an 8 user poetic license" - me
- Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
- -------------------------------------------------------------------------------
-