- Path: sparky!uunet!sequent!gaia.ucs.orst.edu!flop.ENGR.ORST.EDU!jade.CS.ORST.EDU!crowl
- From: crowl@jade.CS.ORST.EDU (Lawrence Crowl)
- Newsgroups: comp.std.internat
- Subject: Re: Radicals Instead of Characters
- Date: 21 Jan 1993 21:07:10 GMT
- Organization: Computer Science Department, Oregon State University
- Lines: 93
- Message-ID: <1jn39uINNbk0@flop.ENGR.ORST.EDU>
- References: <1j8kroINNf59@flop.ENGR.ORST.EDU> <1993Jan18.212846.3030@fcom.cc.utah.edu> <mvdvalk.727454246@rhone> <MELBY.93Jan21144739@dove.yk.fujitsu.co.jp> <1jlngtINNqnk@life.ai.mit.edu>
- NNTP-Posting-Host: jade.cs.orst.edu
-
- In article <mvdvalk.727454246@rhone>
- mvdvalk@cs.utwente.nl (Martijn van der Valk) writes:
- >Maybe it's because I'm just too dumb to see the point of using ``radicals
- >instead of characters'', but to me it seems that the majority of Chinese
- >characters contain a lot of ``familiar sets of strokes'' which are NOT radicals
- >according to KangXi dictionary. How to encode these? Enlarge the set of
- >radicals to encompass these ``pseudo-radicals''? Anyway, I don't get the
- >point. Could the original poster please re-explain what he means with the
- >original statement?
-
- The proposal was that each of the 214 radicals be given a code point
- similar to letters. Each CJK character would be represented as a
- sequence of radical codes, just as an English word is represented as a
- sequence of letter codes. The important criterion is that each CJK
- character be uniquely determined by a sequence of radicals, not that it
- _appear_ as a simple composition of radicals.
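
As a minimal sketch of the uniqueness criterion above: the table below is hypothetical and covers only four characters, chosen because their components happen to be KangXi radicals. The names `CHAR_TO_RADICALS` and `build_decoder` are illustrative assumptions, not part of any proposal in this thread.

```python
# Illustrative sketch only: a tiny, hypothetical table mapping a few Han
# characters to sequences of KangXi radical numbers.
RADICAL_NAMES = {38: "WOMAN", 39: "CHILD", 72: "SUN", 74: "MOON", 75: "TREE"}

CHAR_TO_RADICALS = {
    "\u597d": (38, 39),      # WOMAN + CHILD
    "\u660e": (72, 74),      # SUN + MOON
    "\u6797": (75, 75),      # TREE + TREE
    "\u68ee": (75, 75, 75),  # TREE + TREE + TREE
}


def build_decoder(table):
    """Invert the table, rejecting any radical sequence that is used twice."""
    decoder = {}
    for char, radicals in table.items():
        if radicals in decoder:
            raise ValueError(f"sequence {radicals} is not unique")
        decoder[radicals] = char
    return decoder


RADICALS_TO_CHAR = build_decoder(CHAR_TO_RADICALS)
```

The sketch checks only that each sequence identifies one character. How consecutive characters would be delimited in a stream of radical codes is left open here, as it is in the proposal.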
-
- The advantage of this approach is that it permits coding (nearly) all
- of the more than 50,000 CJK characters with (roughly) 214 code points,
- in contrast to the current Unicode scheme, which (presently) encodes
- more than 20,000 characters in more than 20,000 code points. Fewer
- code points translate to smaller tables, possibly fewer bits in the
- code, and potentially lower costs. With this approach, an
- international code might fit into 12 bits instead of 16. Countries not
- requiring CJK characters would then save 25% in the length of their
- text and 95% in the size of display tables and so forth.
-
- The disadvantage of the radical-coded approach is that individual
- characters now require multiple code points. Suppose
- - the hypothetical international-radical-coded character set
- requires 12 bits,
- - the number of distinct characters to be represented requires
- 18 bits, and
- - there are an average of 2.5 radicals per character.
- Each CJK character would then take 2.5 x 12 = 30 bits instead of 18,
- so a paragraph of CJK characters would require about 67% more bits
- with the radical-coded approach. Table sizes for CJK fonts would not
- be affected.
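
The arithmetic behind these figures can be checked with a short sketch. The inputs are the hypothetical values assumed above (12-bit codes, 18 bits for a fixed-width CJK code, 2.5 radicals per character); the function names are made up for illustration.

```python
def radical_overhead(code_bits, radicals_per_char, fixed_bits):
    """Fractional extra bits per CJK character under radical coding."""
    return code_bits * radicals_per_char / fixed_bits - 1


def non_cjk_saving(small_bits, large_bits):
    """Fractional saving for text that needs one code per character."""
    return 1 - small_bits / large_bits


cjk_overhead = radical_overhead(12, 2.5, 18)  # 30 bits against 18
latin_saving = non_cjk_saving(12, 16)         # 12 bits against 16

print(f"CJK overhead: {cjk_overhead:.1%}, non-CJK saving: {latin_saving:.0%}")
# prints "CJK overhead: 66.7%, non-CJK saving: 25%"
```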
-
- In article <1jlngtINNqnk@life.ai.mit.edu>
- glenn@muesli.ai.mit.edu (Glenn A. Adams) writes:
- >In article <MELBY.93Jan21144739@dove.yk.fujitsu.co.jp>
- >melby@dove.yk.fujitsu.co.jp (John B. Melby) writes:
- >>Looking at Han characters in a probabilistic sense probably is not going
- >>to help much, since the positioning of radicals varies widely between
- >>characters.
- >
- >The idea being discussed for Han decomposition would have different
- >combining radicals for each of the possible positions the radical
- >could take; e.g. MAN-LEFT, MAN-TOP, MAN-BOTTOM, etc.
-
- I was thinking more that radicals would have a defined order, so that,
- for instance, the first radical coded would be the one on the upper
- left.
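
A rough sketch of the ordering idea, under one assumed reading of "upper left first": components are sorted top to bottom, then left to right. The `(radical, row, column)` layout description is hypothetical; radical 75 is TREE and radical 30 is MOUTH.

```python
def ordered_radicals(components):
    """Return radical numbers ordered top-to-bottom, then left-to-right.

    components: iterable of (radical_number, row, column), with row 0 at
    the top and column 0 at the left. This ordering rule is an assumption.
    """
    return tuple(r for r, _, _ in sorted(components, key=lambda c: (c[1], c[2])))


# Two characters built from the same radicals in different positions.
tree_over_mouth = [(30, 1, 0), (75, 0, 0)]
mouth_over_tree = [(75, 1, 0), (30, 0, 0)]

print(ordered_radicals(tree_over_mouth))  # (75, 30)
print(ordered_radicals(mouth_over_tree))  # (30, 75)
```

With a fixed order, position is carried by the sequence itself, so no separate MAN-LEFT or MAN-TOP style code points are needed for cases like these. Whether a single ordering rule distinguishes every pair of characters is not something this sketch establishes.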
-
- >>(1) some rare characters cannot be expressed in this manner,
- >
- >Characters which could not be decomposed in this manner would be
- >represented in their entirety (i.e., as non-decomposed symbols).
- >
- >>(2) allowing the display of arbitrary characters using this sort of
- >>composition does not mean that their components will be aesthetically
- >>spaced.
- >
- >A system that displayed such decomposed symbols would most likely
- >employ a font which either (1) contained glyphs that represented the
- >entire symbol; or (2) contained internal instructions that would allow
- >it to position the radical properly. In both cases, the correct
- >display geometry would be used. The display engine would have to
- >map multiple coded character elements to single glyph references
- >or multiple glyph references as appropriate.
-
- In addition, display systems using the radical-coded approach can
- provide cheap low-quality display of CJK characters by composing the
- radicals. This would permit display of CJK characters in those markets
- where need for such display is rare. I can't imagine anyone selling
- such displays to someone who uses CJK characters more than rarely.
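
The two display strategies discussed above can be sketched as a lookup with a fallback: use a whole-character glyph when the font has one, otherwise return one glyph per radical for crude composition. The function, the dictionaries, and the glyph names are hypothetical, and positioning of the composed radicals is not modelled.

```python
def glyphs_for(radicals, precomposed_glyphs, radical_glyphs):
    """Map one character's radical sequence to a list of glyph references."""
    key = tuple(radicals)
    if key in precomposed_glyphs:
        # High-quality path: the font has a glyph for the entire character.
        return [precomposed_glyphs[key]]
    # Low-quality path: fall back to one glyph per radical.
    return [radical_glyphs[r] for r in key]


# Hypothetical font data, for illustration only.
precomposed = {(38, 39): "glyph_good"}
per_radical = {38: "glyph_woman", 39: "glyph_child", 72: "glyph_sun", 74: "glyph_moon"}

print(glyphs_for((38, 39), precomposed, per_radical))  # ['glyph_good']
print(glyphs_for((72, 74), precomposed, per_radical))  # ['glyph_sun', 'glyph_moon']
```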
-
- >>A 16 bit font is insufficient for encoding rare characters, whichever way
- >>you look at it, although having 16-bit CJK unification and a user-defined
- >>character facility may be sufficient for an average user.
- >
- >Keep in mind that there is no necessary relation between a 16-bit character
- >encoding and a 16-bit font. One can have a 16-bit character encoding like
- >Unicode (with 20,902 precomposed Han characters, and possibly a collection
- >of combining radical characters) and display with a 16-bit font that contains
- >2^16 Han glyphs, or even with a 24-bit font, a 32-bit font, etc. The
- >relation of Unicode character code to font code is not defined by the
- >Unicode display model.
-
- --
- Lawrence Crowl 503-737-2554 Computer Science Department
- crowl@cs.orst.edu Oregon State University
- ...!hplabs!hp-pcd!orstcs!crowl Corvallis, Oregon, 97331-3202
-