- Path: sparky!uunet!sequent!gaia.ucs.orst.edu!flop.ENGR.ORST.EDU!jade.CS.ORST.EDU!crowl
- From: crowl@jade.CS.ORST.EDU (Lawrence Crowl)
- Newsgroups: comp.std.internat
- Subject: Re: Radicals Instead of Characters
- Date: 22 Jan 1993 19:52:28 GMT
- Organization: Computer Science Department, Oregon State University
- Lines: 56
- Message-ID: <1jpj9sINNlie@flop.ENGR.ORST.EDU>
- References: <1j9sfpINN46t@life.ai.mit.edu> <1jfgq1INNqmn@flop.ENGR.ORST.EDU> <2791@titccy.cc.titech.ac.jp>
- NNTP-Posting-Host: jade.cs.orst.edu
-
- In article <2791@titccy.cc.titech.ac.jp>
- mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
- >In article <1jfgq1INNqmn@flop.ENGR.ORST.EDU>
- > crowl@jade.CS.ORST.EDU (Lawrence Crowl) writes:
- >
- >>The question I was asking was "can you _identify_ a han/kanji character
- >>based on a sequence of radicals"
- >
- >No, you can't. Radicals are for indexing only. The rest of the character
- >has its own complex shape.
-
- If you can use radicals for indexing, then you can use them to identify
- characters. The process of identifying a character need not be able
- to, by itself, generate an acceptable image.
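
A minimal sketch of the distinction drawn above, assuming a small hand-built table: identifying a character from a radical sequence is only a lookup, and nothing in it has to know how to draw the glyph. The table name `RADICAL_INDEX`, the function `identify`, and the four sample decompositions are illustrative assumptions, not part of any proposed standard.

```python
# Hypothetical index from an ordered sequence of components to a character.
# The entries are a tiny illustrative sample, not a real coding table.
RADICAL_INDEX = {
    ("女", "子"): "好",
    ("日", "月"): "明",
    ("木", "木"): "林",
    ("木", "木", "木"): "森",
}

def identify(radicals):
    """Return the character named by the radical sequence, or None if unknown.

    This only identifies the character; rendering an acceptable image
    is a separate problem left to the font.
    """
    return RADICAL_INDEX.get(tuple(radicals))

print(identify(["日", "月"]))
print(identify(["木", "木", "木"]))
```

A real table would also have to settle how component order and layout are recorded, which this sketch leaves open.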
-
- >>and "would it be reasonable to encode
- >>han/kanji on that basis".
- >
- >Such encoding is too lengthy.
-
- An encoding of every variant of every character ever written is not? At
- 214 radicals, we can represent a radical in eight bits, and a character
- in 8*(average number of radicals per character). Your non-unified
- approach would require roughly eighteen bits per character.
-
- >>Agreed. However, there is no natural size for tables. Table sizes of
- >>4000 are much cheaper than table sizes of 64000.
- >
- >If you use radical based encoding, it makes everything complex.
-
- Could you please elaborate? Your argument leaves me unconvinced.
-
- >Moreover, you will have to have sixteen 4000 entry tables which is as
- >large as a single 64000 entry table.
-
- No, I don't have to have sixteen 4000 entry tables. I only need one.
-
- >>But, can sixteen bits represent _all_ historical Han characters _and_
- >>the historical texts of all other languages? My guess is 16 bits can
- >>_if_ Han characters are coded as radicals,
- >
- >May or may not be. Many complex Han characters are just unique.
-
- Unique in what sense? Examples?
-
- >BTW, from the view point of programmers, combining characters are
- >just unusable.
-
- I am a programmer. It is from a programmer's view that I made my
- proposal. What I am proposing is not unusable. In many ways it is
- more usable than straight character coding.
-
- --
- Lawrence Crowl 503-737-2554 Computer Science Department
- crowl@cs.orst.edu Oregon State University
- ...!hplabs!hp-pcd!orstcs!crowl Corvallis, Oregon, 97331-3202
-