- Path: sparky!uunet!cs.utexas.edu!sun-barr!sh.wide!wnoc-tyo-news!cs.titech!titccy.cc.titech!necom830!mohta
- From: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
- Newsgroups: comp.std.internat
- Subject: Re: Radicals Instead of Characters
- Message-ID: <2808@titccy.cc.titech.ac.jp>
- Date: 24 Jan 93 12:19:05 GMT
- References: <1j9sfpINN46t@life.ai.mit.edu> <1jfgq1INNqmn@flop.ENGR.ORST.EDU> <2791@titccy.cc.titech.ac.jp> <1jpj9sINNlie@flop.ENGR.ORST.EDU>
- Sender: news@titccy.cc.titech.ac.jp
- Organization: Tokyo Institute of Technology
- Lines: 63
-
- In article <1jpj9sINNlie@flop.ENGR.ORST.EDU>
- crowl@jade.CS.ORST.EDU (Lawrence Crowl) writes:
-
- >>>The question I was asking was "can you _identify_ a han/kanji character
- >>>based on a sequence of radicals"
- >>
- >>No, you can't. Radicals are for indexing only. The rest of the character
- >>has its own complex shape.
- >
- >If you can use radicals for indexing, then you can use them to identify
- >characters.
-
- No, you can't. The correspondence between radicals and characters is
- not one-to-one. It is many-to-many.
-
- >>Such encoding is too lengthy.
-
- >An encoding of every variant of every character ever written is not?
-
- Of course not.
-
- >Your non-unified
- >approach would require roughly eighteen bits per character.
-
- What's the problem? An 18-bit character set is just fine.
-
- >>Moreover, you will have to have sixteen 4000 entry tables which is as
- >>large as a single 64000 entry table.
- >
- >No, I don't have to have sixteen 4000 entry tables. I only need one.
-
- Could you please elaborate? Your argument leaves me unconvinced.
-
- >>If you use radical based encoding, it makes everything complex.
-
- >Could you please elaborate? Your argument leaves me unconvinced.
-
- Perhaps because you don't understand that you need sixteen 4000 entry
- tables.
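
The table sizes argued over above can be checked with a few lines of arithmetic. This is only a sketch of the numbers quoted in the thread (sixteen tables, 4000 entries each, 16 and 18 bits per character); the variable names are illustrative, and the thread does not say where the figure of sixteen comes from.

```python
# Figures quoted in the thread; the names are illustrative only.
TABLE_COUNT = 16
ENTRIES_PER_TABLE = 4000

# Total entries across sixteen 4000-entry tables.
total_entries = TABLE_COUNT * ENTRIES_PER_TABLE

# Number of distinct codes available at each width under discussion.
codes_16_bit = 2 ** 16
codes_18_bit = 2 ** 18

print(total_entries)   # 64000
print(codes_16_bit)    # 65536
print(codes_18_bit)    # 262144
```

Sixteen 4000-entry tables hold 64000 entries in total, just under the 65536 codes of a 16-bit set, while an 18-bit set offers four times as many codes.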
-
- >>>But, can sixteen bits represent _all_ historical Han characters _and_
- >>>the historical texts of all other languages? My guess is 16 bits can
- >>>_if_ Han characters are coded as radicals,
- >>
- >>Maybe, maybe not. Many complex Han characters are just unique.
- >
- >Unique in what sense? Examples?
-
- As those characters are really complex, I won't draw them here.
- See DIS 10646-1.2.
-
- >>BTW, from the view point of programmers, combining characters are
- >>just unusable.
- >
- >I am a programmer.
-
- Then, how can your program detect the character boundaries?
-
- That you can't find character boundaries without lookahead makes
- all programs complex and interactive programming impossible.
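
The lookahead point can be illustrated with a minimal sketch. It assumes combining marks follow their base character, as in present-day Unicode, and uses Python's standard `unicodedata.combining` to recognize them. The function name `split_characters` is hypothetical, and this is a simplification, not a full text-segmentation algorithm.

```python
import unicodedata


def split_characters(code_points):
    """Yield each base character together with its trailing combining marks.

    A character can only be emitted once the NEXT code point has been read
    and found to be non-combining (or the input has ended). That delay is
    the lookahead discussed above.
    """
    pending = ""
    for ch in code_points:
        if pending and unicodedata.combining(ch) == 0:
            # ch starts a new character, so the pending one is complete.
            yield pending
            pending = ch
        else:
            # Either the first code point, or a combining mark that
            # attaches to the pending character.
            pending += ch
    if pending:
        # End of input: only now is the last character known to be complete.
        yield pending


# "e" + COMBINING ACUTE ACCENT, then "a"
chars = list(split_characters("e\u0301a"))
print(len(chars))  # 2
```

With a fixed-width precomposed encoding, each code can be handed to the caller as soon as it arrives. Here, a program reading keystrokes one at a time cannot tell whether "e" is finished until something else is typed.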
-
- Are you really a programmer?
-
- Masataka Ohta
-