- Newsgroups: comp.lang.perl
- Path: sparky!uunet!cs.utexas.edu!sun-barr!sh.wide!wnoc-tyo-news!sranha!sranhd!sran230!utashiro
- From: utashiro@sran230.sra.co.jp (Kazumasa Utashiro)
- Subject: Re: Japanese character stuff (Was: Perl 5 and Latin-1)
- References: <mark.724768788@coombs>
- Organization: Software Research Associates, Inc., Japan
- Date: Tue, 22 Dec 1992 02:41:40 GMT
- Message-ID: <Bzn3HI.MCy@sran230.sra.co.jp>
- Lines: 52
-
- In article <mark.724768788@coombs> mark@coombs.anu.edu.au (Mark) writes:
- >> barnett@paintbrush.mcc.com (Jim Barnett) writes:
- >> >kono@csl.sony.co.jp (Shinji Kono) writes:
- >>
- >> > Support for 8bit char and multi-byte code sounds very good. I heard
- >> > there is a two byte char code version of perl for Japanese.
- >>
- >> >Could anyone point me to this version of Perl? We're doing a project
- >> >involving a large amount of Japanese text, and a 2-byte version would
- >> >be a big help. I've been able to trick the regexp matcher into doing
- >> >simple matches, but fancier things would be a problem.
- >>
- >> I'm not sure about the C code version but I'm sure if you ftp'd to
- >> sra.co.jp you would find some joy there...
-
- That library is not for multi-byte character handling but
- for Japanese character code conversion. We use several
- different Japanese character code sets, and sometimes we
- have to convert from one to another.
-
- The Perl enhanced to handle Japanese characters is called
- 'jperl'. It was developed by serow@ibix.co.jp and is available
- by ftp from many sites, including sra.co.jp.
-
- Here are the basic features of jperl.
-
- + special character handling in string constants
- + regular expressions
-     . matches a two-byte character
-     [a-z] syntax accepts Japanese characters
- + tr accepts Japanese characters
- + chop removes the last two bytes if the last character is a multi-byte code
- + split(//) returns a list of Japanese characters
- + etc.
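-
- The split(//) and "." behavior listed above can be approximated
- on plain bytes. What follows is only a sketch in Python, not
- jperl's own code. It assumes an EUC-style encoding in which any
- byte in \200-\377 starts a two-byte character; the names
- split_chars and first_char are made up for illustration.

```python
import re

# Assumption: a byte in \200-\377 starts a two-byte character (EUC-style);
# any other byte is a one-byte character. [\s\S] matches any byte, newline included.
CHAR = re.compile(rb"[\x80-\xff][\s\S]|[\s\S]")

def split_chars(data: bytes) -> list:
    """Approximate split(//): return one bytes element per character."""
    return CHAR.findall(data)

def first_char(data: bytes) -> bytes:
    """Approximate what '.' would match at the start of the data."""
    m = CHAR.match(data)
    return m.group(0) if m else b""

text = "日本語".encode("euc_jp") + b" ok"
chars = split_chars(text)
print(len(text), len(chars))              # byte count versus character count
print(first_char(text).decode("euc_jp"))  # the first character, two bytes wide
```

- A lone high byte at the very end of the data falls through to
- the one-byte alternative, so the sketch never reads past the
- end of the input.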
-
- I personally don't use jperl because jperl is not compatible
- with perl. That means a script written for perl sometimes
- doesn't work as expected under jperl. This often happens when
- the script deals with binary data.
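-
- One way such a mismatch can arise is chop: a script that chops
- a trailing newline off a binary record expects to lose exactly
- one byte. The Python sketch below is a guess at the mechanism,
- not a trace of jperl itself. It assumes any byte in \200-\377
- starts a two-byte character, and byte_chop and mb_chop are
- hypothetical names.

```python
def byte_chop(data: bytes) -> bytes:
    """Plain chop: always remove exactly one byte."""
    return data[:-1]

def mb_chop(data: bytes) -> bytes:
    """Two-byte-aware chop (assumed behavior): remove the last character,
    which is two bytes wide when it starts with a byte in \\200-\\377."""
    i = 0
    last = 0
    while i < len(data):
        last = i  # start offset of the character being scanned
        # A high byte swallows the byte after it, whatever that byte is.
        i += 2 if data[i] >= 0x80 and i + 1 < len(data) else 1
    return data[:last]

# A binary record: three data bytes followed by a newline.
record = b"\x00\x01\x90\n"
print(byte_chop(record))  # only the newline is removed
print(mb_chop(record))    # \x90 pairs with the newline, so a data byte is lost too
```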
-
- As my jperl.pl library shows, it is possible to handle
- Japanese multi-byte characters in normal perl. Most of the
- text processing tools available by ftp from sra.co.jp handle
- Japanese character codes properly and, of course, work with
- ASCII text. But if you want to use Japanese in a regexp,
- jperl is the tool to use.
-
- One problem is that my scripts don't work with Latin-1,
- because a byte in [\200-\377] is always treated as the first
- byte of a two-byte character.
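-
- The effect on Latin-1 text can be seen with a short Python
- sketch. It is an illustration only, not one of the scripts
- mentioned above; the pattern simply encodes the stated rule
- that [\200-\377] always starts a two-byte character.

```python
import re

# The rule from the text: a byte in \200-\377 is always the first byte
# of a two-byte character; every other byte stands alone.
CHAR = re.compile(rb"[\x80-\xff][\s\S]|[\s\S]")

latin1 = "café au lait".encode("latin-1")  # e-acute is the single byte \351
chars = CHAR.findall(latin1)

print(len(latin1), len(chars))  # one "character" fewer than there are bytes
print(chars[3])                 # \351 glued to the space that follows it
```

- The accented letter and the following space come back as one
- unit, so anything that counts, chops, or splits by character
- is off by one from that point on.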
-
- --utashiro
-