NetNews Usenet Archive 1992 #31

Text File | 1992-12-22 | 2.5 KB | 63 lines

Newsgroups: comp.lang.perl
Path: sparky!uunet!cs.utexas.edu!sun-barr!sh.wide!wnoc-tyo-news!sranha!sranhd!sran230!utashiro
From: utashiro@sran230.sra.co.jp (Kazumasa Utashiro)
Subject: Re: Japanese character stuff (Was: Perl 5 and Latin-1)
References: <mark.724768788@coombs>
Organization: Software Research Associates, Inc., Japan
Date: Tue, 22 Dec 1992 02:41:40 GMT
Message-ID: <Bzn3HI.MCy@sran230.sra.co.jp>
Lines: 52

In article <mark.724768788@coombs> mark@coombs.anu.edu.au (Mark) writes:
>> barnett@paintbrush.mcc.com (Jim Barnett) writes:
>> >kono@csl.sony.co.jp (Shinji Kono) writes:
>>
>> > Support for 8bit char and multi-byte code sounds very good. I heard
>> > there is a two byte char code version of perl for Japanese.
>>
>> >Could anyone point me to this version of Perl? We're doing a project
>> >involving a large amount of Japanese text, and a 2-byte version would
>> >be a big help. I've been able to trick the regexp matcher into doing
>> >simple matches, but fancier things would be a problem.
>>
>> I'm not sure about the C code version but I'm sure if you ftp'd to
>> sra.co.jp you would find some joy there...

This library is not for multi-byte character handling but for
Japanese character code conversion. We use several different
Japanese character code sets, and sometimes we have to convert from
one to another.

The Perl enhanced to handle Japanese characters is called 'jperl'.
It is developed by serow@ibix.co.jp and is available by ftp from
many sites, including sra.co.jp. Here are the basic features of
jperl.

	+ special character handling in string constants
	+ regular expressions
		. matches a two-byte character
		[a-z] syntax allows Japanese characters
	+ tr allows Japanese characters
	+ chop chops the last two bytes if they are a multi-byte code
	+ split(//) returns a list of Japanese characters
	+ etc.

I personally don't use jperl because jperl is not compatible with
perl. That means a script written for perl sometimes doesn't work as
expected on jperl.
This often happens when the script deals with binary data.

As I showed with my jperl.pl library, it is possible to handle
Japanese multi-byte characters in normal perl. Most of the text
processing tools available by ftp from sra.co.jp handle Japanese
character codes properly and, of course, work with ASCII text. But
if you want to use Japanese in a regexp, you need jperl.

It is a problem that my scripts don't work with Latin-1, because
[\200-\377] is always treated as the first byte of a two-byte
character.

--utashiro
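
The two-byte rule described in this post (any byte in [\200-\377] is taken as the first byte of a two-byte character) can be sketched in a few lines. This is only an illustration in Python, not code from jperl or jperl.pl. The names split_chars and chop are hypothetical, and the sketch assumes EUC-JP style input while ignoring its three-byte sequences (those introduced by the byte 0x8F).

```python
import re

# Assumption: a byte in 0x80-0xFF always starts a two-byte character;
# any other byte (or a stray trailing high byte) is one character.
CHAR = re.compile(rb"[\x80-\xff].|.", re.DOTALL)


def split_chars(data):
    """Split a byte string into characters, like jperl's split(//)."""
    return CHAR.findall(data)


def chop(data):
    """Drop the last character: two bytes if it is a two-byte code."""
    return b"".join(split_chars(data)[:-1])


# Three kanji encoded as EUC-JP occupy six bytes, two per character.
euc = "日本語".encode("euc_jp")
print([c.decode("euc_jp") for c in split_chars(euc)])  # ['日', '本', '語']
print(len(euc), len(chop(euc)))                         # 6 4

# The same rule mangles Latin-1: the accented byte swallows the space
# that follows it, which is the incompatibility described above.
latin1 = "café au".encode("latin-1")
print(split_chars(latin1))  # [b'c', b'a', b'f', b'\xe9 ', b'a', b'u']
```

The last call shows why a script built on this rule cannot be applied to Latin-1 text unchanged: nothing in the bytes themselves says which interpretation is intended, so the caller has to know the encoding in advance.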