Source Code 1994 March

Text File | 1993-03-21 | 60.4 KB | 1,496 lines

Newsgroups: comp.sources.misc
From: jkl@osc.edu (Jan Labanowski)
Subject: v36i023: translit - transliterate foreign alphabets, Part01/10
Message-ID: <csm-v36i023=translit.163954@sparky.IMD.Sterling.COM>
X-Md4-Signature: 1fdf62718ac15c13f16020f8f731cbf8
Date: Fri, 19 Mar 1993 22:40:58 GMT
Approved: kent@sparky.imd.sterling.com

Submitted-by: jkl@osc.edu (Jan Labanowski)
Posting-number: Volume 36, Issue 23
Archive-name: translit/part01
Environment: UNIX, MS-DOS, VMS

Available-from: kekule.osc.edu (128.146.36.48) in /pub/russian/translit
Copyright-note: Yes, you have to distribute the complete package.

Translit is a general transliteration program. It transliterates between
different alphabet representations of different languages. It is frequently
necessary to convert from one representation of a foreign alphabet to another.
E.g., in the Library of Congress transliteration, the Russian letter sha is
transliterated as two Latin letters "sh" while the popular word processors use
a code 232 (decimal), the RELCOM network uses a code 219, and the KOI7 set uses
character "[" for the same letter. So if your screen driver, printer, word
processor, etc. uses different codes than the text file which you have, you
need to transliterate.

The TRANSLIT program is a powerful tool for such tasks. It converts an input
file in one representation to an output file in another representation using
an appropriate, user-defined transliteration table. The transliteration table
allows for very elaborate transliteration tasks and includes provisions for
plain character sequences, character lists, regular expressions (flexible
matches), SHIFT-OUT/IN sequences and more. The program comes with
documentation and examples of popular transliteration schemes. The Russian
language serves as an example. Other files will be added with your
collaboration.

The most current version of translit will be available from
 ftp kekule.osc.edu (or ftp 128.146.36.48)
in the directory /pub/russian/translit

Via E-mail, first retrieve the file readme.doc. It describes the files in the
program distribution and has detailed instructions on how to obtain the
program. Send the message:
 send translit/readme.doc from russian
to OSCPOST@osc.edu or OSCPOST@OHSTPY.BITNET. The file readme.doc will be
forwarded to your mailbox.

Enjoy,

Author coordinates:
 Jan Labanowski
 P.O. Box 21821
 Columbus, OH 43221-0821, USA
 jkl@osc.edu, JKL@OHSTPY.BITNET
-------
#! /bin/sh
# This is a shell archive. Remove anything before this line, then feed it
# into a shell via "sh file" or similar. To overwrite existing files,
# type "sh file -c".
# Contents: translit.1
# Wrapped by kent@sparky on Fri Mar 19 16:00:08 1993
PATH=/bin:/usr/bin:/usr/ucb:/usr/local/bin:/usr/lbin ; export PATH
echo If this archive is complete, you will see the following message:
echo ' "shar: End of archive 1 (of 10)."'
if test -f 'translit.1' -a "${1}" != "-c" ; then
 echo shar: Will not clobber existing file \"'translit.1'\"
else
 echo shar: Extracting \"'translit.1'\" \(56776 characters\)
 sed "s/^X//" >'translit.1' <<'END_OF_FILE'
X.TH TRANSLIT JKL "23-Jan-1993" JKL "Version 1.0"
X.DA 20 Jan 1993
X.SH NAME
X.IP \fITRANSLIT\fR
XProgram to transliterate texts in different character sets. The program
Xconverts input character codes (or sequences of codes) to a different set
Xof output character codes (or sequences of codes). Intended for
Xtransliteration to/from phonetic representation of foreign letters with
XLatin letters from/to special national codes used for these letters.
XIt supports simple matches, character lists and flexible matches via
Xregular expressions. The new transliteration schemes are easily added
Xby creating simple transliteration tables. Multiple character sets
Xare supported for input and output. It does not yet support UNICODE,
Xbut some day it will.
X X.SH COPYRIGHT XCopyright (c) 1993 Jan Labanowski and JKL Enterprises, Inc. X.br XYou may distribute the Software only as a complete set of files. XYou may distribute the modified Software only if you retain the XCopyright notice and you do not delete original code, data, documentation Xand associated files. XThe Software is copyrighted. You may not sell the software or incorporate Xit in the commercial product without written permission from XJan Labanowski or JKL Enterprises, Inc. You are allowed to charge for media Xand copying if you distribute the whole unaltered package. X X.SH SYNOPSIS X.B translit X[ X.B -i X.I inpfile X][ X.B -o X.I outfile X][ X.B -d X][ X.B -t X.I transtbl \|\||\|\| transtbl X] X.br X X.SH OPTIONS X.IP "\fB-i\fP \fIinpfile\fP" X.I inpfile Xis a name of input file to be transliterated. XIf "\fB-i\fP" is not specified, the input is taken from Xstandard input. X.IP "\fB-o\fP \fIoutfile\fP" X.I outfile Xis an output file, where the transliterated Xtext is stored. If "\fB-o\fP" is not specified, the output is Xdirected to the standard output. Program will not overwrite the existing Xfile. If file exists, you need to delete it first. X.IP "\fB-d\fP" XSome information on character codes read from transliteration table file Xare sent to standard error ("\fIstderr\fP"). Useful when developing Xnew transliteration tables. X.IP "\fB-t\fP \fItranstbl\fP" X.I transtbl Xis a transliteration table file which you want to use. The "\fB-t\fP" Xoption may be omitted if the \fItranstbl\fR Xis specified as the last parameter on the Xcommand line. The program first tries to locate \fItranstbl\fR Xfile in the current directory, and if not found, it Xsearches the directory chosen at compilation/installation time in X"\fIpaths.h\fP". If no "\fItranstbl\fP" is given, the default file name Xspecified in "\fIpaths.h\fP" is taken. 
The compile/installation
Xtime defaults in
X"\fIpaths.h\fR" for the search directory and the default
Xfile name can be overridden
Xby setting environment variables: TRANSP and TRANSF, respectively (see below).
X
X.SH ENVIRONMENT VARIABLES
XThe default path to the directory holding transliteration tables can
Xbe overridden by setting environment variable TRANSP. The default name
Xfor the transliteration table can be overridden by setting TRANSF environment
Xvariable. However, when the transliteration file is given on the command line,
Xit will override the defaults and environment setting.
XHere are some examples of setting environment
Xvariables for different operating systems:
X.sp
X.in +2m
X.br
X\fIUN*X System\fR
X.br
X.nf
X If you are using \fIcsh\fR (C-shell):
X setenv TRANSP /home/john/translit/
X setenv TRANSF koi8-tex.rus
X If you are using \fIsh\fR (Bourne Shell):
X TRANSP=/home/john/translit/
X export TRANSP
X TRANSF=koi8-tex.rus
X export TRANSF
X\fIVAX-VMS System\fR
X TRANSP:==SYS$USER:[JOHN.TRANSLIT]
X TRANSF:==KOI8-TEX.TBL
X\fIPC-DOS or MS-DOS\fR
X SET TRANSP=C:\|\\\|JOHN\|\\\|TRANSLIT\|\\
X SET TRANSF=KOI8-TEX.TBL
X.fi
X.in -2m
XNote that the directory path has to include concluding
Xslashes, \|\\\| or \|/\|\|.
X
X
X.SH EXAMPLES
X.ta 5m
X.br
X cat text.koi8 \|\||\|\| translit koi8-tex.rus > text.tex
X.br
Xin UN*X is equivalent to:
X.sp 1
X translit -t koi8-tex.rus -o text.tex -i text.koi8
X.br
Xand converts file text.koi8 to file text.tex using transliteration
Xspecified in the file koi8-tex.rus.
X.sp 1
X translit -i text.koi8 koi8-cl.rus
X.br
Xdisplays the converted text from file text.koi8 on your terminal. The
Xconversion table is koi8-cl.rus (KOI8 --> Library of Congress).
X.sp 1 X translit -i text.alt -t alt-koi8.rus \|\||\|\| translit -o text.tex -t koi8-tex.rus X.br Xis essentially equivalent to the following two commands in UN*X or MS-DOS: X.br X translit -i text.alt -o junkfile -t alt-koi8.rus X.br X translit -i junkfile -o text.tex -t koi8-tex.rus X.br Xand converts the file in ALT character set to a LaTeX file for printing. X.sp X translit -i russ.txt pho-koi8.rus \|\||\|\| translit -o russ.tex koi8-tex.rus X.br Xconverts file russ.txt from phonetic transliteration to LaTeX file russ.tex Xfor printing. X.sp 2 X X.SH TRANSLITERATION TABLES XThe following transliteration files are available with the current Xdistribution. Consult the comments in the individual files for details. X.IP \fIkoi8-tex.rus\fP XConversion table which changes the file in KOI8 (8 bit character set Xused by RELCOM news service) to a LaTeX file for printing with X\fIAMS\fR WNCYR fonts. X.IP \fItex-koi8.rus\fP XConversion table for the LaTeX to KOI8 conversion. Note that it will not Xhandle complicated cases, since LaTeX is a program, and only TeX can Xconvert a LaTeX source to the characters. However, it should work OK Xfor simple cases of text only files, and may need some editing for Xcomplicated cases. X.IP \fIalt-gos.rus\fP XThis is a transliteration data file for converting from ALT (Bryabrins Xalternativnyj variant used in many popular wordprocessors) Xto GOSTSCII 84 (approx. ISO-8859-5?) X.IP \fIalt-koi8.rus\fP XThis is a transliteration data file for converting from ALT to KOI8. XKOI8 is meant to be GOST 19768-74 (as used by RELCOM). X.IP \fIgos-alt.rus\fP XThis is a transliteration data file for converting GOSTSCII 84 X(approx. ISO-8859-5?) to ALT (Bryabrins alternativnyj variant) X.IP \fIgos-koi8.rus\fP XThis is a transliteration data file for converting GOSTSCII 84 X(approx. ISO-8859-5?) to KOI8 used by RELCOM XKOI8 is meant to be GOST 19768-74 X.IP \fIkoi8-alt.rus\fP XThis is a transliteration data file for converting from KOI8. 
XKOI8 is meant to be GOST 19768-74, to ALT (Bryabrins alternativnyj variant) X.IP \fIkoi8-gos.rus\fP XThis is a transliteration data file for converting from KOI8 (Relcom). XKOI8 is meant to be GOST 19768-74, to GOSTSCII 84 (approx. ISO-8859-5) X.IP \fIkoi8-7.rus\fP XThis file converts from KOI8 to KOI7. X.IP \fIkoi7-8.rus\fP XThis file converts from KOI7 to KOI8. Before you attempt the conversion, Xyou might need to perform a simple edit on your file. You MUST read the Xcomments in \fIkoi7-8.rus\fR before you attempt this conversion. X.IP \fIkoi7nl-8.rus\fP XThis file assumes that there are only Russian letters (no Latin) Xin the input file. If you have Latin letters, and you inserted SHIFT-OUT/IN Xcharacters, use file \fIkoi7-8.rus\fP. X.IP \fIkoi8-lc.rus\fP XThis file converts KOI8 to the Library of Congress transliteration. XSome extensions are added. X.IP \fIkoi8-php.rus\fP XThis file converts KOI8 to the Pokrovsky transliteration. X.IP \fIphp-koi8.rus\fP XThis file converts from Pokrovsky transliteration to KOI8. X.IP \fIkoi8-phg.rus\fP XThis file converts from KOI8 to GOST transliteration. X.IP \fIphg-koi8.rus\fP XThis file converts from GOST transliteration to KOI8. X.IP \fIpho-koi8.rus\fP XThis is a table which will convert from many "phonetic" transliteration Xschemes to KOI8. It is elaborate and it takes a lot of time to Xtransliterate the file using this table. Some transliterations are Xhopeless and internally inconsistent (as humans...), so the results Xcannot be bug free. XYou might want to modify the file, if your transliteration Xpatterns are different than those assumed in this file. You may also want Xto simplify this file if the phonetic transliteration you are converting Xis a sound one (most are not, e.g., they use e for je and e oborotnoye, Xts for c and t-s, h for kha, i for i-kratkoe, etc.). 
X.sp
X
X.SH INTRODUCTION
XIf you do not intend to write your own transliteration tables, you may
Xskip this description and go directly to the installation and
Xcopyright sections. However, you might want to read this material anyhow,
Xto better understand the traps and complexities of transliteration.
XIt is frequently necessary to transliterate text, i.e., to change one set
Xof characters (or composite characters, phonemes, etc.) to another set.
X.PP
XOn computers, the transliteration operation consists of converting the input
Xfile in some character set to the output file in another character set.
X.PP
XIn the simplest case, the single characters are transliterated, i.e., their
Xcodes are changed according to some transliteration table. This is called
Xremapping and, assuming the one-to-one mapping, the task can be accomplished
Xby a simple pseudo program:
X.br
X new_char_code = character_map[old_char_code];
X.PP
XIf the one-to-one correspondence does not exist (i.e., some codes may
Xbe present in one set, but do not have corresponding codes in another set),
Xprecise transliteration is not possible. In such cases there are 3 obvious
Xpossibilities:
X.br
X 1. skip characters which do not have counterparts,
X.br
X 2. retain unchanged codes of these characters,
X.br
X 3. convert the codes to multicharacter sequences.
X.br
XIn some cases, the file can contain more than one character set, e.g.,
Xthe file can contain Latin characters (e.g. English text) and Cyrillic
Xcharacters (e.g. Russian text). If the character codes assigned to
Xcharacters in different sets do not overlap, this is still a simple mapping
Xproblem. This is the case with KOI8 or GOSTSCII character tables for Russian,
Xwhich reserve codes 0-127 for standard ASCII codes (which include
Xall Latin characters) and codes above 127 for Cyrillic letters.
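The remapping step and the three fallback options listed above can be sketched in a few lines of Python. This is only an illustration, not how translit is implemented: the names `remap`, `policy` and `KOI8_TO_LATIN` are hypothetical, and the toy table holds just two KOI8 codes (193 for a, 219 for sha).

```python
def remap(data, char_map, policy="keep"):
    """Remap each input code through char_map (dict: int -> bytes).

    policy decides what happens to codes that have no counterpart:
      "skip" - drop the character          (possibility 1)
      "keep" - retain the code unchanged   (possibility 2)
    Possibility 3 needs no policy: a map entry may simply be a
    multicharacter sequence.
    """
    out = bytearray()
    for code in data:
        if code in char_map:
            out += char_map[code]
        elif policy == "keep":
            out.append(code)
        elif policy != "skip":
            raise ValueError("unknown policy: %r" % policy)
    return bytes(out)


# Hypothetical toy table: two KOI8 codes mapped to Latin text.
KOI8_TO_LATIN = {193: b"a", 219: b"sh"}

# sha, a, "!", kha -- the last two codes are not in the toy table.
sample = bytes([219, 193, 33, 200])
print(remap(sample, KOI8_TO_LATIN, "keep"))
print(remap(sample, KOI8_TO_LATIN, "skip"))
```

Note that with the "skip" policy every unmapped code disappears, including plain ASCII, so a real table would also list the characters that should pass through.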
X.PP
XIf character codes overlap, there is a SHIFT-OUT/SHIFT-IN technique in
Xwhich the meaning of the character sequence is determined by an opening
Xcode (or sequence of character codes). In this case, the meaning of the
Xseries of characters is determined by the SHIFT-OUT character (or sequence)
Xwhich precedes them. The SHIFT-IN character (or sequence) following the
Xseries of characters returns the "reader" to the default or previous status.
XTwo schemes are used:
X.br
X (char_set_1)(SHIFT-IN[1])(SHIFT-OUT[2])(char_set_2)...
X.br
Xor
X.br
X (char_set_1)(SHIFT-OUT[2])(char_set_2)(SHIFT-OUT[1])(char_set_1)...
X.br
X.sp 1
XSince computer keyboards, screens, printers, software, etc., are by necessity
Xlanguage specific (the most popular being ASCII), there is a problem of typing
Xforeign language text which contains letters different from the standard Latin
Xalphabet. For this reason, many transliteration schemes use several Latin
Xletters to represent a single letter of a foreign alphabet, for example:
X.br
Xzh is used to represent cyrillic letter zhe, \|\\\|"o may be used to
Xrepresent the o umlaut, etc.
X
XIf there is one-to-one mapping of such sequences to another alphabet, it
Xis also easy to process. However, it is necessary to substitute longest
Xsequences first. For example, a frequently used transliteration
Xfor cyrillic letters:
X.br
X.ta 2mL 7mL 11mL 24mL
X \fIshch\fR --- letter \fBshcha\fR 221 (decimal KOI8 code)
X.br
X \fIsh\fR --- letter \fBsha\fR 219
X.br
X \fIch\fR --- letter \fBcze\fR 222
X.br
X \fIc\fR --- letter \fBtse\fR 195
X.br
X \fIh\fR --- letter \fBkha\fR 200
X.br
X \fIa\fR --- letter \fBa\fR 193
X.PP
XObviously, in this case, we should proceed first with converting all \fIshch\fR
Xsequences to \fBshcha\fR letter, then two-character \fIsh\fR
Xand \fIch\fR, and then single
Xcharacter \fBc\fR and \fBh\fR.
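The longest-first rule can be tried out with a short Python sketch built on the six-entry table above. The function name `to_koi8` and its `longest_first` switch are hypothetical and exist only to show what goes wrong when the order is reversed; the sketch assumes pure ASCII input, so that output codes above 127 can never be matched again by a later pass.

```python
# Latin sequence -> decimal KOI8 code, taken from the table above.
TABLE = {
    b"shch": 221,
    b"sh": 219,
    b"ch": 222,
    b"c": 195,
    b"h": 200,
    b"a": 193,
}


def to_koi8(text, table=TABLE, longest_first=True):
    """Replace every sequence in one pass per table entry.

    With longest_first=True the passes run from the longest sequence
    to the shortest, as the text recommends.
    """
    out = text
    for seq in sorted(table, key=len, reverse=longest_first):
        out = out.replace(seq, bytes([table[seq]]))
    return out


good = list(to_koi8(b"shchah"))
bad = list(to_koi8(b"shchah", longest_first=False))
print(good)  # [221, 193, 200]
print(bad)   # single letters were consumed first, so "shch" is never seen
```

Within one length class the order does not matter here, because none of the equal-length sequences in this table overlap.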
XGenerally, for the one-to-one transliteration, the longest
Xsequences should be processed first, and the order of conversion within
Xsequences of the same length makes no difference.
XFor example, converting the word "shchah" to KOI8 should proceed in the following
Xway:
X.br
X \fIshchah\fR --> (221)\fIah\fR, (221)\fIah\fR --> (221)(193)\fIh\fR, (221)(193)\fIh\fR --> (221)(193)(200)
X.br
XThere is a multitude of reasons why transliteration is done. I wrote this
Xprogram having in mind the following ones:
X.br
X 1) to print cyrillic text using TeX/LaTeX and cyrillic fonts
X.br
X 2) to read KOI8 encoded messages from Russia on my ASCII terminal.
X.br
XHowever, I was trying to make it flexible to accommodate other uses.
X
X.SH PROGRAM OPERATION
XThe program converts the input file to an output file using
Xtransliteration rules from the transliteration rule file which
Xyou specify with option \fB-t\fR.
XSome examples of transliteration rule files are enclosed.
XBefore the program can be used, the transliteration rules need to be specified.
X.PP
XThese are given as a file which consists of the parts
Xdescribed below:
X.br
X.in +2m
X.in +5m
X.ti -5m
X1) File format number (it is 1 at this moment)
X.ti -5m
X2) Delimiters used to enclose a) simple strings, b) character lists,
Xc) regular expressions
X.ti -5m
X3) Starting sequence for output
X.ti -5m
X4) Ending sequence for output
X.ti -5m
X5) Number of input "character sets"
X.ti -5m
X6) SHIFT-OUT/SHIFT-IN sequences for each input character set
X.ti -5m
X7) Number of output "character sets"
X.ti -5m
X8) SHIFT-OUT/SHIFT-IN sequences for each output character set
X.ti -5m
X9) Transliteration table
X.in -5m
X.in -2m
X.PP
X\fIGENERAL COMMENTS\fR
X.br
XThe transliteration rules file consists of comments and data.
XThe comments may be included in the file as:
X.in +5m
X.ti -2m
Xa) line comments --- lines starting with ! or # character (# or !
must be
Xin the first column of a line) are treated as comments and are not
Xread in by the program.
X.ti -2m
Xb) comments following all required entries on the line. They must be
Xseparated by at least one space from the last data entry on the line
Xand need not start with any particular character. These comments cannot
Xbe used within multiline sequences.
X.br
X.in -5m
X.PP
XThe data entries consist of integer numbers and strings.
XThe strings may represent:
X.br
X a) plain strings
X.br
X b) character lists
X.br
X c) regular expressions
X.br
X.PP
XAll strings which appear in the file are processed through the
X"string processor", which allows entering unprintable characters as codes.
XThe character code is specified as a backslash "\|\\\|" followed by at least
X2 digits (i.e., \|\\\|01 produces code=1, but \|\|\\\|1 is passed unchanged). The
Xfollowing formats are supported:
X.br
X \|\\\|0123 character of octal code 123 (when leading zero present)
X.br
X \|\\\|123 character of decimal code 123 (when leading digit is not zero)
X.br
X \|\\\|0o123 or \|\\\|0O123 character of octal code 123
X.br
X \|\\\|0d123 or \|\\\|0D123 character of decimal code 123
X.br
X \|\\\|0xA3 or \|\\\|0XA3 or \|\\\|0xa3 character of hexadecimal code A3
X.br
X.PP
XThe allowed digits are 0-7 for octal codes, 0-9 for decimal codes and
X0-9 and A-F (or a-f) for hexadecimal codes.
XIn a situation when code has to be followed by a digit character,
Xyou need to enter the
Xdigit as a code. E.g., if you want character \|\\\|0xA3 followed by a letter C,
Xyou need to specify letter C as a code (\|\\\|0x43 or \|\\\|0103 or \|\\\|0o103 or \|\\\|0d67)
Xand type the sequence as, e.g., \|\\\|0xA3\|\\\|0103.
XCharacter resulting in a code 0 (zero) (e.g., \|\\\|00) is special. It tells:
X"skip everything that follows me in this string".
XIt does not make sense to use it, since you can always terminate the
Xsequence with a delimiter.
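The code formats listed above can be illustrated with a small Python decoder. This is a sketch written from the description only, not a port of the real string processor: the name `decode` is hypothetical, digits are assumed to be consumed greedily (which is why a following digit has to be written as a code), and rejecting codes above 255 is an added assumption.

```python
import re

# One alternative per documented format; greedy digit matching is an
# assumption of this sketch.
_CODE = re.compile(
    r"\\0[xX]([0-9A-Fa-f]+)"  # \0xA3   hexadecimal
    r"|\\0[oO]([0-7]+)"       # \0o123  octal
    r"|\\0[dD]([0-9]+)"       # \0d123  decimal
    r"|\\(0[0-7]+)"           # \0123   octal, leading zero present
    r"|\\([1-9][0-9]+)"       # \123    decimal, at least 2 digits
)
_BASES = (16, 8, 10, 8, 10)   # number base of each alternative above


def decode(s):
    """Replace backslash codes in s by the characters they denote."""
    out = []
    pos = 0
    for m in _CODE.finditer(s):
        out.append(s[pos:m.start()])
        digits, base = next(
            (g, b) for g, b in zip(m.groups(), _BASES) if g is not None
        )
        code = int(digits, base)
        if code > 255:
            raise ValueError("code out of range: " + m.group(0))
        if code == 0:
            # Code 0 means: skip everything that follows in this string.
            return "".join(out)
        out.append(chr(code))
        pos = m.end()
    out.append(s[pos:])
    return "".join(out)


print(repr(decode(r"\0xA3\0103")))  # code A3 (hex) followed by the letter C
print(repr(decode(r"\1")))          # a single digit is passed unchanged
```

Under the greedy assumption, writing `\0xA3C` would be read as the single hexadecimal number A3C, which this sketch rejects as out of range.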
When you use an empty string as a matching
Xsequence, remember that it does not match anything.
X.sp
XIf the line with entries is too long, you can break it between the
Xfields.
XIf the string is too long to fit a line, you can break it before any nonblank
Xcharacter by the \|\\\| (backslash) followed by white space (i.e., new lines,
Xspaces, tabs, etc.). The \|\\\| and the following white space will be removed
Xfrom the string by the string preprocessor. However, you are not allowed
Xto break the individual character codes (and you probably would not
Xdo it ever for aesthetic purposes).
XFor example:
X.br
X "experi\\
X.br
X mental design"
X.br
Xis equivalent to:
X.br
X "experimental design"
X.br
Xwhile:
X.br
X "experimental\\
X.br
X design"
X.br
Xis equivalent to:
X.br
X "experimentaldesign"
X.br
XIf you need to have \|\\\| followed by a space in your string, you need to
Xenter either a backslash or a space following it as an explicit character
Xcode, for example:
X.br
X "\|\\\|\|\\\|0o40"
X.br
Xwill produce a \|\\\| followed by the space, while the string:
X.br
X "\|\\\| "
X.br
Xwill be empty.
X.sp 1
XThe preprocessor knows only about comments, plain characters, character codes,
Xand continuation lines. However, some characters and their combinations
Xmay have a special meaning in lists and regular expressions.
X.sp 2
X\fIDETAILS OF FILE STRUCTURE\fR
X.sp
X.PP
X.in +3m
X.ti -3m
XAd.1) File format number. This is simply a digit 1 on a line by itself at the
Xmoment. This entry is included to allow future extensions of the
Xtransliteration description file without the need to modify older
Xtransliteration descriptions (program will read data according to
Xthe current file format number given in the file).
X.sp
X.ti -3m
XAd.2) String delimiters. The subsequent 3 lines specify pairs of
Xsingle character delimiters for 3 types of text data.
XThe line format is:
X.br
X opening_character closing_character.
X.br XThese are needed to mark the beginning/end and the type of the text data. XEach string (text datum) is saved starting from the first character after Xopening delimiter, and ends at the last character before the closing Xdelimiter. If you need to use the closing delimiter within a string, Xyou need to specify it as its code (e.g., if you are using () pair as Xdelimiters, specify ")" as \|\\\|0x29). The opening delimiter may be the same Xor different from the closing delimiter. X.sp X.in +2m X.ti -2m Xa) The first line contains characters used to enclose (bracket) Xa \fIplain string\fR. Plain strings are directly matched to input data or Xdirectly sent to output. XI suggest to stick to " " pair for plain strings. XThe ASCII code for " is \|\\\|0d34 = \|\\\|0x22 = \|\\\|0o42 if you need it inside the Xstring itself. X.sp X.ti -2m Xb) The second line contains characters to mark the beginning and the end Xof the \fIlist\fR. Lists are used to translate single character codes. XI suggest [ and ] delimiters for the list (ASCII code of "]" is: X\|\\\|0d93 = \|\\\|0x5D = \|\\\|0o135). The lists may include ranges, for example: X[a-zA-Z0-9] will include all Latin letters (small and capital) and digits. XNote that order is important: [a-d] is equivalent to [abcd], while X[d-a] will result in an error. If you want to include "-" (minus) in the Xlist, you need to place it as the first or the last character. There are only Xtwo special characters on the list, the "-" described above, and the "]" Xcharacter. You need to enter the "]" as its code. E.g., for XASCII character table [*--] is equivalent to [*+,-], is equivalent to X[\|\\\|42\|\\\|43\|\\\|44\|\\\|45]. The order of characters in the list does not matter Xunless the input list corresponds to the output list (this will be Xexplained later). Empty lists do not make sense. X.sp X.ti -2m Xc) The third line of delimiter specification contains delimiters for X\fIregular expression\fRs and \fIsubstitution expression\fRs. 
XThese strings are used for "flexible" matches Xto the text in the input file. They are very similar to the ones used in XUN*X for searching text in utilities like: grep, sed, vi, awk, etc., though Xonly a subset of full UN*X regular expression syntax is used here. XI suggest enclosing them within braces { and } (ASCII code for } is X\|\\\|0d125 = \|\\\|0x7D = \|\\\|0o175). Actually, regular expressions can only Xbe used for input sequences, and for output sequences the {} are Xused to enclose substitution sequences. This will be explained Xbelow. The description of the Xsyntax for regular/substitution expressions is Xadapted from the documentation for the regexp package of Henry XSpencer, University of Toronto --- this regular expression package Xwas incorporated, after minute modifications, into the program. X.br X.sp 2 X.ce X\fBREGULAR EXPRESSION SYNTAX\fR X.br XA regular expression is zero or more branches, separated by X`\|\||\|\|'. It matches anything that matches one of the branches. XThe `\|\||\|\|' simply means "or". X.ti +2m XA branch is zero or more pieces, concatenated. It matches a Xmatch for the first, followed by a match for the second, Xetc. X.ti +2m XA piece is an atom possibly followed by `*', `+', or `?'. XAn atom followed by `*' matches a sequence of 0 or more Xmatches of the atom. An atom followed by `+' matches a Xsequence of 1 or more matches of the atom. An atom followed Xby `?' matches zero or one occurrences of atom. X.ti +2m XAn atom is a regular expression in parentheses (matching a Xmatch for the regular expression), a range (see below), `.' X(matching any single character), a `\|\\\|' followed by Xa single character (matching that character), or a Xsingle character with no other significance (matching that Xcharacter). X.ti +2m XA range is a sequence of characters enclosed in `[\|\|]'. It Xnormally matches any single character from the sequence. 
If
Xthe sequence begins with `^', it matches any single character
Xnot from the rest of the sequence. If two characters in
Xthe sequence are separated by `-', this is shorthand for the
Xfull list of ASCII characters between them (e.g. `[0-9]'
Xmatches any decimal digit). To include a literal `]' in the
Xsequence, make it the first character (following a possible
X`^'). To include a literal `-', make it the first or last
Xcharacter. The regular expression can contain subexpressions
Xwhich are enclosed in a (\|\|) pair. These subexpressions are numbered
X1 to 9 and can be nested. The numbering of subexpressions is
Xgiven in the order of their opening parentheses "(". For
Xexample:
X.br
X.ta 6mL
X (111)...(22(333)222(444)222)...(555)
X.br
XNote that expression 2 contains within itself expressions 3 and 4.
X.br
XThese subexpressions can be referenced in the substitution string which
Xis described in the section below, or can be used to delimit
Xatoms.
X.in +2m
XExamples:
X.in +2m
X.ti -2m
X{[\|\\\|0d32\|\\\|0d09]\|\\\|0d10} --- will match space or tab followed by new line
X.ti -2m
X{[Tt][Ss]} --- will match TS, Ts, tS and ts
X.ti -2m
X{TS\|\||\|\|Ts\|\||\|\|tS\|\||\|\|ts} --- same as above
X.ti -2m
X{[\|\\\|0d09-\|\\\|0d15 ][^hH][^uU][a-zA-Z]*[\|\\\|0d09-\|\\\|0d15 ]} --- all words which
Xdo not start with hu, Hu, hU, HU. There is a space between
X\|\\\|0d15 and ].
X.br
XNote that specifying expressions like {.*} (i.e., match all characters)
Xdoes not make much sense, since it would mean here: match the whole input
Xfile. However, expressions like {A.*B} should be acceptable, since they
Xmatch a pair of A and B, and everything in between them, e.g. for a
Xstring like: "This is Mr. Allen and this is Mr. Brown." this expression
Xshould match the string: "Allen and this is Mr. B".
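Some of the examples above can be checked with Python's `re` module, used here purely as a stand-in for the regexp package that translit incorporates. For these particular patterns the two are assumed to behave alike, but they are not identical in general (for instance, `^` and `$` are anchors in Python). The variable names are illustrative only.

```python
import re

text = "This is Mr. Allen and this is Mr. Brown."

# {A.*B}: the first A, the last B, and everything in between.
span = re.search(r"A.*B", text).group(0)
print(span)  # Allen and this is Mr. B

# {[Tt][Ss]} and {TS|Ts|tS|ts} accept the same four spellings.
words = "TS Ts tS ts tx"
by_range = re.findall(r"[Tt][Ss]", words)
by_branches = re.findall(r"TS|Ts|tS|ts", words)
print(by_range == by_branches)  # True

# Subexpressions are numbered by the position of their opening "(".
groups = re.match(r"(a)(b(c)d(e)f)(g)", "abcdefg").groups()
print(groups)  # ('a', 'bcdef', 'c', 'e', 'g')
```

The last line mirrors the nesting example: subexpression 2 contains subexpressions 3 and 4.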
X.br X.in -4m XRemember to put a backslash "\|\\\|" in front of the following Xcharacters: .\|\|[\|\|(\|\|)\|\||\|\|?\|\|+\|\|*\|\|\|\\\| if you want Xtheir literal meaning outside the Xrange enclosed in [\|\|]. Inside the range they have their literal meaning. XIf you know the syntax of UN*X regular expressions, please note that X\|\|^\|\| and \|$\| anchors are not supported and are treated as normal Xcharacters (with the exception of \|\|^\|\| negation within [\|\|]). X.sp X.ce X\fBSUBSTITUTION EXPRESSIONS\fR X.br XAfter finding a match for a regular expression in the input text, Xa substitution is made. XIt can be a simple substitution where the whole matching string Xis replaced by another string, or it may reuse a portion or Xthe whole matching string. The subexpressions (the ones enclosed Xin parentheses) within the regular Xexpression which matched the input text can be referenced in the Xsubstitution expression. XOnly the following characters have special meaning within substitution Xexpression: X.in +4m X.ta 3m X.br X.ti -2m X& --- will put the whole matching string. X.ti -2m X\|\\\|1 --- will put the match for the 1st subexpression in (\|\|). X.ti -2m X\|\\\|2 --- will put the string which matched 2nd subexpression, Xetc. X.ti -2m X\|\\\|9 --- will place in a replacement string the 9th Xsubexpression (provided that there was 9 (\|\|) pairs in Xthe regular expression) X.in -4m X.sp XOnly 9 subexpressions are allowed. XAll other characters and sequences within the substitution expression Xwill be placed in a substitution string as written. To be able to put Xa single backslash there, you need to put two of them. XTo be able to place the unchanged codes of the Xabove characters (i.e., to make them literals), you need to precede them Xwith a backslash "\|\\\|", i.e., to get & in the output string Xyou need to write it as \|\\\|&. Similarly, to place literal X\|\\\|1, \|\\\|2, etc., you need to enter it as \|\\\|\|\\\|1, \|\\\|\|\\\|2, etc. 
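The substitution rules above (`&` for the whole match, `\1` to `\9` for subexpressions, a backslash to make `&` or a backslash literal) can be sketched as a small Python function. The names `expand` and `substitute` are hypothetical, Python's `re` module again stands in for the matcher, and a backslash before any other character is assumed to be copied as written.

```python
import re


def expand(template, match):
    """Build the replacement text for one match from a substitution
    expression, following the rules described above."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "&":
            out.append(match.group(0))           # whole matching string
        elif ch == "\\" and i + 1 < len(template):
            nxt = template[i + 1]
            i += 1
            if nxt in "123456789":
                out.append(match.group(int(nxt)) or "")
            elif nxt in "&\\":
                out.append(nxt)                  # \& -> &, \\ -> \
            else:
                out.append("\\" + nxt)           # assumed: kept as written
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def substitute(pattern, template, text):
    """Replace every match of pattern in text using expand()."""
    return re.sub(pattern, lambda m: expand(template, m), text)


print(substitute(r"([Tt])([Ss])", r"\1.\2", "cats TSAR"))  # cat.s T.SAR
print(substitute(r"[0-9]+", r"<&>", "a12b"))               # a<12>b
```

Referring to a subexpression that the regular expression does not contain raises an error in this sketch; how translit itself reacts is not stated here.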
XNote that characters .+[]()^, etc. which had a special meaning in Xthe regular expressions, do not have any special meaning in the Xsubstitution expression and will be output as written. X.in +2m XExample: X.br XThe regular expression: X.in +2m X.ti -2m X{([Tt])([Ss])} and the corresponding substitution expression {\|\\\|1.\|\\\|2} Xputs a period Xbetween adjoining letters t and s preserving their letter case. X.br XThe expression: X.ti -2m X{([A-Za-z]+)-[ \|\\\|0x09]*([\|\\\|0x0A-\|\\\|0x0D]+)[ \|\\\|0x09]*([A-Za-z,.?;:"\|\\\|)'`!]+)[ \|\\\|0x09]} X.br Xand the substitution expression {\|\\\|1\|\\\|3\|\\\|2} dehyphenate words (when you Xunderstand this one, you are a guru...). For example: Xcon- (NL)cert is changed to concert(NL), where NL stands for New XLine. It looks for one or more letters (saves them as substring 1) Xfollowed by a hyphen (which may be followed by zero or more spaces Xor tabs). The hyphen must be followed by a NewLine (ASCII characters X0A-0D hex form various new line sequences) and saves NewLine sequence Xas a subexpression 2. XThen it looks for zero or more tabs and spaces (at the beginning of Xthe line). Then it looks for the rest of the hyphenated word and Xsaves it as substring 3. The word may have punctuation attached. XThen it looks again for some spaces or tabs. The substitution expression Xjunks all sequences which were not within (), i.e., hyphen and Xspaces/tabs and inserts only substrings but in a different Xorder. The \|\\\|1 (word beginning) is followed by \|\\\|3 (word end) and Xfollowed by the NewLine --- \|\\\|2. The {\|\\\|2\|\\\|1\|\\\|3} would Xbe probably equally good, though you would need to move the punctuation Xmatching to the beginning of the regular expression. X.in -6m X.ti -3m XAd.3) Starting sequence. This sequence will be sent to the output before Xany text. It is enclosed in the pair of string delimiters. I use it Xto output LaTeX preamble. However, it can be empty, if not used. 
XThe sequence may contain any characters, including new lines, etc.
X.nf
X.ta 2m 4m
X Example:
X "" # empty sequence
X.sp
X Example:
X "\|\\\|documentstyle{article}
X \|\\\|input cyracc
X \|\\\|begin{document}
X "
X is right (note a new line at the end), but
X.br
X "\|\\\|documentstyle{article}
X \|\\\|input cyracc # this comment will be included!
X \|\\\|begin{document}" # while this will not
X is wrong.
X.sp
X.fi
X.ti -3m
XAd.4) Ending sequence. Similar to Ad.3), but will be appended at the end of the
Xoutput file.
X.nf
X For example:
X "\|\\\|end{document}
X "
X.fi
X.sp
X.ti -3m
XAd.5) Number of input character sets. For example, in some incarnation of
XKOI7, there are two character sets: Latin and Cyrillic. Cyrillic
Xcharacter sequence follows SHIFT-OUT character (CTRL-N), \|\\\|0x0e,
Xand is terminated by SHIFT-IN character (CTRL-O), \|\\\|0x0f.
XAnother way of looking at it is that Latin characters follow
XCTRL-O and cyrillic ones follow CTRL-N.
X.sp
XIf there is only one character set on input you should specify 0
Xas a number of input char sets,
Xsince the input file obviously does not contain any SHIFT-OUT/IN
Xsequences.
X.sp
X.ti -3m
XAd.6) SHIFT-OUT/SHIFT-IN sequences for each input character set.
XThese lines appear only if you specified nonzero number of character sets.
XThese lines also contain "nesting sequences", which will be
Xexplained later in this section.
XYou do not use "nesting sequences" frequently, so let us assume
Xfor a moment that nesting data are empty strings.
XThe strings or regular expressions specified here are matched
Xwith the contents of input text.
If a match is found, the matching sequence
Xis usually deleted from the input text and:
X.in +4m
X.ti -2m
Xa) for SHIFT-OUT sequence: the current input character set number is changed
Xto the new one corresponding to the SHIFT-OUT sequence, or
X.ti -2m
Xb) for SHIFT-IN sequence: the previous input character set number is restored
X(i.e., the one which preceded the SHIFT-OUT sequence for the current set).
XNote that only the SHIFT-IN sequence for the current set is matched.
XThe SHIFT-IN sequences for character sets other than the current set are
Xnot matched.
XThe bracketing of sets is assumed
Xperfect. If the SHIFT-IN sequence for the current set is an empty string,
Xthe input set number is changed when the SHIFT-OUT sequence of the new set
Xis detected.
X.in -4m
XFor each input character set, you have to specify a line consisting
Xof 6 strings/expressions separated by spaces:
X.br
X SO-match SO-subs NEST-up NEST-down SI-match SI-subs
X.br
Xwhere:
X.br
X.in +2m
X.ti -2m
XSO-match --- the string or regular expression for the SHIFT-OUT sequence
Xfor the current character set. If detected, the input character set is
Xchanged to this set.
X.ti -2m
XSO-subs --- this is usually an empty string (i.e., the input sequence
Xmatching SO-match is removed). But it can be a replacement string or
Xa substitution expression, which will replace the original matching
XSHIFT-OUT sequence.
X.ti -2m
XNEST-up --- this string (or a regular expression) is usually an empty
Xstring. However, it can be used to count brackets for detection of the SHIFT-IN
Xbracket, if the SHIFT-IN sequence is not unique. Its use is explained below.
X.ti -2m
XNEST-down --- a counterpart of NEST-up. It is explained later.
X.ti -2m
XSI-match --- when a sequence in an input file matches the string or regular
Xexpression given as SI-match for the current input character set, the
Xinput character set number is restored to the previous set. Note that
Xonly the SI-match for the current set is matched with input characters.
X.ti -2m XSI-subs --- this is usually an empty string (i.e., input sequence which Xmatched SI-match is removed), but if it is not, the input characters which Xmatched the SI-match are replaced with the SI-subs. X.sp X.in -2m X.br XThe KOI7 case described above may be specified as: X.nf X.ta 5m 10m 15m 20m 25m X.nf X 2 # 2 input sets X ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 # Latin(set 1) X "\|\\\|016" ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 "\|\\\|017" ""\0\0\0\0 # Cyrillic(set 2) X or X 2 # 2 sets X "\|\\\|017" ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 # Latin(set 1) X "\|\\\|016" ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 # Cyrillic(set 2) X.fi X.br XBefore the input is processed, the program is initialized to the character Xset of the first set. In the above case, it is important, since declaration: X.nf X 2 # 2 sets X "\|\\\|016" ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 # Cyrillic(set 1) X "\|\\\|017" ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 ""\0\0\0\0 # Latin(set 2) X.br X.fi Xwould be wrong and would mess up the Latin characters preceding Xfirst Cyrillic sequence. X.sp 1 XThe nesting sequences are used only for specific situations. I needed them Xto write a transliteration table from LaTeX to KOI8. XIn LaTeX the { } pair is used for grouping and appears frequently in Xthe text. The sequence of cyrillic characters is also a group Xin LaTeX. XThe SHIFT-OUT sequence for Russian letters in LaTeX is (at least in Xmy case): "{\|\\\|cyr ", and the end Xof the Russian letters is marked by "}", but the "}" has to be the Xbracket matching the opening "{" in "{\|\\\|cyr ", not just any bracket. XFor this reason, my SHIFT-OUT/IN entry was in this case: X.br X "{\|\\\|cyr " "" "{" "}" "}" "" # Cyrillic codes X.br XWhenever the "{\|\\\|cyr " was found, the program zeroes the counter. 
XIt adds 1 to it when the NEST-up sequence (i.e., the "{" here) is found, and
Xsubtracts 1 from it when the NEST-down sequence is found (i.e., the "}").
XThe checking for a SHIFT-IN sequence (i.e., the "}") for the Cyrillic set
Xis done only when
Xthe counter value is zero (i.e., all pairs inside the Cyrillic text are
Xmatched). In fact, the process is more
Xcomplicated than that (the counter for an opened character set is
Xplaced on the stack), but these are details you can find in the code
Xitself.
X.sp
X.ti -3m
XAd.7) Number of output "character sets". This is analogous to the input case.
XThe characters sent to output may belong to different sets. For example,
Xwhen the character (or the sequence) from set 2 is followed by the character
X(or the sequence) from set 1,
Xthe program first sends the SHIFT-IN sequence for set 2 (if it is not
Xempty) and then the SHIFT-OUT sequence for set 1 (if it is not empty). If the
Xoutput character (or sequence) is assigned to set 0, then no SHIFT-IN/SHIFT-OUT
Xsequences are sent to output.
X.br
XIf there is only one set of output characters, you should specify 0.
XNote that you may have several input sets and several output sets, though
Xthis is rare. Usually, you have one input set and many
Xoutput character sets, or vice versa. Again, if you have only one output set,
Xyou do not have any SHIFT-IN/SHIFT-OUT sequences, since those are
Xsent to output only when a set number is changed.
XBut you are free to experiment.
X.sp
X.ti -3m
XAd.8) SHIFT-OUT/SHIFT-IN sequences for each output character set. It is
Xsimilar to the input case; however, the NEST-up and NEST-down sequences
Xare not used here. Again, before any text is sent to output, the
Xcharacter set specified as the first one is assumed. If SHIFT-OUT/IN
Xsequences are not used (i.e., you have only one output character set),
Xyou will not have any SHIFT-OUT/SHIFT-IN data lines.
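The output set switching described under Ad.7 can be sketched in a few lines of Python. This is only an illustration, not the code from translit.c: the function name emit, the state dictionary and the shifts table are hypothetical, and the sketch assumes a KOI7-like output in which set 1 (Latin) has empty SHIFT-OUT/SHIFT-IN strings and set 2 (Cyrillic) is bracketed by CTRL-N (octal 016) and CTRL-O (octal 017).

```python
import io

def emit(out, state, set_no, seq, shifts):
    """Write seq to out, switching the output character set if needed.

    shifts maps a set number to its (SHIFT-OUT, SHIFT-IN) pair.
    Set number 0 means "do not touch the current output set".
    """
    current = state["current"]
    if set_no != 0 and set_no != current:
        shift_in_old = shifts[current][1]    # closes the set being left
        shift_out_new = shifts[set_no][0]    # opens the set being entered
        if shift_in_old:
            out.write(shift_in_old)
        if shift_out_new:
            out.write(shift_out_new)
        state["current"] = set_no
    out.write(seq)

# Hypothetical table: set 1 = Latin (no shifts), set 2 = Cyrillic (SO/SI).
shifts = {1: ("", ""), 2: ("\x0e", "\x0f")}
state = {"current": 1}          # the first set is assumed before any output
out = io.StringIO()
for set_no, seq in [(1, "a"), (2, "b"), (0, " "), (2, "c"), (1, "d")]:
    emit(out, state, set_no, seq, shifts)
print(repr(out.getvalue()))     # 'a\x0eb c\x0fd'
```

Note that the space assigned to set 0 is written without any shift sequence, so the two Cyrillic characters around it share a single SHIFT-OUT/SHIFT-IN pair.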
XThe KOI8 (single character set containing all Latin and Russian letters)
Xto KOI7 (the set using overlapping codes switched by SHIFT-OUT/IN sequences)
Xconversion could therefore be accomplished by the following table:
X.br
X 2 # 2 output sets
X.br
X ""\0\0\0\0 ""\0\0\0\0 # Latin Letters
X.br
X "\|\\\|016" "\|\\\|017" # Russian Letters
Xcase
X.sp
X.ti -3m
XAd.9) Transliteration table for individual characters or their sequences.
XIt is the core of your transliteration data.
XThere are 4 columns in the transliteration
Xtable:
X.br
X.in +3m
X(inp_set_no) (inp_seq) (out_set_no) (out_seq)
X.br
X.in -3m
XThese 4 columns are separated by spaces. The (input_set_number)
Xcorresponds to the input character set number as specified above for
Xinput SHIFT-OUT/SHIFT-IN data, or zero.
XIf zero is used (even if the number of input sets is not zero), the
X(input_sequence) will always be matched, irrespective of the current
Xinput character set imposed by the SHIFT-OUT sequence. This is useful,
Xsince some characters are universal (e.g., new lines, spaces, pluses,
Xminuses, etc.) irrespective of the current character set.
XThe (input_sequence) is the sequence of characters to be matched with
Xcharacters in the input file, and if found (within the character set
Xspecified) it is replaced by the (output_sequence) and sent to output
X(i.e., the matching is interrupted, the (output_sequence) is sent to output,
Xthe input file pointer is moved to the first character after the
Xmatched sequence and matching resumes).
XThe (output_set_number) specifies the output character set. When the
Xoutput character set changes during transliteration, the appropriate SHIFT-IN
Xsequence of the previous set and the current set's SHIFT-OUT sequence are sent
Xto output. The (output_set_number) may also be zero (even if the number of
Xoutput sets is not zero). In this case, the current output set status
Xis not changed, and no SHIFT-IN/OUT sequences are sent to output.
Lastly, the
Xoutput set code may be -1, -2 or -3.
XIn this case, the substitution is performed
Xwithin the input string that matched, but the output sequence is not sent to
Xthe output yet. Depending on the code, the following action is performed:
X.in +4m
X.ti -2m
X-1 --- program makes the substitution in the input string (i.e., replaces
Xthe matching string with the output sequence in the input buffer).
XIt does not send the output sequence to the output, but
Xcontinues matching input sequences following the currently
Xmatched one.
X.ti -2m
X-2 --- like code -1, but matching is resumed from the first sequence on
Xthe list.
X.ti -2m
X-3 --- like code -1, but matching is resumed from the input SHIFT-OUT/IN
Xsequences.
X.in -4m
XE.g., if the unprocessed text in the input file is:
X.br
X mental procedure was not successful since..........
X.br
Xand there was a line in the transliteration table:
X.br
X 0 "me" -1 "you"
X.br
Xthe input text would be changed to:
X.br
X yountal procedure was not successful since..........
X.br
Xand all remaining matching data would be applied to this text, rather than
Xto the original text.
XThe -2 code backsteps to the point where the matching of
Xtransliteration sequences starts.
XThe -3 code backsteps even further, to the point where the
Xinput SHIFT-OUT and SHIFT-IN sequences are matched.
XSince the order of sequences to match
Xis crucial here, for the case of output set code -1/-2/-3
Xeven one-character input sequences are matched in the order specified.
XBE CAREFUL HERE. You may create infinite loops. If you use
Xcode -2/-3, be sure that the resulting sequence after substitution
Xwith the code -2/-3 will not match previous sequences
Xwith codes -2/-3.
X.br
XThe (output_sequence)
Xis a sequence which replaces the corresponding (input_sequence).
XIf (output_sequence) is "" (i.e., empty string) then (input_sequence)
Xis effectively deleted.
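The effect of output set code -1 on the input buffer can be illustrated with a small Python sketch. It is not taken from translit.c; the function name apply_rewrite_rule and its arguments are hypothetical, and only the single step "replace the match inside the buffer, send nothing to output" is modelled. Continuing with the remaining table entries is left out.

```python
def apply_rewrite_rule(buffer, pos, pattern, replacement):
    """Model of output set code -1: if pattern occurs at pos, replace it
    inside the input buffer and return the new buffer; nothing is output."""
    if buffer.startswith(pattern, pos):
        return buffer[:pos] + replacement + buffer[pos + len(pattern):]
    return buffer  # no match at this position: buffer is unchanged

# The table line   0 "me" -1 "you"   applied at the start of the text:
text = "mental procedure was not successful"
print(apply_rewrite_rule(text, 0, "me", "you"))
# yountal procedure was not successful
```

Later table entries would then be matched against the rewritten buffer, which is why a careless rule with code -2 or -3 can loop forever.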
XThe (input_sequence)s are compared with input in the order specified
Xunless backstepping -2/-3 code is used (the matching is done from the
Xfirst sequence again). I use the code -1, e.g.,
Xto dehyphenate words when changing to LaTeX.
XCode -2 is useful if you want to skip the next comparisons, and the resulting
Xsubstitution string will match earlier matching expressions.
XI do not see any use for the code -3, but you may have one.
XThe order for multicharacter sequences is
Xtherefore important (the single character sequences are always compared
Xafter all multicharacter sequences, and can therefore be put anywhere).
XThe longer multicharacter sequences should be specified before
Xshorter ones, unless they are some "preprocessing" steps with codes
X-1/-2/-3. The order may sometimes be crucial.
XIf you need single character sequences matched in a specific order,
Xenter them as regular expressions, i.e., as {c} instead of "c".
XIn short, the multicharacter input sequences and regular expressions
Xare matched to input text in the order specified. For the sake of
Xefficiency, the single character input sequences (with the exception of
Xoutput set code -1/-2/-3) and input lists are handled as a case of remapping
Xand are matched in the order of character codes associated with them.
XIf you specify the same single input character twice for a given input set,
Xthe program will complain.
XThe following combinations of input and output sequences are allowed:
X.nf
X.ta 2m 24m
X Input Sequence                 Output Sequence
X "\fIplain string\fR"           only "\fIplain string\fR"
X [\fIlist\fR]                   [\fIlist\fR] or "\fIplain string\fR"
X {\fIregular expression\fR}     {\fIsubstitution expression\fR} or
X.br
X                                "\fIplain string\fR"
X.br
X.fi
XWhen a match is found, the matching sequence is removed and replaced
Xwith an output sequence. If this results in changing the current output
Xcharacter set, the appropriate SHIFT-IN/SHIFT-OUT pair is sent to the
Xoutput before the transliterated output sequence.
If list is Xused as the input sequence, you may either use: X.br X.in +2m X.ti -2m Xa) plain string as output Xsequence. In this case, if current input character belongs to the input list, Xit is replaced by the output string. I use it to delete ranges of Xcharacters which do not have any corresponding characters in the output Xset (e.g., some graphics characters). In this case, the order of Xcharacters on the input list is not important. X.ti -2m Xb) if the output string is also a Xlist then it has to contain exactly the same number of characters as Xthe input list. In this case, the 1st character from the input list Xis replaced by the 1st character from the output list, the 2nd one Xby the 2nd one, etc. Therefore, the order of characters is important. X.br X.in -2m XTheoretically, if there is one-to-one correspondence between characters Xin the input set and characters in the output set, Xyou can make the conversion by Xusing a single line consisting of two lists. But it looks ugly... And is Xdifficult to read. XAnd for the program, the substitution takes the same time, if Xthe characters are specified separately, or when they are specified Xas matching lists. XIf regular expression is used to match the input characters, the matching Xsequence may be replaced by a plain string or a substitution string, Xwhich was described above. X.in +3m XExamples: X.br X.ta 3m 10m 20m 30m 40m X 2 "CCCP" 0 ""\0\0\0\0 X.br Xwill delete all occurrences of CCCP from the input file (but not Cccp or XCCCp) for input set 2. X.sp 1 X 0 "\|\\\|0xD1" 0 "ya" X.br Xwill replace all occurrences of character of the code \|\\\|0xD1 with a two Xletter sequence "ya". X.sp 1 X 0 \|\\\|0xD1 2 q X.br Xwill replace all characters \|\\\|0xD1 with a character "q" and output XSHIFT-IN/OUT sequence if necessary. X.sp 1 X 2 "q" 0 "\|\\\|0xD1" X.br Xwill replace letter q (if the current input set is 2) with a code \|\\\|0xD1. 
X.sp 1 X 0 "\|\\\|0xD1" 2 "ya" X.br Xwill replace code \|\\\|0xD1 with a sequence ya (assuming that SHIFT-OUT Xand SHIFT-IN sequences Xfor output set 2 are: {\|\\\|cyr and }, respectively, you will get {\|\\\|cyr ya}). X.sp XIf a character is not specified in the transliteration table, it will Xbe output as is, i.e., it corresponds to a line: X.br X 0 "c" 0 "c" X.br Xwhere c is the character. If you want to delete certain characters, you Xneed to explicitly specify this, e.g.: X.br X 0 [a-z] 0 "" X.br Xwill delete all lower case Latin letters from the text. X.in -3m XBefore you decide to create your own transliteration file, please examine Xexisting transliteration files. Do yourself (and others) a favor --- put Xas many comments as possible there. If you allow others to use your Xtransliteration files, please include your name and e-mail address Xand file creation date. X.in -4m X.sp 2 XProgram matches the sequences in a specific order: X.in +4m X.ti -2m X\01) Match/substitute input SHIFT-OUT sequences X.ti -2m X\02) If matched, save current set and start new one X.ti -2m X\03) If matched, zero nest counter for NEST sequences X.ti -2m X\04) Match/substitute current set SHIFT-IN-sequence X.ti -2m X\05) If matched, restore previous set number X.ti -2m X\06) If matched, restore previous set nest counter X.ti -2m X\07) Match/substitute transliteration sequences X.ti -2m X\08) If matched and code = -1 make substitution in input buffer and Xcontinue matching the next sequence. 
X.ti -2m X\09) If matched and code = -2 make substitution and goto 7) X.ti -2m X10) If matched and code = -3 make substitution and goto 1) X.ti -2m X11) Match (no substitution) NEST-up and NEST-down to input buffer X.ti -2m X12) If NEST-up matched, increment counter for current set X.ti -2m X13) If NEST-down matched, decrement counter for current set X.ti -2m X14) If match in 7) send substitute sequence to output X.ti -2m X15) If no match in 7) (or code -1) output current input character X.ti -2m X16) Advance input pointer to point at new characters X.ti -2m X17) If End of File, break X.ti -2m X18) Goto 1) X.br X.fi X X.PP X.SH ASCII CHARACTER CODES X.nf X.ta 2m 6m 9m 13m 16m 20m 22m 26m 29m 33m 36m 40m X dec hx oct ch dec hx oct ch X X \0\00 00 000 ^@ NUL \064 40 100 @ X \0\01 01 001 ^A SOH \065 41 101 A X \0\02 02 002 ^B STX \066 42 102 B X \0\03 03 003 ^C ETX \067 43 103 C X \0\04 04 004 ^D EOT \068 44 104 D X \0\05 05 005 ^E ENQ \069 45 105 E X \0\06 06 006 ^F ACK \070 46 106 F X \0\07 07 007 ^G BEL \071 47 107 G X \0\08 08 010 ^H BS \072 48 110 H X \0\09 09 011 ^I HT \073 49 111 I X \010 0a 012 ^J LF \074 4a 112 J X \011 0b 013 ^K VT \075 4b 113 K X \012 0c 014 ^L FF \076 4c 114 L X \013 0d 015 ^M CR \077 4d 115 M X \014 0e 016 ^N SO \078 4e 116 N X \015 0f 017 ^O SI \079 4f 117 O X \016 10 020 ^P DLE \080 50 120 P X \017 11 021 ^Q DC1 \081 51 121 Q X \018 12 022 ^R DC2 \082 52 122 R X \019 13 023 ^S DC3 \083 53 123 S X \020 14 024 ^T DC4 \084 54 124 T X \021 15 025 ^U NAK \085 55 125 U X \022 16 026 ^V SYN \086 56 126 V X \023 17 027 ^W ETB \087 57 127 W X \024 18 030 ^X CAN \088 58 130 X X \025 19 031 ^Y EM \089 59 131 Y X \026 1a 032 ^Z SUB \090 5a 132 Z X \027 1b 033 ^[ ESC \091 5b 133 [ X \028 1c 034 ^\\ FS \092 5c 134 \\ X \029 1d 035 ^] GS \093 5d 135 ] X \030 1e 036 ^^ RS \094 5e 136 ^ X \031 1f 037 ^_ US \095 5f 137 _ X \032 20 040 SP \096 60 140 ` X \033 21 041 ! 
\097 61 141 a X \034 22 042 " \098 62 142 b X \035 23 043 # \099 63 143 c X \036 24 044 $ 100 64 144 d X \037 25 045 % 101 65 145 e X \038 26 046 & 102 66 146 f X \039 27 047 ' 103 67 147 g X \040 28 050 ( 104 68 150 h X \041 29 051 ) 105 69 151 i X \042 2a 052 * 106 6a 152 j X \043 2b 053 + 107 6b 153 k X \044 2c 054 , 108 6c 154 l X \045 2d 055 - 109 6d 155 m X \046 2e 056 . 110 6e 156 n X \047 2f 057 / 111 6f 157 o X \048 30 060 0 112 70 160 p X \049 31 061 1 113 71 161 q X \050 32 062 2 114 72 162 r X \051 33 063 3 115 73 163 s X \052 34 064 4 116 74 164 t X \053 35 065 5 117 75 165 u X \054 36 066 6 118 76 166 v X \055 37 067 7 119 77 167 w X \056 38 070 8 120 78 170 x X \057 39 071 9 121 79 171 y X \058 3a 072 : 122 7a 172 z X \059 3b 073 ; 123 7b 173 { X \060 3c 074 < 124 7c 174 | X \061 3d 075 = 125 7d 175 } X \062 3e 076 > 126 7e 176 ~ X \063 3f 077 ? 127 7f 177 DEL X X.br X X.SH CONVERSION: DECIMAL<-->OCTAL<-->HEX. X.nf X.cs R 24 X 000 000 00 064 100 40 128 200 80 192 300 C0 X 001 001 01 065 101 41 129 201 81 193 301 C1 X 002 002 02 066 102 42 130 202 82 194 302 C2 X 003 003 03 067 103 43 131 203 83 195 303 C3 X 004 004 04 068 104 44 132 204 84 196 304 C4 X 005 005 05 069 105 45 133 205 85 197 305 C5 X 006 006 06 070 106 46 134 206 86 198 306 C6 X 007 007 07 071 107 47 135 207 87 199 307 C7 X 008 010 08 072 110 48 136 210 88 200 310 C8 X 009 011 09 073 111 49 137 211 89 201 311 C9 X 010 012 0A 074 112 4A 138 212 8A 202 312 CA X 011 013 0B 075 113 4B 139 213 8B 203 313 CB X 012 014 0C 076 114 4C 140 214 8C 204 314 CC X 013 015 0D 077 115 4D 141 215 8D 205 315 CD X 014 016 0E 078 116 4E 142 216 8E 206 316 CE X 015 017 0F 079 117 4F 143 217 8F 207 317 CF X 016 020 10 080 120 50 144 220 90 208 320 D0 X 017 021 11 081 121 51 145 221 91 209 321 D1 X 018 022 12 082 122 52 146 222 92 210 322 D2 X 019 023 13 083 123 53 147 223 93 211 323 D3 X 020 024 14 084 124 54 148 224 94 212 324 D4 X 021 025 15 085 125 55 149 225 95 213 325 D5 X 022 026 16 086 126 56 150 226 
96 214 326 D6 X 023 027 17 087 127 57 151 227 97 215 327 D7 X 024 030 18 088 130 58 152 230 98 216 330 D8 X 025 031 19 089 131 59 153 231 99 217 331 D9 X 026 032 1A 090 132 5A 154 232 9A 218 332 DA X 027 033 1B 091 133 5B 155 233 9B 219 333 DB X 028 034 1C 092 134 5C 156 234 9C 220 334 DC X 029 035 1D 093 135 5D 157 235 9D 221 335 DD X 030 036 1E 094 136 5E 158 236 9E 222 336 DE X 031 037 1F 095 137 5F 159 237 9F 223 337 DF X 032 040 20 096 140 60 160 240 A0 224 340 E0 X 033 041 21 097 141 61 161 241 A1 225 341 E1 X 034 042 22 098 142 62 162 242 A2 226 342 E2 X 035 043 23 099 143 63 163 243 A3 227 343 E3 X 036 044 24 100 144 64 164 244 A4 228 344 E4 X 037 045 25 101 145 65 165 245 A5 229 345 E5 X 038 046 26 102 146 66 166 246 A6 230 346 E6 X 039 047 27 103 147 67 167 247 A7 231 347 E7 X 040 050 28 104 150 68 168 250 A8 232 350 E8 X 041 051 29 105 151 69 169 251 A9 233 351 E9 X 042 052 2A 106 152 6A 170 252 AA 234 352 EA X 043 053 2B 107 153 6B 171 253 AB 235 353 EB X 044 054 2C 108 154 6C 172 254 AC 236 354 EC X 045 055 2D 109 155 6D 173 255 AD 237 355 ED X 046 056 2E 110 156 6E 174 256 AE 238 356 EE X 047 057 2F 111 157 6F 175 257 AF 239 357 EF X 048 060 30 112 160 70 176 260 B0 240 360 F0 X 049 061 31 113 161 71 177 261 B1 241 361 F1 X 050 062 32 114 162 72 178 262 B2 242 362 F2 X 051 063 33 115 163 73 179 263 B3 243 363 F3 X 052 064 34 116 164 74 180 264 B4 244 364 F4 X 053 065 35 117 165 75 181 265 B5 245 365 F5 X 054 066 36 118 166 76 182 266 B6 246 366 F6 X 055 067 37 119 167 77 183 267 B7 247 367 F7 X 056 070 38 120 170 78 184 270 B8 248 370 F8 X 057 071 39 121 171 79 185 271 B9 249 371 F9 X 058 072 3A 122 172 7A 186 272 BA 250 372 FA X 059 073 3B 123 173 7B 187 273 BB 251 373 FB X 060 074 3C 124 174 7C 188 274 BC 252 374 FC X 061 075 3D 125 175 7D 189 275 BD 253 375 FD X 062 076 3E 126 176 7E 190 276 BE 254 376 FE X 063 077 3F 127 177 7F 191 277 BF 255 377 FF X.cs R X.br X.sp X.fi X X.SH INSTALLATION XProgram is given in a source form. 
It was tried under UN*X, VMS and XMS-DOS systems and ran. The file \fIreadme.doc\fR contains the details Xon how to obtain the whole package. You can retrieve this file Xfrom anonymous ftp on kekule.osc.edu in the directory /pub/russian/translit. XYou can also obtain it via e-mail by sending a message: X.br X get translit/readme.doc from russian X.br Xto OSCPOST@osc.edu or OSCPOST@OHSTPY.BITNET. X.sp XThe source of the program consists of several files: X.br X.IP \fIpaths.h\fR Xmust be edited before compilation. It contains its Xown comments what to do. The defines in this file relate to the operating Xsystem you are using and the default path for searching transliteration Xtable. X.br X.IP \fItranslit.c\fR XIt contains the main program. XThis was intended to be a portable code. X.br X.IP \fIreg_exp.h\fR Xthe include file for regular expression matching Xlibrary of Henry Spencer from the University of Toronto. This regular Xexpression package was posted to comp.sources.misc (volume 3). Also 4 patches Xwere posted (in volumes: 3, 4, 4, 10). I applied the patches to the original Xcode and made small modifications to the code, which are marked in the Xsource code. X.br X.IP \fIreg_exp.c\fR Xthe regular expression library for compilation and Xmatching of regular expressions. X.br X.IP \fIreg_sub.c\fR Xthe regular expression substitution routine. X.br X.sp X.PP XBefore you compile this program you have to edit \fIpaths.h\fR. XRead comments in the file. XDuring compilation, all source code should reside in the Xcurrent directory. X.br XThen you may compile the program under UN*X as (for example): X.br X cc -o translit translit.c reg_exp.c reg_sub.c X.br Xand copy the program \fItranslit\fR to some standard directory which is Xin users' path (for example: /usr/local/bin). Then you need to copy Xtransliteration tables to the directory which you have chosen in \fIpaths.h\fR. XIf you get errors, then it is not OK. 
Please report them to the author (with
Xall the gory details: error message, line number, machine, operating system,
Xetc.).
X.sp
XUnder VMS (VAXes) you need to compile it as:
X.br
X cc translit
X.br
X cc reg_exp
X.br
X cc reg_sub
X.br
X link translit+reg_exp+reg_sub,sys$library:vaxcrtl/lib
X.br
Xand before you can use the program, you need to type (or better put into your
XLOGIN.COM file) a line:
X.br
X translit == "$SYS$USER:[ME.TRA]TRANSLIT.EXE"
X.br
Xor whatever is the full path to the \fItranslit\fR executable image which
Xyou created with LINK. Note the quotes and the $ sign in front of the program
Xpath.
X.sp
XOn an IBM-PC I used Microsoft C 5.1 as:
X.br
X.in +2m
X.ti -1m
Xcl /FeTRANSLIT /AL /FPc /W1 /F 5000 /Ox /Gs translit.c reg_exp.c reg_sub.c
X.in -2m
X.sp 2
X.SH RULES, CONDITIONS AND AUTHOR'S WISHES
XYou can distribute this code and associated files under these conditions:
X.br
X.in +4m
X.ti -2m
X 1) You will distribute all files (even if you
Xthink that they are garbage). You may get the complete set from anonymous
Xftp at kekule.osc.edu in /pub/russian/translit. You can also get the program
Xand associated files via e-mail. To get the instructions for e-mail
Xdistribution send a line:
X.br
X send translit/readme.doc from russian
X.br
Xto OSCPOST@osc.edu or OSCPOST@OHSTPY.BITNET.
XYou are not allowed to distribute an incomplete distribution.
The following Xfiles should be present in the distribution: X.ta 2m 22n X.nf X alt-gos.rus - ALT to GOSTCII table X alt-koi8.rus - ALT to KOI8 table X example.alt.uu - uuencoded example in ALT X example.ko8.uu - uuencoded example in KOI8 X example.pho - phonetic transliteration example X example.tex - LaTeX example X gos-alt.rus - GOSTCII to ALT table X gos-koi8.rus - GOSTCII to KOI8 table X koi7-8.rus - KOI7 to KOI8 table X koi7nl-8.rus - KOI7 (no Latin) to KOI8 table X koi8-7.rus - KOI8 to KOI7 table X koi8-alt.rus - KOI8 to ALT table X koi8-gos.rus - KOI8 to GOSTCII table X koi8-lc.rus - KOI8 to Library of Congress table X koi8-phg.rus - KOI8 to GOST transliteration X koi8-php.rus - KOI8 to Pokrovsky transliteration X koi8-tex.rus - KOI8 to LaTeX conversion X order.txt - Order form for ordering the program X paths.h - Include file for translit.c X phg-koi8.rus - GOST transliteration to KOI8 X pho-8sim.rus - Simple phonetic to KOI8 X pho-koi8.rus - Various phonetic to KOI8 X php-koi8.rus - Pokrovsky to KOI8 X readme.doc - short description of the files X reg_exp.c - regular expression code by Henry Spencer X reg_exp.h - include for reg_exp.c and reg_sub.c X reg_sub.c - regular expression code by H. Spencer X tex-koi8.rus - LaTeX to KOI8 X translit.c - TRANSLIT main program X translit.ps - TRANSLIT manual in PostScript X translit.1 - TRANSLIT manual in *roff X translit.txt - Plain ASCII TRANSLIT manual X.sp 1 X.fi X.ti -2m X 2) You may expand/change the files and the program and distribute modified Xfiles, provided that you do Xnot delete anything (you can always comment the unnecessary portions out) Xand clearly mark your changes. Please send the copy of the modified Xversion to the author, though you are not required to do so. XI will give you all the credit for your enhancements. I simply wish that Xthere is a single point of distribution for this code, so it is maintained Xto some extent. 
If you create additional transliteration definition files, Xplease, send them to the author if you may. I will add them to the program Xdistribution. I want to fix bugs and expand/optimize this code, Xbut I need your help. XI need your transliteration files for languages which I do not know or Xdo not use currently. XYour suggestions for improving documentation are most welcome (I am not Xa native English speaker). X.ti -2m X3) You will not charge money for the program and/or associated files, Xexcept for media and copying costs. If you want to sell it, contact the author Xfirst. Bear in mind Xthat the regular expression package by Henry Spencer has some Xcopyright restrictions. XBut there are other regular expression packages which do not have these Xrestrictions (which are not violated by this offering). X.ti -2m X4) I will gladly help you with advice on compiling this software and Xtry to fix bugs when time allows. However, if you want a ready to run Xexecutable, you need to order it for a very nominal fee from X\fIJKL ENTERPRISES, INC.\fR as described in the file \fIorder.txt\fR Xwhich must be a part of a complete distribution. X.in -4m X X.SH AUTHOR XJan Labanowski, P.O. Box 21821, Columbus, OH 43221-0821, USA. XE-mail: jkl@osc.edu, JKL@OHSTPY.BITNET. X END_OF_FILE if test 56776 -ne `wc -c <'translit.1'`; then echo shar: \"'translit.1'\" unpacked with wrong size! fi # end of 'translit.1' fi echo shar: End of archive 1 $of 10$. cp /dev/null ark1isdone MISSING="" for I in 1 2 3 4 5 6 7 8 9 10 ; do if test ! -f ark${I}isdone ; then MISSING="${MISSING} ${I}" fi done if test "${MISSING}" = "" ; then echo You have unpacked all 10 archives. rm -f ark[1-9]isdone ark[1-9][0-9]isdone else echo You still must unpack the following archives: echo " " ${MISSING} fi exit 0 exit 0 # Just in case...