@* Introduction. The \.{POOLtype} utility program converts string pool files output by \.{TANGLE} into a slightly more symbolic format that may be useful when \.{TANGLE}d programs are being debugged.

It’s a pretty trivial routine, but people may want to try transporting this program before they get up enough courage to tackle \TeX\ itself. The first 256 strings are treated as \TeX\ treats them, using routines copied from \TeX82.

@ \.{POOLtype} is written entirely in standard \PASCAL, except that it has to do some slightly system-dependent character code conversion on input and output. The input is read from |pool_file|, and the output is written on |output|. If the input is erroneous, the |output| file will describe the error.
@^system dependencies@>

@p program POOLtype(@!pool_file,@!output);
label 9999; {this labels the end of the program}
type @<Types in the outer block@>
var @<Globals in the outer block@>
procedure initialize; {this procedure gets things started properly}
  var @<Local variables for initialization@>@;
  begin @<Set initial values of key variables@>
  end;

@ Here are some macros for common programming idioms.

@d incr(#) == #:=#+1 {increase a variable by unity}
@d decr(#) == #:=#-1 {decrease a variable by unity}
@d do_nothing == {empty statement}


@* The character set. (The following material is copied verbatim from \TeX82. Thus, the same system-dependent changes should be made to both programs.)

In order to make \TeX\ readily portable between a wide variety of computers, all of its input text is converted to an internal eight-bit code that includes standard ASCII, the “American Standard Code for Information Interchange.” This conversion is done immediately when each character is read in. Conversely, characters are converted from ASCII to the user’s external representation just before they are output to a text file.

Such an internal code is relevant to users of \TeX\ primarily because it governs the positions of characters in the fonts. For example, the character ‘\.A’ has ASCII code $65=@'101$, and when \TeX\ typesets this letter it specifies character number 65 in the current font. If that font actually has ‘\.A’ in a different position, \TeX\ doesn’t know what the real position is; the program that does the actual printing from \TeX’s device-independent files is responsible for converting from ASCII to a particular font encoding.
@^ASCII code@>

\TeX’s internal code is relevant also with respect to constants that begin with a reverse apostrophe; and it provides an index to the \.{\\catcode}, \.{\\mathcode}, \.{\\uccode}, \.{\\lccode}, and \.{\\delcode} tables.

@ Characters of text that have been converted to \TeX’s internal form are said to be of type |ASCII_code|, which is a subrange of the integers.

@<Types...@>=
@!ASCII_code=0..255; {eight-bit numbers}

@ The original \PASCAL\ compiler was designed in the late 60s, when six-bit character sets were common, so it did not make provision for lowercase letters. Nowadays, of course, we need to deal with both capital and small letters in a convenient way, especially in a program for typesetting; so the present specification of \TeX\ has been written under the assumption that the \PASCAL\ compiler and run-time system permit the use of text files with more than 64 distinguishable characters. More precisely, we assume that the character set contains at least the letters and symbols associated with ASCII codes @'40 through @'176; all of these characters are now available on most computer terminals.

Since we are dealing with more characters than were present in the first \PASCAL\ compilers, we have to decide what to call the associated data type. Some \PASCAL s use the original name |char| for the characters in text files, even though there now are more than 64 such characters, while other \PASCAL s consider |char| to be a 64-element subrange of a larger data type that has some other name.

In order to accommodate this difference, we shall use the name |text_char| to stand for the data type of the characters that are converted to and from |ASCII_code| when they are input and output. We shall also assume that |text_char| consists of the elements |chr(first_text_char)| through |chr(last_text_char)|, inclusive. The following definitions should be adjusted if necessary.
@^system dependencies@>

@d text_char == char {the data type of characters in text files}
@d first_text_char=0 {ordinal number of the smallest element of |text_char|}
@d last_text_char=255 {ordinal number of the largest element of |text_char|}

@<Local variables for init...@>=
@!i:integer;

@ The \TeX\ processor converts between ASCII code and the user’s external character set by means of arrays |xord| and |xchr| that are analogous to \PASCAL’s |ord| and |chr| functions.

@<Glob...@>=
@!xord: array [text_char] of ASCII_code;
  {specifies conversion of input characters}
@!xchr: array [ASCII_code] of text_char;
  {specifies conversion of output characters}

@ Since we are assuming that our \PASCAL\ system is able to read and write the visible characters of standard ASCII (although not necessarily using the ASCII codes to represent them), the following assignment statements initialize the standard part of the |xchr| array properly, without needing any system-dependent changes. On the other hand, it is possible to implement \TeX\ with less complete character sets, and in such cases it will be necessary to change something here.
@^system dependencies@>

@<Set init...@>=
xchr[@'40]:=' '; xchr[@'41]:='!'; xchr[@'42]:='"'; xchr[@'43]:='#';
xchr[@'44]:='$'; xchr[@'45]:='%'; xchr[@'46]:='&'; xchr[@'47]:='''';
xchr[@'50]:='('; xchr[@'51]:=')'; xchr[@'52]:='*'; xchr[@'53]:='+';
xchr[@'54]:=','; xchr[@'55]:='-'; xchr[@'56]:='.'; xchr[@'57]:='/';
xchr[@'60]:='0'; xchr[@'61]:='1'; xchr[@'62]:='2'; xchr[@'63]:='3';
xchr[@'64]:='4'; xchr[@'65]:='5'; xchr[@'66]:='6'; xchr[@'67]:='7';
xchr[@'70]:='8'; xchr[@'71]:='9'; xchr[@'72]:=':'; xchr[@'73]:=';';
xchr[@'74]:='<'; xchr[@'75]:='='; xchr[@'76]:='>'; xchr[@'77]:='?';
xchr[@'100]:='@@'; xchr[@'101]:='A'; xchr[@'102]:='B'; xchr[@'103]:='C';
xchr[@'104]:='D'; xchr[@'105]:='E'; xchr[@'106]:='F'; xchr[@'107]:='G';
xchr[@'110]:='H'; xchr[@'111]:='I'; xchr[@'112]:='J'; xchr[@'113]:='K';
xchr[@'114]:='L'; xchr[@'115]:='M'; xchr[@'116]:='N'; xchr[@'117]:='O';
xchr[@'120]:='P'; xchr[@'121]:='Q'; xchr[@'122]:='R'; xchr[@'123]:='S';
xchr[@'124]:='T'; xchr[@'125]:='U'; xchr[@'126]:='V'; xchr[@'127]:='W';
xchr[@'130]:='X'; xchr[@'131]:='Y'; xchr[@'132]:='Z'; xchr[@'133]:='[';
xchr[@'134]:='\'; xchr[@'135]:=']'; xchr[@'136]:='^'; xchr[@'137]:='_';
xchr[@'140]:='`'; xchr[@'141]:='a'; xchr[@'142]:='b'; xchr[@'143]:='c';
xchr[@'144]:='d'; xchr[@'145]:='e'; xchr[@'146]:='f'; xchr[@'147]:='g';
xchr[@'150]:='h'; xchr[@'151]:='i'; xchr[@'152]:='j'; xchr[@'153]:='k';
xchr[@'154]:='l'; xchr[@'155]:='m'; xchr[@'156]:='n'; xchr[@'157]:='o';
xchr[@'160]:='p'; xchr[@'161]:='q'; xchr[@'162]:='r'; xchr[@'163]:='s';
xchr[@'164]:='t'; xchr[@'165]:='u'; xchr[@'166]:='v'; xchr[@'167]:='w';
xchr[@'170]:='x'; xchr[@'171]:='y'; xchr[@'172]:='z'; xchr[@'173]:='{';
xchr[@'174]:='|'; xchr[@'175]:='}'; xchr[@'176]:='~';

@ Some of the ASCII codes without visible characters have been given symbolic names in this program because they are used with a special meaning.

@d null_code=@'0 {ASCII code that might disappear}
@d carriage_return=@'15 {ASCII code used at end of line}
@d invalid_code=@'177 {ASCII code that many systems prohibit in text files}

@ The ASCII code is “standard” only to a certain extent, since many computer installations have found it advantageous to have ready access to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/} gives a complete specification of the intended correspondence between characters and \TeX’s internal representation.
@:TeXbook}{\sl The \TeX book}@>

If \TeX\ is being used on a garden-variety \PASCAL\ for which only standard ASCII codes will appear in the input and output files, it doesn’t really matter what codes are specified in |xchr[0..@'37]|, but the safest policy is to blank everything out by using the code shown below.

However, other settings of |xchr| will make \TeX\ more friendly on computers that have an extended character set, so that users can type things like ‘\.^^Z’ instead of ‘\.{\\ne}’. People with extended character sets can assign codes arbitrarily, giving an |xchr| equivalent to whatever characters the users of \TeX\ are allowed to have in their input files. It is best to make the codes correspond to the intended interpretations as shown in Appendix~C whenever possible; but this is not necessary. For example, in countries with an alphabet of more than 26 letters, it is usually best to map the additional letters into codes less than~@'40. To get the most “permissive” character set, change |' '| on the right of these assignment statements to |chr(i)|.
@^character set dependencies@>
@^system dependencies@>

@<Set init...@>=
for i:=0 to @'37 do xchr[i]:=' ';
for i:=@'177 to @'377 do xchr[i]:=' ';

@ The following system-independent code makes the |xord| array contain a suitable inverse to the information in |xchr|. Note that if |xchr[i]=xchr[j]| where |i<j<@'177|, the value of |xord[xchr[i]]| will turn out to be |j| or more; hence, standard ASCII code numbers will be used instead of codes below @'40 in case there is a coincidence.

@<Set init...@>=
for i:=first_text_char to last_text_char do xord[chr(i)]:=invalid_code;
for i:=@'200 to @'377 do xord[xchr[i]]:=i;
for i:=0 to @'176 do xord[xchr[i]]:=i;
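
As an illustration only, the construction of |xchr| and |xord| can be modeled in Python. This sketch is not part of \.{POOLtype}: the names |build_tables| and |permissive| are hypothetical, and it assumes that the external character set coincides with ASCII/Latin-1, so that Python's |chr| can stand in for \PASCAL's |chr|. It shows why filling in codes 0 through octal 176 last makes the standard ASCII code win when several internal codes share one external character.

```python
# Hypothetical Python model of the xchr/xord tables; not part of POOLtype.
INVALID_CODE = 0o177  # plays the role of invalid_code


def build_tables(permissive=False):
    """Return (xchr, xord), filled in the same order as the WEB code."""
    # Codes 0..0o37 and 0o177..0o377 are blanked out, or mapped to chr(i)
    # in the "permissive" variant; 0o40..0o176 are the visible ASCII set.
    xchr = [chr(i) if (permissive or 0o40 <= i <= 0o176) else " "
            for i in range(256)]
    # Every external character starts out as invalid.
    xord = {chr(i): INVALID_CODE for i in range(256)}
    # Later assignments overwrite earlier ones, so 0..0o176 go last.
    for i in range(0o200, 0o400):
        xord[xchr[i]] = i
    for i in range(0, 0o177):
        xord[xchr[i]] = i
    return xchr, xord


xchr, xord = build_tables()
print(xord[" "], xord["A"], xord["\x7f"])
```

With the default (blanked) tables, many internal codes map to a space, yet the space character maps back to octal 40 because that assignment is the last one to touch it.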


@* String handling. (The following material is copied from the \\{get\_strings\_started} procedure of \TeX82, with slight changes.)

@<Glob...@>=
@!k,@!l:0..255; {small indices or counters}
@!m,@!n:text_char; {characters input from |pool_file|}
@!s:integer; {number of strings treated so far}

@ The global variable |count| keeps track of the total number of characters in strings.

@<Glob...@>=
@!count:integer; {how long the string pool is, so far}

@ @<Set init...@>=
count:=0;

@ This is the main program, where \.{POOLtype} starts and ends.

@d abort(#)==begin write_ln(#); goto 9999; end

@p begin initialize;
@<Make the first 256 strings@>;
s:=256;
@<Read the other strings from the \.{POOL} file, or give an error message and abort@>;
write_ln('(',count:1,' characters in all.)');
9999:end.

@ @d lc_hex(#)==l:=#;
  if l<10 then l:=l+"0" @+else l:=l-10+"a"

@<Make the first 256...@>=
for k:=0 to 255 do
  begin write(k:3,': "'); l:=k;
  if (@<Character |k| cannot be printed@>) then
    begin write(xchr["^"],xchr["^"]);
    if k<@'100 then l:=k+@'100
    else if k<@'200 then l:=k-@'100
    else begin lc_hex(k div 16); write(xchr[l]);
      lc_hex(k mod 16); incr(count);
      end;
    count:=count+2;
    end;
  if l="""" then write(xchr[l],xchr[l])
  else write(xchr[l]);
  incr(count);
  write_ln('"');
  end

@ The first 128 strings will contain 95 standard ASCII characters, and the other 33 characters will be printed in three-symbol form like ‘\.{\^\^A}’ unless a system-dependent change is made here. Installations that have an extended character set, where for example |xchr[@'32]=@t\.{\'^^Z\'}@>|, would like string @'32 to be the single character @'32 instead of the three characters @'136, @'136, @'132 (\.{\^\^Z}). On the other hand, even people with an extended character set will want to represent string @'15 by \.{\^\^M}, since @'15 is |carriage_return|; the idea is to produce visible strings instead of tabs or line-feeds or carriage-returns or bell-rings or characters that are treated anomalously in text files.

Unprintable characters of codes 128–255 are, similarly, rendered \.{\^\^80}–\.{\^\^ff}.

The boolean expression defined here should be |true| unless \TeX\ internal code number~|k| corresponds to a non-troublesome visible symbol in the local character set. An appropriate formula for the extended character set recommended in {\sl The \TeX book\/} would, for example, be ‘|k in [0,@'10..@'12,@'14,@'15,@'33,@'177..@'377]|’. If character |k| cannot be printed, and |k<@'200|, then character |k+@'100| or |k-@'100| must be printable; moreover, ASCII codes |[@'41..@'46,@'60..@'71,@'141..@'146,@'160..@'171]| must be printable. Thus, at least 80 printable characters are needed.
@:TeXbook}{\sl The \TeX book}@>
@^character set dependencies@>
@^system dependencies@>

@<Character |k| cannot be printed@>=
  (k<" ")or(k>"~")
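
The rendering rule for the first 256 strings can be sketched in Python as follows. This is an illustrative model rather than the program itself: |cannot_print|, |render|, and |pool_length| are hypothetical names, the printability test is the default one shown above, and the external character set is assumed to be ASCII, so |xchr| is the identity on the codes that get printed.

```python
HEX = "0123456789abcdef"


def cannot_print(k):
    """Default test from the WEB code: outside the visible ASCII range."""
    return k < ord(" ") or k > ord("~")


def render(k):
    """Printed form of string k (0..255), without the surrounding quotes."""
    prefix, l = "", k
    if cannot_print(k):
        prefix = "^^"
        if k < 0o100:
            l = k + 0o100            # e.g. 0o15 becomes "M"
        elif k < 0o200:
            l = k - 0o100            # e.g. 0o177 becomes "?"
        else:
            prefix += HEX[k // 16]   # first lowercase hex digit
            l = ord(HEX[k % 16])     # second hex digit is printed last
    last = chr(l)
    # A double-quote is written twice in the listing.
    return prefix + (last * 2 if last == '"' else last)


def pool_length(k):
    """Characters that string k adds to count: 1, 3, or 4."""
    if not cannot_print(k):
        return 1
    return 3 if k < 0o200 else 4


print(render(0o15), render(ord("A")), render(255))
print(sum(pool_length(k) for k in range(256)))
```

Note that a doubled quote in the listing still counts as one character, which is why |pool_length| is kept separate from the length of the rendered text.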

@ When the \.{WEB} system program called \.{TANGLE} processes a source file, it outputs a \PASCAL\ program and also a string pool file. The present program reads the latter file, where each string appears as a two-digit decimal length followed by the string itself, and the information is output with its associated index number. The strings are surrounded by double-quote marks; double-quotes in the string itself are repeated.

@<Glob...@>=
@!pool_file:packed file of text_char;
  {the string-pool file output by \.{TANGLE}}
@!xsum:boolean; {has the check sum been found?}

@ @<Read the other strings...@>=
reset(pool_file); xsum:=false;
if eof(pool_file) then abort('! I can''t read the POOL file.');
repeat @<Read one string, but abort if there are problems@>;
until xsum;
if not eof(pool_file) then abort('! There''s junk after the check sum')

@ @<Read one string...@>=
if eof(pool_file) then abort('! POOL file contained no check sum');
read(pool_file,m,n); {read two digits of string length}
if m<>'*' then
  begin if (xord[m]<"0")or(xord[m]>"9")or(xord[n]<"0")or(xord[n]>"9") then
    abort('! POOL line doesn''t begin with two digits');
  l:=xord[m]*10+xord[n]-"0"*11; {compute the length}
  write(s:3,': "'); count:=count+l;
  for k:=1 to l do
    begin if eoln(pool_file) then
      begin write_ln('"'); abort('! That POOL line was too short');
      end;
    read(pool_file,m); write(xchr[xord[m]]);
    if xord[m]="""" then write(xchr[""""]);
    end;
  write_ln('"'); incr(s);
  end
else xsum:=true;
read_ln(pool_file)
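
For readers who want to experiment with the pool-file format, the reading loop can be approximated in Python as below. This is a sketch under stated assumptions, not a translation of the program: |list_pool| is a hypothetical name, the input is taken as a list of lines without newline characters, the sample strings in the call are made up, and errors are raised as exceptions, whereas \.{POOLtype} prints a message (after any partial output) and jumps to label 9999. The returned count covers only the strings read from the file, not the first 256.

```python
DIGITS = "0123456789"


def list_pool(lines, first_index=256):
    """Return (listing_lines, character_count) for the given pool lines."""
    if not lines:
        raise ValueError("! I can't read the POOL file.")
    out, count, s = [], 0, first_index
    it = iter(lines)
    while True:
        line = next(it, None)
        if line is None:
            raise ValueError("! POOL file contained no check sum")
        if line.startswith("*"):
            break  # the check sum line ends the list of strings
        if len(line) < 2 or line[0] not in DIGITS or line[1] not in DIGITS:
            raise ValueError("! POOL line doesn't begin with two digits")
        length = int(line[:2])
        body = line[2:2 + length]  # anything beyond the length is ignored
        if len(body) < length:
            raise ValueError("! That POOL line was too short")
        # Index in a field of width 3; double-quotes are repeated.
        out.append(f'{s:3d}: "' + body.replace('"', '""') + '"')
        count += length
        s += 1
    if next(it, None) is not None:
        raise ValueError("! There's junk after the check sum")
    return out, count


listing, total = list_pool(["05hello", '03a"b', "*123456789"])
print("\n".join(listing))
print(total)
```

As in the original, characters that follow the declared length on a line are skipped silently, and only the first character of the final line is inspected to recognize the check sum.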


@* System-dependent changes. This section should be replaced, if necessary, by changes to the program that are necessary to make \.{POOLtype} work at a particular installation. It is usually best to design your change file so that all changes to previous sections preserve the section numbering; then everybody’s version will be consistent with the printed program. More extensive changes, which introduce new sections, can be inserted here; then only the index itself will get a new section number.
@^system dependencies@>


@* Index. Indications of system dependencies appear here together with the section numbers where each ident\-i\-fier is used.

