home *** CD-ROM | disk | FTP | other *** search
Text File | 1988-02-06 | 56.2 KB | 1,830 lines |
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- _P_T_C _i_m_p_l_e_m_e_n_t_a_t_i_o_n _n_o_t_e
-
- by
-
- Per Bergsten
-
- Holistic Technology AB
- Grona Gatan 59
- 414 54 Gothenburg
- Sweden
-
-
-
-
-
-
-
-
-
-
-
-
-
- This note describes the implementation of ptc, a Pascal to C
- translator. The program was developed by Per Bergsten of Holis-
- tic Technology AB, Gothenburg, Sweden. The paper is intended to
- provide a guide for those who need to transport ptc to a new
- environment, it describes how Pascal constructs are mapped onto C
- constructs.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- _1. _B_a_c_k_g_r_o_u_n_d
-
- The aim was originally to provide a simple tool for transporting
- finished applications to systems lacking Pascal compilers. The
- first versions, constructed in about 1984, were capable of
- translating simple Pascal programs. It was never intended to
- become a released product, however, as time went by, programs and
- ambitions grew to the point where nearly the full Pascal language
- was supported. Thus the program as it stands today has a long
- ancestry but it has never been redesigned (which it should have
- been).
-
- _2. _P_a_s_c_a_l _v_s _C
-
- To anyone familiar with the two languages it is obvious that they
- are very similar in structure. The major features may be summar-
- ised as follows:
-
- Pascal C
-
- Block-structured Block-structured
- - multiple declaration levels - single declaration level
- Statically scoped Statically scoped
- Strongly typed Weakly typed
- - allows unbounded pointers
- Call by value Mostly call by value
- - and call by reference
- Highly independent Highly integrated
- - of host system - with system
- Self contained Allows external definitions.
-
-
- On the syntactic level the languages differ only in minor ways,
- mostly in the order in which keywords and other symbols occur,
- and of course in that the languages uses different symbols for
- the same purposes. The only complication is that C prohibits
- nested subroutine declarations.
-
- On the semantic level the situation is worse. C has the pecu-
- liarity that array variables are treated differently from other
- variables, this forces us to adopt some general way to handle
- array variables. Furthermore, since Pascal offers nested subrou-
- tine declarations it becomes necessary to simulate part of the
- activation record mechanism in the translated code, in one case
- it is extremely difficult to completely do this. It is also
- clear that the C typedef mechanism has some shortcomings that
- preclude an easy translation of Pascal types.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 1 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- _3. _M_a_p_p_i_n_g _P_a_s_c_a_l _t_o _C
-
- In this part of the paper we briefly illustrate how to translate
- Pascal code into equivalent C code.
-
- _3._1. _P_r_o_g_r_a_m_s
-
- A minimal Pascal program:
-
- program p;
- begin
- end.
-
- translates to the C equivalent:
-
- extern void exit();
- main()
- {
- exit(0);
- }
-
-
- It should be noted here that the translator does not actually
- require a complete Pascal program, the implementation is such
- that any consistent set of declarations can be translated.
-
- _3._2. _L_e_x_i_c_a_l _i_s_s_u_e_s
-
- The C language uses ASCII as a carrier, almost all of the availi-
- ble characters are used. The Pascal definition uses a smaller
- set of characters. Since few features of the languages depend on
- the underlying character set this does not introduce much diffi-
- culties.
-
- One serious problem does occur. Both language definitions states
- that comments have no meaning and no clear place in the language
- syntax. Furthermore, the Pascal definition states that a comment
- is equivalent to a blank character. This implies that if com-
- ments are handled accurately the translator should also be able
- to collect and classify each blank character in a Pascal program
- and to generate a C program with the same number of blank charac-
- ters in the corresponding positions. This implication conflicts
- with the fact that the languages have different syntax rules, it
- is not obvious what the "corresponding positions" would be.
-
- Since comments have no defined meaning a user is free to inter-
- pret them in any way and, in particular, to associate them with
- the surrounding code in any way s/he chooses. Although humans
- usually are able to deduce what bearing a comment has on the sur-
- rounding program code there are no formal rules for how to do
- this. Therefore there is no _a _p_r_i_o_r_i correct way to translate
- comments and the translator described here ignores comments alto-
- gether. If/when a reasonable _a_d _h_o_c solution is found that
- feature may be incorporated.
-
- _3._3. _D_e_c_l_a_r_a_t_i_o_n_s
-
- The program may introduce local declarations which are handled as
- follows.
-
-
-
-
- - 2 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- _3._3._1. _L_a_b_e_l_s
-
-
- program p;
-
- label 9;
-
- begin
- 9:
- goto 9
- end.
-
- which we simply translate into:
-
- extern void exit();
- main()
- {
- L9:
- goto L9;
- exit(0);
- }
-
-
- If the label is reached from an inner procedure:
-
- program p;
-
- label 100;
-
- procedure q;
-
- begin
- goto 100
- end;
-
- begin
- 100:
- end.
-
- a more complicated translation must be used:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 3 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- # define Line __LINE__
- void Caseerror();
- # include <setjmp.h>
- static struct Jb { jmp_buf jb; } J[1];
-
- void
- q()
- {
- longjmp(J[0].jb, 100);
- }
-
- main()
- {
- if (setjmp(J[0].jb))
- goto L100;
- L100:
- exit(0);
- }
-
-
- We assume the existence of the standard _s_e_t_j_m_p() and _l_o_n_g_j_m_p()
- library functions. Jumpbuffers are statically allocated as
- needed depending on the number of declarationlevels in the pro-
- gram.
-
- _3._3._2. _C_o_n_s_t_a_n_t_s
-
- Constant declarations are treated in two different ways. Numbers
- aliases etc are simply # _d_e_f_i_n_e'd but string constants are con-
- verted to static character arrays in order to avoid unnecessary
- duplication of string-constants in the object code, thus:
-
- const
- p = 3.14;
- pie = '3.14';
-
- translates to:
-
- # define pi 3.14
- static char pie[] = "3.14";
-
-
- _3._3._3. _T_y_p_e_s _a_n_d _v_a_r_i_a_b_l_e_s
-
- Types and variables are mostly easy to translate:
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 4 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- program p;
-
- const length = 15;
-
- type
- struct = 0 .. length;
- vect = array [ struct ] of struct;
- color = (red, blue, ada, yellow);
- pointer = ^ recd;
- recd = record
- r : pointer;
- case b : boolean of
- false: (c : color);
- true: (v : vect);
- end;
-
- var SP : pointer;
-
- begin
- new(SP, true);
- end.
-
- becomes
-
- typedef char boolean;
- # define false (boolean)0
- # define true (boolean)1
- extern char *Bools[];
- # define Unionoffs(p, m) (((long)(&(p)->m))-((long)(p)))
- extern char *malloc();
- # define length 15
- typedef unsigned char C47_struct;
- typedef struct { C47_struct A[length + 1]; } vect;
- typedef enum { red, blue, ada, yellow } color;
- typedef struct S57 *pointer;
- typedef struct S57 {
- pointer r;
- boolean b;
- union {
- struct {
- color c;
- } V1;
- struct {
- vect v;
- } V2;
- } U;
- } recd;
- pointer sp;
-
- main()
- {
- sp = (struct S57 *)malloc((unsigned)
- (Unionoffs(sp, U.V2.v) + sizeof(sp->U.V2)));
- exit(0);
- }
-
- The rationale is as follows:
-
-
-
- - 5 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- Identifiers in the Pascal program which conflicts with reserved
- words in C are renamed by adding a unique prefix Cnnn where nnn
- is a number.
-
- We also note here that uppercase letters in identifiers and key-
- words are translated to lowercase. Pascal specifies that
- upper/lower case is insignificant whereas C (for the present)
- separates the two. This fact is used to advantage by the trans-
- lator as all subroutines and macros defined by the translator
- have an initial uppercase letter which prevents confusion.
-
- - The type _b_o_o_l_e_a_n is a predefined Pascal type, when it is
- used the translator emits code which defines boolean to be
- the smallest convenient type: _c_h_a_r. The constants _f_a_l_s_e and
- _t_r_u_e are defined and the vector _B_o_o_l_s will contain text-
- strings for output if needed.
-
- - The predefined types _i_n_t_e_g_e_r and _r_e_a_l are likewise mapped
- directly onto the standard C types _i_n_t and _d_o_u_b_l_e through a
- typedef declaration.
-
- Integer subranges are mapped onto standard C arithmetic
- types according to a short table in the translator. The
- table is scanned from top to bottom until an enclosing range
- is found and the corresponding type-name is emitted.
-
- - C-arrays have peculiar semantix. To unify the treatment of
- arrays and other datatypes we always encapsulate an array in
- a struct, thus an array always becomes a _s_t_r_u_c_t with a sin-
- gle member named A.
-
- - Records and their variants are translated into C _s_t_r_u_c_t and
- _u_n_i_o_n definitions. Since C requires all union members to
- have a name we must supply artificial names for all record
- variants. A record with variants will therefore always con-
- tain one field with the name U which have sub-fields named
- Vnnn where nnn is a positive number.
-
- When allocating dynamic storage for a record with variants
- naming the desired variant
-
- new(sp, true)
-
- we face the problem of determining the amount of memory
- needed.
-
- _C _d_o_e_s _n_o_t _p_r_o_v_i_d_e _a _s_a_f_e _w_a_y _t_o _c_o_m_p_u_t_e _t_h_e _s_i_z_e _o_f _a
- _p_a_r_t_i_c_u_l_a_r _s_t_r_u_c_t _v_a_r_i_a_n_t.
-
- The strategy adopted to cope with this problem is to attempt
- to compute the offset of a fieldmember in the variant match-
- ing the constant and then to add the size of the variant.
- The offset computation is expressed as a macro, _U_n_i_o_n_o_f_f_s,
- which uses rather foul typecasting to achive the result.
- The only availible alternative would be to use the same size
- of all variants. This method, while being safe, wastes
- memory when many small and few large records of the same
- type are created dynamically.
-
-
-
- - 6 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- - Pascal enumeration types are converted directly to C _e_n_u_m
- types.
-
- - Pascal pointer types are translated into C pointer types.
- Pascal allows the programmer to construct recursive types as
- pointer types need not be fully defined until the end of the
- declaration-section where the pointer type is used. In
- practice this is only used to introduce record types which
- contain pointers to themselves. This problem is partially
- solved by introducing a name for the record type. Hence
-
- type
- ptr = ^ node;
- node = record
- next : ptr;
- info : integer
- end;
-
- becomes
-
- typedef struct S56 * ptr;
- typedef struct S56 {
- ptr next;
- integer info;
- } node;
-
- we note in passing that the problem cannot be completely
- solved since
-
- type pureptr = ^ pureptr;
-
- which is valid Pascal, cannot be expressed in C.
-
- - A pascal set-type does not have any direct counterpart in C.
- The C language does however have a adequate set of operators
- for bit manipulation. We use these to implement a Pascal
- set as an array of _s_e_t_w_o_r_d. So:
-
- type
- s = set of 1 .. 100;
-
- var
- ss : s;
-
- is translated into:
-
- typedef unsigned short setword;
- typedef struct { setword S[8]; } s;
-
- s ss;
-
- The situation is slightly complicated by the fact that Pas-
- cal has a set constructor which permits the construction of
- arbitrary large sets, for example:
-
- s := [ 50 .. maxint ] * [ 1 .. 80 ]
-
- for that reason the first member in the array of words gives
- size of the set (in words). In the example above s.S[0]
-
-
-
-
- - 7 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- would have the value 7, and s.S[1] through s.S[7] would hold
- the bits. The number 7 is computed on the assumption that
- the type _u_n_s_i_g_n_e_d _s_h_o_r_t on the target host is sufficiently
- large to holds 16 bits. The set operators of Pascal are
- implemented as C functions returning pointers to arrays of
- setwords, the intermediary results are held in a static area
- of fixed size.
-
- - Pascal files are implemented using the standard i/o package
- availible in most C implementations. Since Pascal has the
- requirement that the next element of a file is visible in
- the filebuffer before it is read, and the requirement that
- linemarkers in textfiles are given special treatement we are
- forced to extend the _F_I_L_E type provided in <_s_t_d_i_o._h>.
- Hence:
-
- var f : text;
-
- becomes
-
- typedef struct {
- FILE *fp;
- unsigned short
- eoln:1,
- eof:1,
- init:1,
- out:1,
- :12;
- char buf;
- } text;
- text f;
-
- where _b_u_f is our filebuffer and _e_o_l_n, _e_o_f and _i_n_i_t are flags
- giving the state of the file. All Pascal i/o operations are
- implemented using macros that maintain the flags and the
- buffer variable. The actual reading and writing of data is
- deferred to the standard i/o package.
-
- _3._3._4. _P_r_o_c_e_d_u_r_e_s _a_n_d _f_u_n_c_t_i_o_n_s
-
- Pascal procedures and functions are somewhat difficult to
- translate to C. The main problems lie in nested declarations and
- procedural parameters. Nested declarations are handled in the
- following manner:
-
- - If procedure B is declared in procedure A, then pro-
- cedure B is emitted before A and A is forward declared
- before B.
-
- - Any constants and types declared in A is moved to the
- global scope, this may force renaming of those objects.
-
- - Any variable declared in A _a_n_d _u_s_e_d _i_n _B is converted
- to a pointer and moved to the global scope. The
- pointer value is saved and re-initialized upon each
- entry of A and restored upon exit from A.
-
-
-
-
-
- - 8 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- Hence:
-
- procedure A;
-
- const limit = 20;
-
- type loctyp = 0 .. limit;
-
- var i, j : loctyp;
-
- procedure B(k : loctyp);
-
- begin
- j := k + 2
- end;
-
- begin
- B(i)
- end;
-
- becomes
-
- typedef unsigned char loctyp;
- loctyp *G56_j;
-
- void a();
-
- void
- b(k)
- loctyp k;
- {
- (*G56_j) = k + 2;
- }
-
- void
- a()
- {
- # define limit 20
- loctyp i, j;
- loctyp *F57;
-
- F57 = G56_j;
- G56_j = &j;
- b(i);
- G56_j = F57;
- }
-
- we see that references to _j inside procedure _b are replaced by
- the pointer _G_5_6__j which points to the local variable _j in pro-
- cedure _a. In order to preserve the proper semantix in the face
- of recursion the value of the pointer need also be saved in the
- local variable _F_5_7 during the invocation of _a.
-
- - Procedure parameters offer little problems. We translate
- Pascal value-parameters into ordinary C parameters and Pas-
- cal var-parameters are treated as pointers.
-
-
-
-
-
- - 9 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- - Procedural parameters appear at first to be easy to handle.
- The ordinary program:
-
- program p;
-
- procedure pp(procedure q(i : integer));
-
- begin
- q(2)
- end;
-
- procedure qq(i : integer);
- begin
- end;
-
- begin
- pp(qq)
- end.
-
- becomes
-
- extern void exit();
- typedef int integer;
-
- void
- pp(q)
- void (*q)();
- {
- (*q)(2);
- }
-
- void
- qq(i)
- integer i;
- {
- }
-
- main()
- {
- pp(qq);
- exit(0);
- }
-
- which looks simple enough.
- However, Pascal requires that the scope of a procedure
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 10 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- _r_e_m_a_i_n_s _u_n_c_h_a_n_g_e_d throughout its lifetime. Consider:
-
- program demo(output);
-
- var i : integer;
-
- procedure p(procedure q);
-
- var j : integer;
-
- procedure qq;
-
- begin
- writeln(j)
- end;
-
- begin
- j := i;
- q;
- if i < 1 then
- begin
- i := i + 1;
- p(qq)
- end
- end;
-
- procedure dummy;
- begin
- end;
-
- begin
- i := 0;
- p(dummy)
- end.
-
- When _p is first invoked it assigns the local variable _j the
- value 0. This variable is accessible from _q_q which is
- passed as a parameter in the recursive call of _p. The
- second invocation of _p then sets _i_t_s variable _j to 1, and
- calls _q which is bound to the first instance of _q_q, and
- should therefore print the number _0. _S_a_d_l_y, _t_h_e _c_o_d_e _p_r_o_-
- _d_u_c_e_d _b_y _t_h_e _t_r_a_n_s_l_a_t_o_r _f_a_i_l_s _t_o _d_o _t_h_i_s. It is obvious
- that the program above calls for a complete simulation of
- the activation record mechanism of Pascal to work correctly.
-
- A workable but unpractical solution would be:
-
- 1) When qq is used as parameter we call a function q1
- which saves the environment for qq (i.e. the address of
- j) in a well known place and returns a pointer to q2.
-
- 2) When qq is later called (under the name q) the actual
- target will be q2 which sets up the proper environment
- calls qq.
-
- The problem is that this requires a save-area for _e_a_c_h pro-
- cedural parameter which can hold the intresting parts of its
- environment for _e_a_c_h of its invocations. In the example
- above we need one are which holds the address of i for each
-
-
-
-
- - 11 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- instance of qq (say N instances, where N depends on the
- run-time behaviour of p). It also requires a set of dif-
- ferent procedures to perform the work of q2 (N different
- procedures which sets up the environment for the N
- instances). _T_h_i_s _r_e_q_u_i_r_e_s _m_u_c_h _t_o _m_u_c_h _w_o_r_k _t_o _i_m_p_l_e_m_e_n_t _s_o
- _t_h_e _p_r_o_b_l_e_m _i_s _l_e_f_t _u_n_s_o_l_v_e_d, this is hardly a problem in
- practice since humans rarely write such code but _i_t _c_o_u_l_d
- _i_n_t_r_o_d_u_c_e _p_r_o_b_l_e_m_s in machine generated Pascal code.
-
- It should be noted that the translator accepts the keyword
- _e_x_t_e_r_n_a_l in place of the Pascal _f_o_r_w_a_r_d directive and
- assumes that the so declared procedure is defined elsewhere.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 12 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- _3._4. _S_t_a_t_e_m_e_n_t_s.
-
- Pascal statements are comparatively easy to translate to C. The
- only parts that require special care are non-local _g_o_t_o, _w_i_t_h and
- _f_o_r statements. The code
-
- program p(output);
-
- type
- msgtyp = packed array [ 1 .. 12 ] of char;
-
- var
- a : array [ 1 .. 10 ] of
- record
- r : real
- end;
- i : integer;
- ok : boolean;
-
- procedure fatal(m : msgtyp);
-
- begin
- writeln('Fatal error: ', m)
- end;
-
- begin
- while true do
- repeat
- ok := false;
- i := 1;
- for i := i + 1 to i + 10 do
- if i > 10 then
- fatal(' 10 exceeded')
- else
- with a[i] do
- if r > 9.99e+38 then
- ok := true
- else
- writeln(r)
- until ok
- end.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 13 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- becomes
-
- typedef char boolean;
- # define false (boolean)0
- # define true (boolean)1
- typedef int integer;
- typedef double real;
-
- typedef struct { char A[12 - 1 + 1]; } msgtyp;
- typedef struct { struct S57 {
- real r;
- } A[10 - 1 + 1]; } T56;
- T56 a;
- integer i;
- boolean done;
-
- void
- fatal(m)
- msgtyp m;
- {
- (void)fprintf(output.fp, "Fatal error: %.12s", m.A),
- Putchr('\n', output);
- }
-
- main()
- {
- while (true)
- do {
- done = false;
- i = 1;
- {
- integer B1 = i + 1, B2 = i + 10;
-
- if (B1 <= B2)
- for (i = B1; ; i++) {
- if (i > 10)
- fatal(*((msgtyp *)" 10 exceeded"));
- else {
- register struct S57 *W3 = &a.A[i - 1];
-
- if (W3->r > 9.99e+38)
- done = true;
- else
- (void)fprintf(output.fp, "% 20e", W3->r),
- Putchr('\n', output);
- }
- if (i == B2) break;
- }
- }
- } while (!(done));
- exit(0);
- }
-
- for the following reasons:
-
- _3._4._1. _B_e_g_i_n
-
- The compound statements are very similar in the two languages and
- need no further explanation.
-
-
-
-
- - 14 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- _3._4._2. _I_f
-
- Pascal if-statements have the same structure and semantix as C
- if-statments.
-
- _3._4._3. _C_a_s_e
-
- Pascal case-statements have the same structure and semantix as C
- switch-statements provided that a _b_r_e_a_k is always added to each
- entry.
-
- The translator supports a common Pascal extension in that it
- recognizes the keyword _o_t_h_e_r_w_i_s_e to signify a default entry in a
- case-statement.
-
- _3._4._4. _L_a_b_e_l_s
-
- Pascal labeled statements and labels have the same structure and
- semantix as C labeled statements except that Pascal labels are
- numbers where C labels are identifiers, this difference is solved
- by simply prefixing the labels with the letter _L.
-
- _3._4._5. _G_o_t_o
-
- Pascal goto-statements have the same structure as C goto state-
- ments but the semantix differ in that Pascal allows a goto to
- lead out of the current procedure. This is implemented using the
- _s_e_t_j_m_p/_l_o_n_g_j_m_p library functions of C as described earlier.
-
- _3._4._6. _W_i_t_h
-
- The with-statement of Pascal has no counterpart in C. It is
- translated into nested compund statements where pointervariables,
- referencing the corresponding records, are declared and initial-
- ized. Within the scope of the with-statement, the accessible
- record fields are renamed to include the pointer. The effect of
- this is that the record address is evaluated once as the Pascal
- standard requires.
-
- _3._4._7. _F_o_r
-
- The for-statement of Pascal has a structure that is similar to
- the C for-statement but the semantix differ completely. Pascal
- requires that a loop be exited when the upper bound has been
- reached, Pascal also requires that the loop-boundaries be
- evaluated exactly once. The standard C for-loop is exited when
- the loop-condition becomes false. This implies that it is not
- always true that
-
- for (i = 0; i <= e; i++) ;
-
- behaves in the same manner as
-
- for i := 0 to e do ;
-
- since (in most implementations) the C version becomes an infinite
- loop if _e equals _m_a_x_i_n_t or if _e is the expression _i. For that
- reason Pascal for-statments are translated into compound state-
- ments where the upper/lower bounds of the for-loop are held in
-
-
-
-
- - 15 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- local variables. It is also necessary to add a conditional
- break-statement at the end of the loop. It is possible to obtain
- the more relaxed translation by configuring the translator
- appropriately (see "Tuning" below).
-
- _3._4._8. _W_h_i_l_e
-
- The while-statement behaves exactly the same in both languages.
-
- _3._4._9. _R_e_p_e_a_t
-
- The repeat-statement of Pascal matches the do-while statement of
- C except for the trivial difference that Pascal permits a
- statement-list where C permits a single statment in the loop.
-
- _3._4._1_0. _E_m_p_t_y
-
- The empty statement has (of course) the same structure and seman-
- tix in both languages.
-
- _3._5. _E_x_p_r_e_s_s_i_o_n_s _a_n_d _a_s_s_i_g_n_m_e_n_t_s
-
- In most cases Pascal expressions can be literally translated into
- equivalent C expressions.
-
- identifiers Except where identifiers clash with reserved words
- or with other identifiers declared in the same
- scope, they may simply be printed. In some cases
- the translator is forced to rename identifiers or
- to invent new identifiers.
-
- operators The operators +, -, * _d_i_v and _m_o_d when applied to
- real or integer operands have exact counterparts
- in C and are therefore easy to handle. The opera-
- tor for real division, /, corresponds to the same
- C operator except that the operands may be
- integers. In that case a cast is necessary. When
- the operands are sets the expression is converted
- into a function call.
-
- The operators <, <=, >, >=, = and <> all have
- exact counterparts in C for integer and real
- operands. Most C implementations disallows _e_n_u_m
- operands, the translator therefore casts such
- operands to _i_n_t. Comparisons on structures (i.e.
- strings or sets) are converted into function
- calls.
-
- assignments Assignments are straightforward to handle since
- arrays are encapsulated in structures. Therefore:
-
- a := b
-
- becomes
-
- a = b
-
- _u_n_l_e_s_s b is a string or a set, in which case the
- assignment is converted into a function call.
-
-
-
-
- - 16 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- indexing Given the translation for array declarations
- (described above) we are forced to translate
-
- a[i]
-
- into
-
- a.A[i - c]
-
- where _c is the lower bound for the index type.
-
- selection Given the translation for records with variants
- (described above) the translation of
-
- a.b
-
- becomes
-
- a.b
-
- _o_r, if b is declared in a variant,
-
- a.Vxxx.b
-
- where Vxxx is a name manufactured by the transla-
- tor for the particular variant.
-
- dereferencing Pointer references and _v_a_r-parameters are handled
- by prefixing the expression with an asterisk, but
- the special case dereferencing followed by selec-
- tion is also recognized, so:
-
- p^ := q^.f
-
- becomes
-
- *p = q->f
-
-
- miscellanea The boolean operators _a_n_d, _o_r and _n_o_t are simply
- translated into their C counterparts. The set
- contructors [], and .. and the operator _i_n are
- converted to function calls.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 17 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- _4. _I_m_p_l_e_m_e_n_t_a_t_i_o_n
-
- The general strategy is to convert the Pascal source program into
- a parsetree. The tree is then traversed by a set of procedures
- that perform some necessary transformations. The tree is finally
- traversed by a set of procedures that print the corresponding C
- constructs. The translator consists of three major procedures
- that perform these actions. They are augmented by a set of pro-
- cedures that maintain a symbol table that holds information about
- identifiers found in the source, and by a procedure that initial-
- izes all internal datastructures.
-
- There are three major datastructures that interact in complicated
- ways:
-
- 1) a store for identifiers and strings
-
- 2) a multilevel symbol table
-
- 3) a parse-tree.
-
- The identifier and string store, _s_t_r_s_t_o_r is in principle an array
- of characters that grow in increments of one string block.
- Exactly one copy of each identifier is stored in that array.
- Whenever an identifier is found in the input stream the array is
- scanned to see if that identifier has been seen before. In order
- to speed up the search all identifiers are represented by nodes
- which are chained together such that all nodes in a particular
- chain have the same hashvalue as determined by the function _h_a_s_h.
- Each _i_d_n_o_d_e holds an index to strstor where the corresponding
- identifier text is stored. The start of the hashchains are found
- in the global variable _i_d_t_a_b.
-
- idtab
- +----+
- | | chain of idnodes with same hashvalue
- +----+ +--------+ +--------+
- | |----->| |----->| |idnode representing
- +----+ | | index=2| |identifier "demo"
- | | +--------+ +--------+
-
- strstor
- +------------ - --------+
- | | | | |
- +------------ - --------+|
- |
-
- +--------------------------------
- | | |d |e |m |o |/ |
- +-------------------------------- first idblock
- |^
- index=2
-
-
- So: the global representation of the identifier "demo" is a par-
- ticlular idnode; whenever the lexical analysis routine recognizes
- the identifier "demo" it returns a pointer to that idnode.
-
-
-
-
- - 18 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- In order to represent different identifiers with the same name we
- need to be able to distinguish between identifiers declared in
- different scopes. This is accomplished by the _s_y_m_n_o_d_e struc-
- tures. When an identifier is first encountered in a given scope
- it is "declared", meaning that a new symnode is created that
- references the identifier. Occurrences of the same identifier in
- that scope are then represented in the parse-tree by treenodes
- referencing the same symnode.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 19 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- The program:
-
- program p;
-
- var demo : integer;
-
- begin
- demo := 3
- end.
-
- yilds the following structure:
-
- top
-
- +----------------+treenode representing
- npgm| |the program
- +----|-----|-----|-----+
- | | |^| |^
- | | || +-----------------------------+
- | | |+----------------------------+ |
- | | | |
- | +-----|-------+treenode representing| |
- | nvar| |the var-declaration|
- | +---|----|-----+ +---------------|---+treenode repr.
- | | |^| nassign| |assignment
- | | |+-------> to type+----|---------|-----+
- symtab| | |^ |^
- +----+ | +-----|---+treenode repr.+-------|---+ +-----|-------+
- | |<------+ nid| |occurrence ofnid| | ninteger| |
- +--|-----+id. "demo" +----|-----+ +------|-----+
- | | | |^ | | |^
- +----+ | |+--------------------+ | |
- | | | |
- +----+ +-------|-------+symnode representing+-------|------+
- | |-----> lidentifier| |identifier "demo" linteger| |
- +----+ +----|---------+in the current scopelinum=3| |
- | +-----------+
- idtab +------------+
- +----+ |
- | |
- +----+ +--------+ +--------+
- | |----->| |----->| |idnode representing
- +----+ | | index=2| |identifier "demo"
- | | +--------+ +--------+
-
- strstor
- +------------ - --------+
- | | | | |
- +---|----------- - --------+
- |
-
- +--------------------------------
- | | |d |e |m |o |/ |
- +-------------------------------- first idblock
- |^
- index=2
-
- We see that the two occurrences of the identifier "demo" are
- represented by two _t_r_e_e_n_o_d_e_s of variant _n_i_d that both have
-
-
-
-
- - 20 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- pointers to the same _s_y_m_n_o_d_e representing the declaration of
- "demo". All symnodes at a given declarationlevel are chained
- together (in the same manner as the idnodes) and the chains are
- attached to the global variable _s_y_m_t_a_b. In order to find out if
- an identifer is declared in the current scope we scan the chain
- of symnodes starting from symtab, and check if any of them refer-
- ences the idnode we are looking for. A symnode also have a
- pointer to the treenode which defines the symbol. The _s_y_m_t_a_b_s
- themselves are also chained, the chain implements a stack of
- declarationlevels. The symtab which is created when the scope of
- a procedure is entered is also attached to that procedure. When
- a procedure is entered we create a new symtab, push it onto the
- stack, parse the procedure and pop the stack. The symbols then
- visible are those that still can be reached from the stack.
-
- The parse-tree consists of _t_r_e_e_n_o_d_e_s representing each declara-
- tion, statement, expression etc. Each node except the nprogram
- node has a pointer to its immediate father (giving the defining
- point), to its children (if it has any) and to its right brother
- (if it is a member of a list). The top of the tree is found in
- the global variable _t_o_p.
-
- In order to find the defining point for the identifier in the
- assignment, we follow pointers from the nassign-treenode to the
- nid-treenode, to the lidentifier-symnode, and then up to the
- nid-treenode in the declaration. If we want to know the type of
- the identifier we proceed up to the nvar-treenode and then down
- to the node giving the type in the declaration (in our example
- that node would also be a nid-treenode referencing a linteger-
- symnode that represents the identifier "integer"). There is a
- function _t_y_p_e_o_f that performs exactly this operation. It will
- return a pointer to a npredef-, nsubrange-, nscalar-, nptr-
- narray-, nrecord-, nfileof- or nsetof-treenode. In those cases
- where the translator pays attention to types it uses a pointer
- (obtained from typeof) as representation of a type.
-
- Given the parse-tree and the layered symbol table it is not hard
- to see how the translations described earlier are performed. The
- one place where the code becomes more than usually complex is
- when a _w_r_i_t_e statement with format specifications is to be
- translated into a call to _f_p_r_i_n_t_f.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 21 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- _5. _T_u_n_i_n_g
-
- The behaviour of the translator may be modified for some cases
- simply by changing constants. These constants are all located at
- the top of the program text.
-
- output The translator will copy the source during input if
- the constant _e_c_h_o is true. The copy is preceeded by
- the line
-
- # ifdef PASCAL
-
- and ended by the line
-
- # else
-
- and then follows the translated code and finally
-
- # endif
-
- This may be used to hold both Pascal and C source in
- the same file. If the Pascal part is changed the C
- part may be updated through:
-
- cpp -P -DPASCAL < orig > tmp
- ptc < tmp > new
- move new orig
-
- assuming that _e_c_h_o is true and that _c_p_p is the stan-
- dard C preprocessor.
-
- Default value:
-
- echo = false;
-
-
- comments The translator recognizes both (* and { as comment
- delimiters. They are treated as different, allowing
- 1 level nested comments, if the constant _d_i_f_f_c_o_m is
- true.
-
- Default value:
-
- diffcomm = false;
-
-
- symbols The translator accepts default entries in case-
- statements provided that the keyword defined through
- _o_t_h_e_r_s_y_m is used in place of the constant list.
-
- Default value:
-
- othersym = 'otherwise ';
-
- substitute for
-
- othersym = 'otherwise%';
-
- if that feature is undesired.
-
-
-
-
- - 22 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- The translator accepts externally declared procedures
- and functions provided that the directive defined
- through _e_x_t_e_r_n_s_y_m is used in place of the keyword
- _f_o_r_w_a_r_d.
-
- Default value:
-
- externsym = 'external ';
-
-
- sets Sets are implemented as arrays of _w_o_r_d_t_y_p_e. The type
- is assumed to hold _s_e_t_b_i_t_s + _1 bits numbered from 0.
-
- Default value:
-
- wordtype = 'unsigned short';
- setbits = 15;
-
-
- files The implementation of files uses a flag-word which
- has the type given as _f_i_l_e_b_i_t_s which is assumed to
- hold _f_i_l_e_f_i_l_l + _4 bits.
-
- Default value:
-
- filebits = 'unsigned short';
- filefill = 12;
-
-
- stmts If the Pascal source is known to be "normal" in the
- sense that for-loops always have an upper bound that
- is less than the maximum value of the type of the
- loop-variable, and in the sense that the upper bound
- doesn't change by computations inside the loop, then
- the translator may generate a more natural code.
- I.e:
-
- for i := 1 to 10 do ;
-
- becomes
-
- for (i = 1; i <= 10; i++) ;
-
- Since the requirements cannot be determined by the
- translator itself this kind of code is generated when
- the constant _l_a_z_y_f_o_r is true.
-
- Default value:
-
- lazyfor = false;
-
-
-
-
-
-
-
-
-
-
-
- - 23 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- new The second and following parameters to the procedure
- _n_e_w will be ignored if the constant _u_n_i_o_n_n_e_w is
- false.
-
- Default value:
-
- unionnew = true;
-
-
- strings All identifiers and strings are stored in the trans-
- lator with the special character having the ordinal
- value _n_u_l_l as endmarker. Hence, that character can
- not occur in strings in the Pascal source.
-
- Default value:
-
- null = 0;
-
-
- types The names of predefined types are given by the con-
- stants: _i_n_t_t_y_p, _c_h_a_r_t_y_p, _f_l_o_a_t_t_y_p, and _d_o_u_b_l_e_t_y_p.
-
- Default value:
-
- inttyp = 'int';
- chartyp = 'char';
- floattyp = 'float';
- doubletyp = 'double';
-
- The typename for real variables and functions defined
- by the user is given by the constant _r_e_a_l_t_y_p.
-
- Default value:
-
- realtyp = doubletyp;
-
- The typename for procedures is given by the constant
- _v_o_i_d_t_y_p.
-
- Default value:
-
- voidtyp = 'void';
-
-
- i/o The default fieldwidths for integer and real values
- written on textfiles are given by the constants
- _i_n_t_l_e_n and _f_i_x_l_e_n.
-
- Default value:
-
- intlen = 10;
- fixlen = 20;
-
-
- types A table in the translator gives the mapping from Pas-
- cal integer subrange types to C arithmetic types.
- The table is initialized by code located at the end
- of procedure _i_n_i_t_i_a_l_i_z_e giving the following default
- configuration:
-
-
-
-
- - 24 -
-
-
-
-
- Holistic Technology AB PTC HD870410-1 Rev: 1.6
-
- Low bound High bound Selected type
-
- 0 255 unsigned char
- -128 127 char
- 0 65535 unsigned short
- -32768 32767 short
- -2147483647 2147483647 long
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 25 -
-
-
-