home *** CD-ROM | disk | FTP | other *** search
- DESIGN.DOC: Design issues pertaining to the CSTAR translator
-
- June 25, 1991
-
- The following notes were written in 1987, I believe. The subject of these
- notes are various issues concerning the design of the CSTAR translator, and
- in some cases the CSTAR language. Be warned: these notes are extremely
- technical in nature and assume a deep level of knowledge of the CSTAR
- translator. They are likely to be of interest only to *serious* students of
- CSTAR.
-
- Parts of this file have the flavor of cryptic "notes to myself." If you have a
- question, please don't be intimidated by the obscure nature of these notes.
- Just give me a call and I'll be happy to tell you what I know. By the way,
- much of this file was written by a brilliant programmer named Paul Schick,
- who worked on CSTAR for about a year with me. Parts of this file are in
- dialog form, with my comments being marked by EKR:
-
- Another way of looking at this file is as pages from an engineering
- notebook. Some of the issues discusses here never really got satisfactorily
- resolved. The section called goals, principles and problems was written by
- me, and the rest of the file is really a discussion of the problems in meeting
- those goals and principles in any kind of clean way.
-
- Some terminology: LHS stands for the Left Hand Side of an assignment
- statement and RHS stands for the Right Hand Side. A "star expression" is
- an expression containing the * operator. A "loc node" is the part of the
- parse tree containing all information for a "location," which usually
- corresponds to some identifier. Understanding how loc nodes are created
- and used is a key to understanding how the compiler works.
-
-
- GOALS, PRINCIPLES AND PROBLEMS
-
- The primary goal is to make available all of the efficiency of assembly
- language in a reasonable manner.
-
- 1. Register allocation must be done in an obvious and efficient way.
-
- 2. As much as possible, consistent with other principles, we want to allow
- the programmer to experiment with register allocation without rewriting the
- form of expressions.
-
- 3. The details of the 68000 architecture should be separate from the actual
- code. I see no necessity of embedding all the details of which addressing
- modes are allowed in which instruction all through the compiler code. This
- suggests avoiding flaky rules on expression composition.
-
- 4. The code generator ought to function independently of the the precise
- details of what kinds of expressions are allowed, but that may be wishful
- thinking. The code generation process can be broken down in concept into
- three phases:
-
- a) Assign result location to each operator. Once this is done, all other
- decisions are completely determined. For binary operators, one operand
- will, in general, be a result location from another operator.
-
- b) Determine whether a machine instruction exists to apply the indicated
- operator to the two operands and produce the required result. Again, the
- previous phase will have tried to ensure that the result location is the same
- as one of the arguments.
-
- c) A peephole optimizer pass operates to handle flow-of-control problems.
- At present, I do not see a peephole pass being applied to arithmetic
- expressions, though I have not really looked into this possibility.
-
- Consider the following expression.
-
- a = b + (c = d + (e = 2 + f)));
-
- The CSTAR translator treats this as exactly equivalent to
-
- e = 2 + f;
- c = d + e;
- a = b + c;
-
- with the restriction that c may not be a and e may not be c. But C compilers
- are free to rearrange the order of evaluation of operands, so additional
- restrictions are needed to ensure that the corresponding C expression always
- is evaluated correctly. In the above example, d should not be e and b should
- not be c. Isn't the assignment a sequence point?
-
-
- ON PREDICTABILITY
-
- We are entitled to predict what we have specified, and no more (or not very
- much more.) It seems to me, right now, that a statement like:
-
- m1 = (m2*m3) + (m4*m5);
-
- really is, even in CSTAR a specification which says:
-
- 1. put the LHS into the RHS and don't tell me how you did it.
-
- 2. don't touch anything except the authorized temporaries.
-
- I'm really not entitled to know more, because I didn't say more.
-
- Now, in practice, I ought to help myself to a little bit more, and say that as
- long as every non-assignment operator corresponds to an assignment
- operator, one for one, e. g.:
-
- m1 = (d2 = m2 * m3) + (d3 = m4 + m5);
-
- then there really aren't any choices left (except for whatever moves are
- necessary to reconcile operands and results), so now--and only now--I
- haven't asked for any surprises, and so I'm entitled to not get them.
-
- EKR: doesn't this still leave open the question of whether to use the LHS
- as a temporary or not?
-
- What's new and useful is that I have that choice. I didn't have that choice in
- full C, because I can only specify these things loosely in full C, and the
- compiler is entitled to ignore me. I didn't have that choice in macro
- assembler, but for a different reason: I was obliged to specify everything all
- the time, whether I needed to or not, which is why programming was so
- impossibly tedious.
-
- So I don't want to give up the right to specify register designators, and to
- know how function calls are done, and go back to C, because I want the
- option to handle those things in a specific, assembly-like way. Nor do I to
- eschew all compiler decisions all the time and go back to macro assembler,
- because I want relief from the burden of specifying everything all the time,
- needed or not.
-
- In a way, I want in-line assembly code, but in-line assembly code that
- works and is workable. Normally it is not workable, because the register
- designators can't be treated as variables, so you have to go through a big
- song and dance with lots of overhead in order to feed something to the in-
- line code, and then another big song and dance to get it out again; nobody
- could ever optimize anything that way.
-
- It might be helpful to have the compiler issue a message if a "ternary"
- assignment, etc., compiles to a surprise. The programmer would be
- informed about an incorrect assumption (possibly resulting from an
- overlooked instruction-set quirk), and we'd have to decide what a
- "surprise" is, or whether there's any such thing.
-
-
- HOW CAN WE EXPERIMENT WITH REGISTER/MEMORY ALLOCATION?
-
- We can't, unless the rules for composing expressions are blind to register
- vs. memory allocations. They need not be blind with respect to pointers vs.
- nonpointers, since that is a question of type.
-
- For assignments, there is always at least one data temporary register and
- one address temporary register, as specified in the command line or in the
- function. We will refer to the first pair as as and ds for convenience. The
- programmer should act as though all of these scratch registers are clobbered
- by any assignment. Any assumption to the contrary may not be portable,
- which may be a good reason to allow for specifying a temporary-count
- individually for each function. (E.g. it might be that it is infeasible to put a
- result into local memory without loading a component of the address into
- the address temporary.)
-
- On the 68000, and other processors that distinguish address and data
- registers, the type of a hardware address register shall always be pointer.
- On the 8086/80186/80286, this is guaranteed to get weird.
-
- The type of the result of an operator can be calculated in a machine
- independent way as the tree for the expression is built bottom up by the
- reduce() routine. The rules for doing this depend only on the rules stated in
- K & R.
-
- Functions which produce a non-pointer result leave that result in ds.
- Functions which produce a pointer result leave that result in as.
-
- Temporaries will be used in a canonical way in coding any expression
- which invokes them. The compiler will do no rearrangements, and will
- evaluate operands by order of association unless there are contradicting
- parentheses. In order to enhance speed and simplify code generation, a data
- register will be used in lieu of the LHS to accumulate the result, if the LHS
- is in memory and the RHS is nontrivial. Complex expressions which are
- speed- or space-critical should (as a matter of style) either be decomposed,
- or have intermediate assignments explicitly specified on the RHS.
-
- In CSTAR, stack temporaries are not used, so there is no indefinitely
- extensible set of temporaries available. It is therefore advisable to keep
- expressions as simple as possible consistent with readability, in order to
- know that there are enough temporaries.
-
- In the simple, ternary assignment statement, the LHS will not be used as an
- accumulator [if there is any alternative], [or maybe never at all, but what
- about exclusive-or?] because of the following:
-
- m = d1 + d2;
-
- option1: treat as:
-
- m = d1;
- m += d2; one move, one add, two addresses, 32 clocks
-
- option 2: treat as:
-
- m = dt = d1 + d2;
- dt = d1;
- dt += d2;
- m = dt; two moves, one add, one address, 20 clocks
-
- Most assignments will have implied moves under the covers since the
- processors generally require results to be placed in at least one, and possibly
- in both, operands, while the operator form actually written in source code
- implies no such thing. If speed or compactness is highly critical, the
- programmer must use forms that are more specific, and correspond more
- closely to the hardware. However, such forms tend to be less readable, so
- there is no requirement to use them in non-critical code.
-
- EKR: One solution [?] Have assignment operators (+=, *=, etc.) always
- produce exactly the indicated code.
-
-
- STAR EXPRESSIONS
-
- Star expressions arise out of the P> and [ ] operators, but with luck that
- should not affect this discussion.
-
- Star expressions have 3 parts, 1) an address register, 2) a data register and
- 3) a constant. At least one of these parts must be present. The parser will
- probably rearrange the parse tree so that the unary * operator has these 3
- subtrees under it.
-
- Before the parser rearranges the tree, the operand of a unary * operator is
- either:
-
- 1) another unary * operator or
-
- 2) a tree containing a star expression whose operands are joined by +
- operators (or a P operator if the P operator applies to the constant.)
-
- The a0 register is used to do multiple indirections as required. For example,
- the star expression ***a5 generates
-
- a0 = *a5;
- a0 = *a5;
-
- and *a0 is then used as the equivalent operand.
-
- Scaling is done in d0. Scaling is done if the data register is present and the
- scale factor is not 1. For example, the star expression
-
- *(a5 + d5)
-
- which is equivalent to
-
- a5 [d5]
-
- is done as
-
- d0 = d5; /* assembly language (no scaling) */
- d0 *= scale_factor;
- a0 = a5 + d0
-
- and then *a0 is used as the equivalent operand. Note that scaling by
- constants between 2 and 10 (?) is done by masking and shifting instead of
- multiplying.
-
- Similarly, the star expression
-
- * (d5 + base_constant)
-
- which is equivalent to
-
- base_constant [d5]
-
- is done as
-
- d0 = d5
- d0 *= scale_factor;
- a0 = base_constant + d0;
-
- and then *a0 is used as the equivalent operand.
-
- When scaling is not required at run time, i.e., when the data register is not
- present or when the scale factor is one, star expressions are equivalent to
- 68000 addressing modes. For example,
-
- char * a1;
- *(a1 + d1 + 5);
-
- the star expression is the same as the equivalent operand and generates the
-
- #5(a1,d1)
-
- addressing mode.
-
- One could, in theory, allow more complex star expressions by substituting
- an inner expression for the data register. For example:
-
- * (a1 + (d1 = *a2) + 5);
-
- This would be unwrapped as:
-
- d1 = *a2;
- *(a1 + d1 + 5);
-
- At present, I see no particular problem with allowing this generality.
- Indeed, one could use d0 as an implicit d register and write
-
- *(a1 + *a2 + 5);
-
- which would be the same as
-
- d0 = *a2;
- *(a1 + d0 + 5);
-
- It turns out that finding the three separate parts of a star expression is not
- difficult. At the top level of a tree for the star expression, look for a subtree
- with pointer type. That subtree contains the address register. If the subtree
- contains a binary operator, one subtree of the binary operator will contain
- the pointer and the other will contain the data register and/or the constant.
- Again, if that subtree contains a binary operator, one subtree must contain
- the data register and the other subtree must contain the constant or constant
- expression.
-
-
- WHEN ARE EXPRESSIONS TOO COMPLEX?
-
- When we run out of temporaries given the expression and the current
- settings.
-
- Associated with each assignment is a list of forbidden registers which may
- not appear in the RHS of the assignment. This list is created by the
- unwrapping program. Actually, it's probably more complicated than this.
- There must be some canonical tools.
-
- The result of a subtree need not be either the LHS or the result temporary if
- the subtree consists of a primitive. In that case, that subtree does not limit
- in any way the form of the complexity of the other subtree. Eh?
-
-
- REWRITING RULES
-
- An expression of the form
-
- a = arg1 op1 arg2 op2 ... opn argn+1
-
- is, by definition, equivalent to
-
- a = arg1
- a op1 = arg2
- a opn = argn+1
-
- I don't know anything about this from here down.
-
- As the result of this equivalence, the LHS becomes a hidden common
- subexpression. The LHS temporary may be used to store this hidden
- common subexpression. For example, if the LHS temporary is available,
- the expression
-
- ***a5 = a + b;
-
- generates
-
- a1 = **a5;
- *a1 = a;
- *a1 += b;
-
- If the LHS temporary is not used, the following code would have to be
- generated (In general, *a0 might be the effective operand for a or b).
-
- a1 = *a5;
- a1 = *a5;
- *a1 = a;
- a1 = *a5;
- a1 = *a5;
- *a1 += b;
-
- In actuality, the expression ***a5 = a + b would not be allowed if the LHS
- temporary is not used.
-
- Another way around this problem would be to write the expression this way
-
- ***a5 = d0 = a + b;
-
- In general, if the effective LHS of an expression is *an, the only
- non-primitive operators allowed on the RHS are + P and ^.
-
- Rewriting rules move a lot of "code generation" into the parser.
-
-
- MANAGEMENT OF LOC_NODES DESIGNATING SIMPLE REGISTERS
-
- The management of loc_nodes is not as simple as we once thought it was.
- At this time, in fact, they are not managed at all, but created at will
- in great profusion. The essence of the problem is that because of the
- way the parser makes loc_nodes, it is permissible to alter them at will--
- until they get attached to the code tree. Once they get attached to the
- code tree, they can no longer be altered because to do so would cause
- retroactive changes earlier in the code. This fact underlaid our bafflement
- at the initial behavior of the code generator.
-
- Two techniques are proposed to minimize proliferation, and are already
- provided for (and a compile-time switch should switch them out, to help
- us track down the inevitable bugs!) The simpler one is to mark nodes when
- they get attached to the code list, so that conditional routines like
- locn_xdupl need not copy them unless they are marked. In most cases,
- this means that extra copies won't be made except when inner-level
- assignments are used as values, as in a = b = c;
-
- The more complex one involves the locn_chmod routine, which is meant to
- manage a set of simple register and register-indirect nodes. These
- permanent nodes will be marked, so that they never get altered, and
- routines like locn_chmod will operate by substituting one for another.
- This will limit most copying of temporaries to cases where nodes are
- actually combined, rather than just referred to.
-
- Provision is also made for locn_xconst to do something similar if it
- turns out that we are generating lots of internal ones. Note that we
- may want to improve the folding routines in the parser eventually to
- eliminate the creation of superfluous zeros, in particular. These
- nodes do need to carry their types, at least transitorily, but perhaps
- there is some practical way to work this out.
-
-
- VARIABLES AND VARIABLE NAMES
-
- Variables are set up in the parser to be suitable for their intended
- storage class. This fact implies some coupling of the parser to the
- code generator; the parser is not totally machine-independent.
-
- There is as yet no mechanism for properly determining the range of
- a symbolic name; that is, whether it can be combined into an address
- mode or not, although portions are in place. In particular, globals
- have to be always regarded as out-of-range, since the linker may place them
- anywhere at its own option.
-