home *** CD-ROM | disk | FTP | other *** search
Text File | 1989-06-03 | 53.3 KB | 1,976 lines |
- OPAA
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- LET'S BUILD A COMPILER!
-
- By
-
- Jack W. Crenshaw, Ph.D.
-
- 3 June 1989
-
-
- Part XI: LEXICAL SCAN REVISITED
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- PAA
-
-
-
-
-
- *****************************************************************
- * *
- * COPYRIGHT NOTICE *
- * *
- * Copyright (C) 1989 Jack W. Crenshaw. All rights reserved. *
- * *
- *****************************************************************
-
-
- INTRODUCTION
-
- I've got some good news and some bad news. The bad news is that
- this installment is not the one I promised last time. What's
- more, the one after this one won't be, either.
-
- The good news is the reason for this installment: I've found a
- way to simplify and improve the lexical scanning part of the
- compiler. Let me explain.
-
-
- BACKGROUND
-
- If you'll remember, we talked at length about the subject of
- lexical scanners in Part VII, and I left you with a design for a
- distributed scanner that I felt was about as simple as I could
- make it ... more than most that I've seen elsewhere. We used
- that idea in Part X. The compiler structure that resulted was
- simple, and it got the job done.
-
- Recently, though, I've begun to have problems, and they're the
- kind that send a message that you might be doing something wrong.
-
- The whole thing came to a head when I tried to address the issue
- of semicolons. Several people have asked me about them, and
- whether or not KISS will have them separating the statements. My
- intention has been NOT to use semicolons, simply because I don't
- like them and, as you can see, they have not proved necessary.
-
- But I know that many of you, like me, have gotten used to them,
- and so I set out to write a short installment to show you how
- they could easily be added, if you were so inclined.
-
- Well, it turned out that they weren't easy to add at all. In
- fact it was darned difficult.
-
- I guess I should have realized that something was wrong, because
- of the issue of newlines. In the last couple of installments
- we've addressed that issue, and I've shown you how to deal with
- newlines with a procedure called, appropriately enough, NewLine.
- In TINY Version 1.0, I sprinkled calls to this procedure in
- strategic spots in the code.
-
- It seems that every time I've addressed the issue of newlines,
- though, I've found it to be tricky, and the resulting parserA*A*
- - 2 -
- PAA
-
-
-
-
-
- turned out to be quite fragile ... one addition or deletion here
- or there and things tended to go to pot. Looking back on it, I
- realize that there was a message in this that I just wasn't
- paying attention to.
-
- When I tried to add semicolons on top of the newlines, that was
- the last straw. I ended up with much too complex a solution. I
- began to realize that something fundamental had to change.
-
- So, in a way this installment will cause us to backtrack a bit
- and revisit the issue of scanning all over again. Sorry about
- that. That's the price you pay for watching me do this in real
- time. But the new version is definitely an improvement, and will
- serve us well for what is to come.
-
- As I said, the scanner we used in Part X was about as simple as
- one can get. But anything can be improved. The new scanner is
- more like the classical scanner, and not as simple as before.
- But the overall compiler structure is even simpler than before.
- It's also more robust, and easier to add to and/or modify. I
- think that's worth the time spent in this digression. So in this
- installment, I'll be showing you the new structure. No doubt
- you'll be happy to know that, while the changes affect many
- procedures, they aren't very profound and so we lose very little
- of what's been done so far.
-
- Ironically, the new scanner is much more conventional than the
- old one, and is very much like the more generic scanner I showed
- you earlier in Part VII. Then I started trying to get clever,
- and I almost clevered myself clean out of business. You'd think
- one day I'd learn: K-I-S-S!
-
-
- THE PROBLEM
-
- The problem begins to show itself in procedure Block, which I've
- reproduced below:
-
-
- {--------------------------------------------------------------}
- { Parse and Translate a Block of Statements }
-
- procedure Block;
- begin
- Scan;
- while not(Token in ['e', 'l']) do begin
- case Token of
- 'i': DoIf;
- 'w': DoWhile;
- 'R': DoRead;
- 'W': DoWrite;
- else Assignment;
- end;
- Scan;A*A*
- - 3 -
- PAA
-
-
-
-
-
- end;
- end;
- {--------------------------------------------------------------}
-
-
- As you can see, Block is oriented to individual program
- statements. At each pass through the loop, we know that we are
- at the beginning of a statement. We exit the block when we have
- scanned an END or an ELSE.
-
- But suppose that we see a semicolon instead. The procedure as
- it's shown above can't handle that, because procedure Scan only
- expects and can only accept tokens that begin with a letter.
-
- I tinkered around for quite awhile to come up with a fix. I
- found many possible approaches, but none were very satisfying. I
- finally figured out the reason.
-
- Recall that when we started with our single-character parsers, we
- adopted a convention that the lookahead character would always be
- prefetched. That is, we would have the character that
- corresponds to our current position in the input stream fetched
- into the global character Look, so that we could examine it as
- many times as needed. The rule we adopted was that EVERY
- recognizer, if it found its target token, would advance Look to
- the next character in the input stream.
-
- That simple and fixed convention served us very well when we had
- single-character tokens, and it still does. It would make a lot
- of sense to apply the same rule to multi-character tokens.
-
- But when we got into lexical scanning, I began to violate that
- simple rule. The scanner of Part X did indeed advance to the
- next token if it found an identifier or keyword, but it DIDN'T do
- that if it found a carriage return, a whitespace character, or an
- operator.
-
- Now, that sort of mixed-mode operation gets us into deep trouble
- in procedure Block, because whether or not the input stream has
- been advanced depends upon the kind of token we encounter. If
- it's a keyword or the target of an assignment statement, the
- "cursor," as defined by the contents of Look, has been advanced
- to the next token OR to the beginning of whitespace. If, on the
- other hand, the token is a semicolon, or if we have hit a
- carriage return, the cursor has NOT advanced.
-
- Needless to say, we can add enough logic to keep us on track.
- But it's tricky, and makes the whole parser very fragile.
-
- There's a much better way, and that's just to adopt that same
- rule that's worked so well before, to apply to TOKENS as well as
- single characters. In other words, we'll prefetch tokens just as
- we've always done for characters. It seems so obvious once you
- think about it that way.A*A*
- - 4 -
- PAA
-
-
-
-
-
- Interestingly enough, if we do things this way the problem that
- we've had with newline characters goes away. We can just lump
- them in as whitespace characters, which means that the handling
- of newlines becomes very trivial, and MUCH less prone to error
- than we've had to deal with in the past.
-
-
- THE SOLUTION
-
- Let's begin to fix the problem by re-introducing the two
- procedures:
-
- {--------------------------------------------------------------}
- { Get an Identifier }
-
- procedure GetName;
- begin
- SkipWhite;
- if Not IsAlpha(Look) then Expected('Identifier');
- Token := 'x';
- Value := '';
- repeat
- Value := Value + UpCase(Look);
- GetChar;
- until not IsAlNum(Look);
- end;
-
-
- {--------------------------------------------------------------}
- { Get a Number }
-
- procedure GetNum;
- begin
- SkipWhite;
- if not IsDigit(Look) then Expected('Number');
- Token := '#';
- Value := '';
- repeat
- Value := Value + Look;
- GetChar;
- until not IsDigit(Look);
- end;
- {--------------------------------------------------------------}
-
-
- These two procedures are functionally almost identical to the
- ones I showed you in Part VII. They each fetch the current
- token, either an identifier or a number, into the global string
- Value. They also set the encoded version, Token, to the
- appropriate code. The input stream is left with Look containing
- the first character NOT part of the token.
-
- We can do the same thing for operators, even multi-character
- operators, with a procedure such as:A*A*
- - 5 -
- PAA
-
-
-
-
-
- {--------------------------------------------------------------}
- { Get an Operator }
-
- procedure GetOp;
- begin
- Token := Look;
- Value := '';
- repeat
- Value := Value + Look;
- GetChar;
- until IsAlpha(Look) or IsDigit(Look) or IsWhite(Look);
- end;
- {--------------------------------------------------------------}
-
- Note that GetOp returns, as its encoded token, the FIRST
- character of the operator. This is important, because it means
- that we can now use that single character to drive the parser,
- instead of the lookahead character.
-
- We need to tie these procedures together into a single procedure
- that can handle all three cases. The following procedure will
- read any one of the token types and always leave the input stream
- advanced beyond it:
-
-
- {--------------------------------------------------------------}
- { Get the Next Input Token }
-
- procedure Next;
- begin
- SkipWhite;
- if IsAlpha(Look) then GetName
- else if IsDigit(Look) then GetNum
- else GetOp;
- end;
- {--------------------------------------------------------------}
-
-
- ***NOTE that here I have put SkipWhite BEFORE the calls rather
- than after. This means that, in general, the variable Look will
- NOT have a meaningful value in it, and therefore we should NOT
- use it as a test value for parsing, as we have been doing so far.
- That's the big departure from our normal approach.
-
- Now, remember that before I was careful not to treat the carriage
- return (CR) and line feed (LF) characters as white space. This
- was because, with SkipWhite called as the last thing in the
- scanner, the encounter with LF would trigger a read statement.
- If we were on the last line of the program, we couldn't get out
- until we input another line with a non-white character. That's
- why I needed the second procedure, NewLine, to handle the CRLF's.
-
- But now, with the call to SkipWhite coming first, that's exactly
- the behavior we want. The compiler must know there's anotherA*A*
- - 6 -
- PAA
-
-
-
-
-
- token coming or it wouldn't be calling Next. In other words, it
- hasn't found the terminating END yet. So we're going to insist
- on more data until we find something.
-
- All this means that we can greatly simplify both the program and
- the concepts, by treating CR and LF as whitespace characters, and
- eliminating NewLine. You can do that simply by modifying the
- function IsWhite:
-
-
- {--------------------------------------------------------------}
- { Recognize White Space }
-
- function IsWhite(c: char): boolean;
- begin
- IsWhite := c in [' ', TAB, CR, LF];
- end;
- {--------------------------------------------------------------}
-
-
- We've already tried similar routines in Part VII, but you might
- as well try these new ones out. Add them to a copy of the Cradle
- and call Next with the following main program:
-
-
- {--------------------------------------------------------------}
- { Main Program }
-
- begin
- Init;
- repeat
- Next;
- WriteLn(Token, ' ', Value);
- until Token = '.';
- end.
- {--------------------------------------------------------------}
-
-
- Compile it and verify that you can separate a program into a
- series of tokens, and that you get the right encoding for each
- token.
-
- This ALMOST works, but not quite. There are two potential
- problems: First, in KISS/TINY almost all of our operators are
- single-character operators. The only exceptions are the relops
- >=, <=, and <>. It seems a shame to treat all operators as
- strings and do a string compare, when only a single character
- compare will almost always suffice. Second, and much more
- important, the thing doesn't WORK when two operators appear
- together, as in (a+b)*(c+d). Here the string following 'b' would
- be interpreted as a single operator ")*(."
-
- It's possible to fix that problem. For example, we could just
- give GetOp a list of legal characters, and we could treat theA*A*
- - 7 -
- PAA
-
-
-
-
-
- parentheses as different operator types than the others. But
- this begins to get messy.
-
- Fortunately, there's a better way that solves all the problems.
- Since almost all the operators are single characters, let's just
- treat them that way, and let GetOp get only one character at a
- time. This not only simplifies GetOp, but also speeds things up
- quite a bit. We still have the problem of the relops, but we
- were treating them as special cases anyway.
-
- So here's the final version of GetOp:
-
-
- {--------------------------------------------------------------}
- { Get an Operator }
-
- procedure GetOp;
- begin
- SkipWhite;
- Token := Look;
- Value := Look;
- GetChar;
- end;
- {--------------------------------------------------------------}
-
-
- Note that I still give the string Value a value. If you're truly
- concerned about efficiency, you could leave this out. When we're
- expecting an operator, we will only be testing Token anyhow, so
- the value of the string won't matter. But to me it seems to be
- good practice to give the thing a value just in case.
-
- Try this new version with some realistic-looking code. You
- should be able to separate any program into its individual
- tokens, with the caveat that the two-character relops will scan
- into two separate tokens. That's OK ... we'll parse them that
- way.
-
- Now, in Part VII the function of Next was combined with procedure
- Scan, which also checked every identifier against a list of
- keywords and encoded each one that was found. As I mentioned at
- the time, the last thing we would want to do is to use such a
- procedure in places where keywords should not appear, such as in
- expressions. If we did that, the keyword list would be scanned
- for every identifier appearing in the code. Not good.
-
- The right way to deal with that is to simply separate the
- functions of fetching tokens and looking for keywords. The
- version of Scan shown below does NOTHING but check for keywords.
- Notice that it operates on the current token and does NOT advance
- the input stream.
-
-
- {--------------------------------------------------------------}A*A*
- - 8 -
- PAA
-
-
-
-
-
- { Scan the Current Identifier for Keywords }
-
- procedure Scan;
- begin
- if Token = 'x' then
- Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];
- end;
- {--------------------------------------------------------------}
-
-
- There is one last detail. In the compiler there are a few places
- that we must actually check the string value of the token.
- Mainly, this is done to distinguish between the different END's,
- but there are a couple of other places. (I should note in
- passing that we could always eliminate the need for matching END
- characters by encoding each one to a different character. Right
- now we are definitely taking the lazy man's route.)
-
- The following version of MatchString takes the place of the
- character-oriented Match. Note that, like Match, it DOES advance
- the input stream.
-
-
- {--------------------------------------------------------------}
- { Match a Specific Input String }
-
- procedure MatchString(x: string);
- begin
- if Value <> x then Expected('''' + x + '''');
- Next;
- end;
- {--------------------------------------------------------------}
-
-
- FIXING UP THE COMPILER
-
- Armed with these new scanner procedures, we can now begin to fix
- the compiler to use them properly. The changes are all quite
- minor, but there are quite a few places where changes are
- necessary. Rather than showing you each place, I will give you
- the general idea and then just give the finished product.
-
-
- First of all, the code for procedure Block doesn't change, though
- its function does:
-
-
- {--------------------------------------------------------------}
- { Parse and Translate a Block of Statements }
-
- procedure Block;
- begin
- Scan;
- while not(Token in ['e', 'l']) do beginA*A*
- - 9 -
- PAA
-
-
-
-
-
- case Token of
- 'i': DoIf;
- 'w': DoWhile;
- 'R': DoRead;
- 'W': DoWrite;
- else Assignment;
- end;
- Scan;
- end;
- end;
- {--------------------------------------------------------------}
-
-
- Remember that the new version of Scan doesn't advance the input
- stream, it only scans for keywords. The input stream must be
- advanced by each procedure that Block calls.
-
- In general, we have to replace every test on Look with a similar
- test on Token. For example:
-
-
- {---------------------------------------------------------------}
- { Parse and Translate a Boolean Expression }
-
- procedure BoolExpression;
- begin
- BoolTerm;
- while IsOrOp(Token) do begin
- Push;
- case Token of
- '|': BoolOr;
- '~': BoolXor;
- end;
- end;
- end;
- {--------------------------------------------------------------}
-
-
- In procedures like Add, we don't have to use Match anymore. We
- need only call Next to advance the input stream:
-
-
- {--------------------------------------------------------------}
- { Recognize and Translate an Add }
-
- procedure Add;
- begin
- Next;
- Term;
- PopAdd;
- end;
- {-------------------------------------------------------------}ABAB
- - 10 -A*A*
- PAA
-
-
-
-
-
- Control structures are actually simpler. We just call Next to
- advance over the control keywords:
-
-
- {---------------------------------------------------------------}
- { Recognize and Translate an IF Construct }
-
- procedure Block; Forward;
-
- procedure DoIf;
- var L1, L2: string;
- begin
- Next;
- BoolExpression;
- L1 := NewLabel;
- L2 := L1;
- BranchFalse(L1);
- Block;
- if Token = 'l' then begin
- Next;
- L2 := NewLabel;
- Branch(L2);
- PostLabel(L1);
- Block;
- end;
- PostLabel(L2);
- MatchString('ENDIF');
- end;
- {--------------------------------------------------------------}
-
-
- That's about the extent of the REQUIRED changes. In the listing
- of TINY Version 1.1 below, I've also made a number of other
- "improvements" that aren't really required. Let me explain them
- briefly:
-
- (1) I've deleted the two procedures Prog and Main, and combined
- their functions into the main program. They didn't seem to
- add to program clarity ... in fact they seemed to just
- muddy things up a little.
-
- (2) I've deleted the keywords PROGRAM and BEGIN from the
- keyword list. Each one only occurs in one place, so it's
- not necessary to search for it.
-
- (3) Having been bitten by an overdose of cleverness, I've
- reminded myself that TINY is supposed to be a minimalist
- program. Therefore I've replaced the fancy handling of
- unary minus with the dumbest one I could think of. A giant
- step backwards in code quality, but a great simplification
- of the compiler. KISS is the right place to use the other
- version.ABAB
- - 11 -A*A*
- PAA
-
-
-
-
-
- (4) I've added some error-checking routines such as CheckTable
- and CheckDup, and replaced in-line code by calls to them.
- This cleans up a number of routines.
-
- (5) I've taken the error checking out of code generation
- routines like Store, and put it in the parser where it
- belongs. See Assignment, for example.
-
- (6) There was an error in InTable and Locate that caused them
- to search all locations instead of only those with valid
- data in them. They now search only valid cells. This
- allows us to eliminate the initialization of the symbol
- table, which was done in Init.
-
- (7) Procedure AddEntry now has two arguments, which helps to
- make things a bit more modular.
-
- (8) I've cleaned up the code for the relational operators by
- the addition of the new procedures CompareExpression and
- NextExpression.
-
- (9) I fixed an error in the Read routine ... the earlier value
- did not check for a valid variable name.
-
-
- CONCLUSION
-
- The resulting compiler for TINY is given below. Other than the
- removal of the keyword PROGRAM, it parses the same language as
- before. It's just a bit cleaner, and more importantly it's
- considerably more robust. I feel good about it.
-
- The next installment will be another digression: the discussion
- of semicolons and such that got me into this mess in the first
- place. THEN we'll press on into procedures and types. Hang in
- there with me. The addition of those features will go a long way
- towards removing KISS from the "toy language" category. We're
- getting very close to being able to write a serious compiler.
-
-
- TINY VERSION 1.1
-
-
- {--------------------------------------------------------------}
- program Tiny11;
-
- {--------------------------------------------------------------}
- { Constant Declarations }
-
- const TAB = ^I;
- CR = ^M;
- LF = ^J;
-
- LCount: integer = 0;A*A*
- - 12 -
- PAA
-
-
-
-
-
- NEntry: integer = 0;
-
-
- {--------------------------------------------------------------}
- { Type Declarations }
-
- type Symbol = string[8];
-
- SymTab = array[1..1000] of Symbol;
-
- TabPtr = ^SymTab;
-
-
- {--------------------------------------------------------------}
- { Variable Declarations }
-
- var Look : char; { Lookahead Character }
- Token: char; { Encoded Token }
- Value: string[16]; { Unencoded Token }
-
-
- const MaxEntry = 100;
-
- var ST : array[1..MaxEntry] of Symbol;
- SType: array[1..MaxEntry] of char;
-
-
- {--------------------------------------------------------------}
- { Definition of Keywords and Token Types }
-
- const NKW = 9;
- NKW1 = 10;
-
- const KWlist: array[1..NKW] of Symbol =
- ('IF', 'ELSE', 'ENDIF', 'WHILE', 'ENDWHILE',
- 'READ', 'WRITE', 'VAR', 'END');
-
- const KWcode: string[NKW1] = 'xileweRWve';
-
-
- {--------------------------------------------------------------}
- { Read New Character From Input Stream }
-
- procedure GetChar;
- begin
- Read(Look);
- end;
-
- {--------------------------------------------------------------}
- { Report an Error }
-
- procedure Error(s: string);
- begin
- WriteLn;A*A*
- - 13 -
- PAA
-
-
-
-
-
- WriteLn(^G, 'Error: ', s, '.');
- end;
-
-
- {--------------------------------------------------------------}
- { Report Error and Halt }
-
- procedure Abort(s: string);
- begin
- Error(s);
- Halt;
- end;
-
-
- {--------------------------------------------------------------}
- { Report What Was Expected }
-
- procedure Expected(s: string);
- begin
- Abort(s + ' Expected');
- end;
-
- {--------------------------------------------------------------}
- { Report an Undefined Identifier }
-
- procedure Undefined(n: string);
- begin
- Abort('Undefined Identifier ' + n);
- end;
-
-
- {--------------------------------------------------------------}
- { Report a Duplicate Identifier }
-
- procedure Duplicate(n: string);
- begin
- Abort('Duplicate Identifier ' + n);
- end;
-
-
- {--------------------------------------------------------------}
- { Check to Make Sure the Current Token is an Identifier }
-
- procedure CheckIdent;
- begin
- if Token <> 'x' then Expected('Identifier');
- end;
-
-
- {--------------------------------------------------------------}
- { Recognize an Alpha Character }
-
- function IsAlpha(c: char): boolean;
- beginA*A*
- - 14 -
- PAA
-
-
-
-
-
- IsAlpha := UpCase(c) in ['A'..'Z'];
- end;
-
-
- {--------------------------------------------------------------}
- { Recognize a Decimal Digit }
-
- function IsDigit(c: char): boolean;
- begin
- IsDigit := c in ['0'..'9'];
- end;
-
-
- {--------------------------------------------------------------}
- { Recognize an AlphaNumeric Character }
-
- function IsAlNum(c: char): boolean;
- begin
- IsAlNum := IsAlpha(c) or IsDigit(c);
- end;
-
-
- {--------------------------------------------------------------}
- { Recognize an Addop }
-
- function IsAddop(c: char): boolean;
- begin
- IsAddop := c in ['+', '-'];
- end;
-
-
- {--------------------------------------------------------------}
- { Recognize a Mulop }
-
- function IsMulop(c: char): boolean;
- begin
- IsMulop := c in ['*', '/'];
- end;
-
-
- {--------------------------------------------------------------}
- { Recognize a Boolean Orop }
-
- function IsOrop(c: char): boolean;
- begin
- IsOrop := c in ['|', '~'];
- end;
-
-
- {--------------------------------------------------------------}
- { Recognize a Relop }
-
- function IsRelop(c: char): boolean;
- beginA*A*
- - 15 -
- PAA
-
-
-
-
-
- IsRelop := c in ['=', '#', '<', '>'];
- end;
-
-
- {--------------------------------------------------------------}
- { Recognize White Space }
-
- function IsWhite(c: char): boolean;
- begin
- IsWhite := c in [' ', TAB, CR, LF];
- end;
-
-
- {--------------------------------------------------------------}
- { Skip Over Leading White Space }
-
- procedure SkipWhite;
- begin
- while IsWhite(Look) do
- GetChar;
- end;
-
-
- {--------------------------------------------------------------}
- { Table Lookup }
-
- function Lookup(T: TabPtr; s: string; n: integer): integer;
- var i: integer;
- found: Boolean;
- begin
- found := false;
- i := n;
- while (i > 0) and not found do
- if s = T^[i] then
- found := true
- else
- dec(i);
- Lookup := i;
- end;
-
-
- {--------------------------------------------------------------}
- { Locate a Symbol in Table }
- { Returns the index of the entry. Zero if not present. }
-
- function Locate(N: Symbol): integer;
- begin
- Locate := Lookup(@ST, n, NEntry);
- end;
-
-
- {--------------------------------------------------------------}
- { Look for Symbol in Table }A6A6
- - 16 -A*A*
- PAA
-
-
-
-
-
- function InTable(n: Symbol): Boolean;
- begin
- InTable := Lookup(@ST, n, NEntry) <> 0;
- end;
-
-
- {--------------------------------------------------------------}
- { Check to See if an Identifier is in the Symbol Table }
- { Report an error if it's not. }
-
-
- procedure CheckTable(N: Symbol);
- begin
- if not InTable(N) then Undefined(N);
- end;
-
-
- {--------------------------------------------------------------}
- { Check the Symbol Table for a Duplicate Identifier }
- { Report an error if identifier is already in table. }
-
-
- procedure CheckDup(N: Symbol);
- begin
- if InTable(N) then Duplicate(N);
- end;
-
-
- {--------------------------------------------------------------}
- { Add a New Entry to Symbol Table }
-
- procedure AddEntry(N: Symbol; T: char);
- begin
- CheckDup(N);
- if NEntry = MaxEntry then Abort('Symbol Table Full');
- Inc(NEntry);
- ST[NEntry] := N;
- SType[NEntry] := T;
- end;
-
-
- {--------------------------------------------------------------}
- { Get an Identifier }
-
- procedure GetName;
- begin
- SkipWhite;
- if Not IsAlpha(Look) then Expected('Identifier');
- Token := 'x';
- Value := '';
- repeat
- Value := Value + UpCase(Look);
- GetChar;
- until not IsAlNum(Look);A*A*
- - 17 -
- PAA
-
-
-
-
-
- end;
-
-
- {--------------------------------------------------------------}
- { Get a Number }
-
- procedure GetNum;
- begin
- SkipWhite;
- if not IsDigit(Look) then Expected('Number');
- Token := '#';
- Value := '';
- repeat
- Value := Value + Look;
- GetChar;
- until not IsDigit(Look);
- end;
-
-
- {--------------------------------------------------------------}
- { Get an Operator }
-
- procedure GetOp;
- begin
- SkipWhite;
- Token := Look;
- Value := Look;
- GetChar;
- end;
-
-
- {--------------------------------------------------------------}
- { Get the Next Input Token }
-
- procedure Next;
- begin
- SkipWhite;
- if IsAlpha(Look) then GetName
- else if IsDigit(Look) then GetNum
- else GetOp;
- end;
-
-
- {--------------------------------------------------------------}
- { Scan the Current Identifier for Keywords }
-
- procedure Scan;
- begin
- if Token = 'x' then
- Token := KWcode[Lookup(Addr(KWlist), Value, NKW) + 1];
- end;
-
-
- {--------------------------------------------------------------}A*A*
- - 18 -
- PAA
-
-
-
-
-
- { Match a Specific Input String }
-
- procedure MatchString(x: string);
- begin
- if Value <> x then Expected('''' + x + '''');
- Next;
- end;
-
-
- {--------------------------------------------------------------}
- { Output a String with Tab }
-
- procedure Emit(s: string);
- begin
- Write(TAB, s);
- end;
-
-
- {--------------------------------------------------------------}
- { Output a String with Tab and CRLF }
-
- procedure EmitLn(s: string);
- begin
- Emit(s);
- WriteLn;
- end;
-
-
- {--------------------------------------------------------------}
- { Generate a Unique Label }
-
- function NewLabel: string;
- var S: string;
- begin
- Str(LCount, S);
- NewLabel := 'L' + S;
- Inc(LCount);
- end;
-
-
- {--------------------------------------------------------------}
- { Post a Label To Output }
-
- procedure PostLabel(L: string);
- begin
- WriteLn(L, ':');
- end;
-
-
- {---------------------------------------------------------------}
- { Clear the Primary Register }
-
- procedure Clear;
- beginA*A*
- - 19 -
- PAA
-
-
-
-
-
- EmitLn('CLR D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Negate the Primary Register }
-
- procedure Negate;
- begin
- EmitLn('NEG D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Complement the Primary Register }
-
- procedure NotIt;
- begin
- EmitLn('NOT D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Load a Constant Value to Primary Register }
-
- procedure LoadConst(n: string);
- begin
- Emit('MOVE #');
- WriteLn(n, ',D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Load a Variable to Primary Register }
-
- procedure LoadVar(Name: string);
- begin
- if not InTable(Name) then Undefined(Name);
- EmitLn('MOVE ' + Name + '(PC),D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Push Primary onto Stack }
-
- procedure Push;
- begin
- EmitLn('MOVE D0,-(SP)');
- end;
-
-
- {---------------------------------------------------------------}
- { Add Top of Stack to Primary }A6A6
- - 20 -A*A*
- PAA
-
-
-
-
-
- procedure PopAdd;
- begin
- EmitLn('ADD (SP)+,D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Subtract Primary from Top of Stack }
-
- procedure PopSub;
- begin
- EmitLn('SUB (SP)+,D0');
- EmitLn('NEG D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Multiply Top of Stack by Primary }
-
- procedure PopMul;
- begin
- EmitLn('MULS (SP)+,D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Divide Top of Stack by Primary }
-
- procedure PopDiv;
- begin
- EmitLn('MOVE (SP)+,D7');
- EmitLn('EXT.L D7');
- EmitLn('DIVS D0,D7');
- EmitLn('MOVE D7,D0');
- end;
-
-
- {---------------------------------------------------------------}
- { AND Top of Stack with Primary }
-
- procedure PopAnd;
- begin
- EmitLn('AND (SP)+,D0');
- end;
-
-
- {---------------------------------------------------------------}
- { OR Top of Stack with Primary }
-
- procedure PopOr;
- begin
- EmitLn('OR (SP)+,D0');
- end;A6A6
- - 21 -A*A*
- PAA
-
-
-
-
-
- {---------------------------------------------------------------}
- { XOR Top of Stack with Primary }
-
- procedure PopXor;
- begin
- EmitLn('EOR (SP)+,D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Compare Top of Stack with Primary }
-
- procedure PopCompare;
- begin
- EmitLn('CMP (SP)+,D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Set D0 If Compare was = }
-
- procedure SetEqual;
- begin
- EmitLn('SEQ D0');
- EmitLn('EXT D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Set D0 If Compare was != }
-
- procedure SetNEqual;
- begin
- EmitLn('SNE D0');
- EmitLn('EXT D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Set D0 If Compare was > }
-
- procedure SetGreater;
- begin
- EmitLn('SLT D0');
- EmitLn('EXT D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Set D0 If Compare was < }
-
- procedure SetLess;
- begin
- EmitLn('SGT D0');A*A*
- - 22 -
- PAA
-
-
-
-
-
- EmitLn('EXT D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Set D0 If Compare was <= }
-
- procedure SetLessOrEqual;
- begin
- EmitLn('SGE D0');
- EmitLn('EXT D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Set D0 If Compare was >= }
-
- procedure SetGreaterOrEqual;
- begin
- EmitLn('SLE D0');
- EmitLn('EXT D0');
- end;
-
-
- {---------------------------------------------------------------}
- { Store Primary to Variable }
-
- procedure Store(Name: string);
- begin
- EmitLn('LEA ' + Name + '(PC),A0');
- EmitLn('MOVE D0,(A0)')
- end;
-
-
- {---------------------------------------------------------------}
- { Branch Unconditional }
-
- procedure Branch(L: string);
- begin
- EmitLn('BRA ' + L);
- end;
-
-
- {---------------------------------------------------------------}
- { Branch False }
-
- procedure BranchFalse(L: string);
- begin
- EmitLn('TST D0');
- EmitLn('BEQ ' + L);
- end;
-
-
- {---------------------------------------------------------------}A*A*
- - 23 -
- PAA
-
-
-
-
-
- { Read Variable to Primary Register }
-
- procedure ReadIt(Name: string);
- begin
- EmitLn('BSR READ');
- Store(Name);
- end;
-
-
- { Write from Primary Register }
-
- procedure WriteIt;
- begin
- EmitLn('BSR WRITE');
- end;
-
-
- {--------------------------------------------------------------}
- { Write Header Info }
-
- procedure Header;
- begin
- WriteLn('WARMST', TAB, 'EQU $A01E');
- end;
-
-
- {--------------------------------------------------------------}
- { Write the Prolog }
-
- procedure Prolog;
- begin
- PostLabel('MAIN');
- end;
-
-
- {--------------------------------------------------------------}
- { Write the Epilog }
-
- procedure Epilog;
- begin
- EmitLn('DC WARMST');
- EmitLn('END MAIN');
- end;
-
-
- {---------------------------------------------------------------}
- { Allocate Storage for a Static Variable }
-
- procedure Allocate(Name, Val: string);
- begin
- WriteLn(Name, ':', TAB, 'DC ', Val);
- end;ABAB
- - 24 -A*A*
- PAA
-
-
-
-
-
- {---------------------------------------------------------------}
- { Parse and Translate a Math Factor }
-
- procedure BoolExpression; Forward;
-
- procedure Factor;
- begin
- if Token = '(' then begin
- Next;
- BoolExpression;
- MatchString(')');
- end
- else begin
- if Token = 'x' then
- LoadVar(Value)
- else if Token = '#' then
- LoadConst(Value)
- else Expected('Math Factor');
- Next;
- end;
- end;
-
-
- {--------------------------------------------------------------}
- { Recognize and Translate a Multiply }
-
- procedure Multiply;
- begin
- Next;
- Factor;
- PopMul;
- end;
-
-
- {-------------------------------------------------------------}
- { Recognize and Translate a Divide }
-
- procedure Divide;
- begin
- Next;
- Factor;
- PopDiv;
- end;
-
-
- {---------------------------------------------------------------}
- { Parse and Translate a Math Term }
-
- procedure Term;
- begin
- Factor;
- while IsMulop(Token) do begin
- Push;
- case Token ofA*A*
- - 25 -
- PAA
-
-
-
-
-
- '*': Multiply;
- '/': Divide;
- end;
- end;
- end;
-
-
- {--------------------------------------------------------------}
- { Recognize and Translate an Add }
-
- procedure Add;
- begin
- Next;
- Term;
- PopAdd;
- end;
-
-
- {-------------------------------------------------------------}
- { Recognize and Translate a Subtract }
-
- procedure Subtract;
- begin
- Next;
- Term;
- PopSub;
- end;
-
-
- {---------------------------------------------------------------}
- { Parse and Translate an Expression }
-
- procedure Expression;
- begin
- if IsAddop(Token) then
- Clear
- else
- Term;
- while IsAddop(Token) do begin
- Push;
- case Token of
- '+': Add;
- '-': Subtract;
- end;
- end;
- end;
-
-
- {---------------------------------------------------------------}
- { Get Another Expression and Compare }
-
- procedure CompareExpression;
- begin
- Expression;A*A*
- - 26 -
- PAA
-
-
-
-
-
- PopCompare;
- end;
-
-
- {---------------------------------------------------------------}
- { Get The Next Expression and Compare }
-
- procedure NextExpression;
- begin
- Next;
- CompareExpression;
- end;
-
-
- {---------------------------------------------------------------}
- { Recognize and Translate a Relational "Equals" }
-
- procedure Equal;
- begin
- NextExpression;
- SetEqual;
- end;
-
-
- {---------------------------------------------------------------}
- { Recognize and Translate a Relational "Less Than or Equal" }
-
- procedure LessOrEqual;
- begin
- NextExpression;
- SetLessOrEqual;
- end;
-
-
- {---------------------------------------------------------------}
- { Recognize and Translate a Relational "Not Equals" }
-
- procedure NotEqual;
- begin
- NextExpression;
- SetNEqual;
- end;
-
-
- {---------------------------------------------------------------}
- { Recognize and Translate a Relational "Less Than" }
-
- procedure Less;
- begin
- Next;
- case Token of
- '=': LessOrEqual;
- '>': NotEqual;
- else beginA*A*
- - 27 -
- PAA
-
-
-
-
-
- CompareExpression;
- SetLess;
- end;
- end;
- end;
-
-
- {---------------------------------------------------------------}
- { Recognize and Translate a Relational "Greater Than" }
-
- procedure Greater;
- begin
- Next;
- if Token = '=' then begin
- NextExpression;
- SetGreaterOrEqual;
- end
- else begin
- CompareExpression;
- SetGreater;
- end;
- end;
-
-
- {---------------------------------------------------------------}
- { Parse and Translate a Relation }
-
-
- procedure Relation;
- begin
- Expression;
- if IsRelop(Token) then begin
- Push;
- case Token of
- '=': Equal;
- '<': Less;
- '>': Greater;
- end;
- end;
- end;
-
-
- {---------------------------------------------------------------}
- { Parse and Translate a Boolean Factor with Leading NOT }
-
- procedure NotFactor;
- begin
- if Token = '!' then begin
- Next;
- Relation;
- NotIt;
- end
- else
- Relation;A*A*
- - 28 -
- PAA
-
-
-
-
-
- end;
-
-
- {---------------------------------------------------------------}
- { Parse and Translate a Boolean Term }
-
- procedure BoolTerm;
- begin
- NotFactor;
- while Token = '&' do begin
- Push;
- Next;
- NotFactor;
- PopAnd;
- end;
- end;
-
-
- {--------------------------------------------------------------}
- { Recognize and Translate a Boolean OR }
-
- procedure BoolOr;
- begin
- Next;
- BoolTerm;
- PopOr;
- end;
-
-
- {--------------------------------------------------------------}
- { Recognize and Translate an Exclusive Or }
-
- procedure BoolXor;
- begin
- Next;
- BoolTerm;
- PopXor;
- end;
-
-
- {---------------------------------------------------------------}
- { Parse and Translate a Boolean Expression }
-
- procedure BoolExpression;
- begin
- BoolTerm;
- while IsOrOp(Token) do begin
- Push;
- case Token of
- '|': BoolOr;
- '~': BoolXor;
- end;
- end;
- end;A*A*
- - 29 -
- PAA
-
-
-
-
-
- {--------------------------------------------------------------}
- { Parse and Translate an Assignment Statement }
-
- procedure Assignment;
- var Name: string;
- begin
- CheckTable(Value);
- Name := Value;
- Next;
- MatchString('=');
- BoolExpression;
- Store(Name);
- end;
-
-
- {---------------------------------------------------------------}
- { Recognize and Translate an IF Construct }
-
- procedure Block; Forward;
-
- procedure DoIf;
- var L1, L2: string;
- begin
- Next;
- BoolExpression;
- L1 := NewLabel;
- L2 := L1;
- BranchFalse(L1);
- Block;
- if Token = 'l' then begin
- Next;
- L2 := NewLabel;
- Branch(L2);
- PostLabel(L1);
- Block;
- end;
- PostLabel(L2);
- MatchString('ENDIF');
- end;
-
-
- {--------------------------------------------------------------}
- { Parse and Translate a WHILE Statement }
-
- procedure DoWhile;
- var L1, L2: string;
- begin
- Next;
- L1 := NewLabel;
- L2 := NewLabel;
- PostLabel(L1);
- BoolExpression;
- BranchFalse(L2);
- Block;A*A*
- - 30 -
- PAA
-
-
-
-
-
- MatchString('ENDWHILE');
- Branch(L1);
- PostLabel(L2);
- end;
-
-
- {--------------------------------------------------------------}
- { Read a Single Variable }
-
- procedure ReadVar;
- begin
- CheckIdent;
- CheckTable(Value);
- ReadIt(Value);
- Next;
- end;
-
-
- {--------------------------------------------------------------}
- { Process a Read Statement }
-
- procedure DoRead;
- begin
- Next;
- MatchString('(');
- ReadVar;
- while Token = ',' do begin
- Next;
- ReadVar;
- end;
- MatchString(')');
- end;
-
-
- {--------------------------------------------------------------}
- { Process a Write Statement }
-
- procedure DoWrite;
- begin
- Next;
- MatchString('(');
- Expression;
- WriteIt;
- while Token = ',' do begin
- Next;
- Expression;
- WriteIt;
- end;
- MatchString(')');
- end;
-
-
- {--------------------------------------------------------------}
- { Parse and Translate a Block of Statements }A*A*
- - 31 -
- PAA
-
-
-
-
-
- procedure Block;
- begin
- Scan;
- while not(Token in ['e', 'l']) do begin
- case Token of
- 'i': DoIf;
- 'w': DoWhile;
- 'R': DoRead;
- 'W': DoWrite;
- else Assignment;
- end;
- Scan;
- end;
- end;
-
-
- {--------------------------------------------------------------}
- { Allocate Storage for a Variable }
-
- procedure Alloc;
- begin
- Next;
- if Token <> 'x' then Expected('Variable Name');
- CheckDup(Value);
- AddEntry(Value, 'v');
- Allocate(Value, '0');
- Next;
- end;
-
-
- {--------------------------------------------------------------}
- { Parse and Translate Global Declarations }
-
- procedure TopDecls;
- begin
- Scan;
- while Token = 'v' do
- Alloc;
- while Token = ',' do
- Alloc;
- end;
-
-
- {--------------------------------------------------------------}
- { Initialize }
-
- procedure Init;
- begin
- GetChar;
- Next;
- end;
-
-
- {--------------------------------------------------------------}A*A*
- - 32 -
- PAA
-
-
-
-
-
- { Main Program }
-
- begin
- Init;
- MatchString('PROGRAM');
- Header;
- TopDecls;
- MatchString('BEGIN');
- Prolog;
- Block;
- MatchString('END');
- Epilog;
- end.
- {--------------------------------------------------------------}AUAU
-
-
-
-
-
- AA
- - 33 -A*A*
- @