home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!spool.mu.edu!olivea!pagesat!spssig.spss.com!uchinews!ellis!goer
- From: goer@ellis.uchicago.edu (Richard L. Goerwitz)
- Newsgroups: comp.lang.icon
- Subject: parser generator, part 1
- Message-ID: <1993Jan3.211757.28395@midway.uchicago.edu>
- Date: 3 Jan 93 21:17:57 GMT
- Sender: news@uchinews.uchicago.edu (News System)
- Reply-To: goer@midway.uchicago.edu
- Organization: University of Chicago
- Lines: 787
-
- Warning -
-
- Don't bother unpacking this file unless a) you know what a parser
- generator is, and b) you are interested in thinking about how one
- should look for Icon.
-
- -Richard Goerwitz
-
- ---- Cut Here and feed the following to sh ----
- #!/bin/sh
- # This is a shell archive (produced by shar 3.49)
- # To extract the files from this archive, save it to a file, remove
- # everything above the "!/bin/sh" line above, and type "sh file_name".
- #
- # made 01/03/1993 20:21 UTC by richard@zenu
- # Source directory /u/richard/Ibpag
- #
- # existing files will NOT be overwritten unless -c is specified
- # This format requires very little intelligence at unshar time.
- # "if test", "cat", "rm", "echo", "true", and "sed" may be needed.
- #
- # This is part 1 of a multipart archive
- # do not concatenate these parts, unpack them in order with /bin/sh
- #
- # This shar contains:
- # length mode name
- # ------ ---------- ------------------------------------------
- # 41902 -rw-r--r-- ibpag.icn
- # 18563 -r--r--r-- maketbls.icn
- # 9627 -r--r--r-- preproc.icn
- # 25415 -r--r--r-- itokens.icn
- # 3663 -r--r--r-- debugme.icn
- # 1190 -r--r--r-- errors.icn
- # 2062 -r--r--r-- slashupto.icn
- # 4314 -r--r--r-- rewrap.icn
- # 762 -r--r--r-- strip.icn
- # 1752 -rw-r--r-- Makefile.dist
- #
- if test -r _shar_seq_.tmp; then
- echo 'Must unpack archives in sequence!'
- echo Please unpack part `cat _shar_seq_.tmp` next
- exit 1
- fi
- # ============= ibpag.icn ==============
- if test -f 'ibpag.icn' -a X"$1" != X"-c"; then
- echo 'x - skipping ibpag.icn (File already exists)'
- rm -f _shar_wnt_.tmp
- else
- > _shar_wnt_.tmp
- echo 'x - extracting ibpag.icn (Text)'
- sed 's/^X//' << 'SHAR_EOF' > 'ibpag.icn' &&
- X############################################################################
- X#
- X# Name: ibpag.src
- X#
- X# Title: Icon-based parser generator
- X#
- X# Author: Richard L. Goerwitz
- X#
- X# Version: 1.21
- X#
- X############################################################################
- X#
- X#
- X# General Description:
- X#
- X# IBPAG is a simple tool for generating parsers from grammar
- X# specifications. Sounds pretty mundane, but those who have never
- X# used a parser generator will perhaps be surprised to find that this
- X# kind of tool forms the basis of most modern programming language
- X# implementations. Parser generators are also used in preprocessors,
- X# transducers, compilers, interpreters, calculators and in fact for
- X# just about any situation where some form of structured input needs
- X# to be converted into some form of structured output.
- X#
- X# IBPAG is not a full, working system. In particular, startup
- X# times for compiled IBPAG programs is very slow, since I can't find
- X# a good way of writing/reconstructing the necessary structures. For
- X# applications where a startup delay is not a problem, IBPAG will
- X# work fine. What IBPAG is supposed to be is a very preliminary
- X# crack at designing a parser generator for Icon. It really ought to
- X# be redone in C some day, and its error handling facilities need to
- X# be tested better. It also ought to be redone as an LALR (and not
- X# LR) parser, and some method of storing parse tables ought to be
- X# found that permits fast/easy reconstruction at run-time. This
- X# version has to be regarded as in beta testing. I doubt I'll ever
- X# take it, however, much further - at least in its present form.
- X#
- X#
- X# More technical description:
- X#
- X# Technically speaking, IBPAG is a preprocessor that accepts a
- X# Icon-like source file from the standard input containing grammar
- X# productions and actions, converts them into parsing tables and
- X# associated code, and then adds to this an LR parser, tokenizer,
- X# error handler, and a few debugging tools. Writes the combination
- X# to the standard output, along with the necessary action and goto
- X# tables. The tables are generated using a very general (but also
- X# very sluggish) LR(1) algorithm. IBPAG is not terribly practical
- X# for most micros. It was written as an excercize by me (a linguist
- X# with far too little CS training) while auditing a compiler course
- X# as a some-time diversion from many dissertation rewrites...
- X#
- X# Cycles and epsilon moves are handled correctly (to my
- X# knowledge). Shift-reduce conflicts are handled in the normal way
- X# (i.e. pick the rule with the highest priority, and, in cases where
- X# the priority is the same, check the associativities; flag
- X# reduce/reduce conflicts as errors, and for shift/reduce conflicts
- X# pick shift by default [shift/shift conflicts don't matter, since we
- X# shift in either case, and no reduction takes place]).
- X#
- X#
- X# Running IBPAG:
- X#
- X# Invoking IBPAG is very, very simple. There are just two
- X# (optional) command-line switches:
- X#
- X# ibpag [-d] [-v] < inputfile > outputfile
- X#
- X# Where the -d switch causes profuse debugging messages to be
- X# displayed, and the -v switch ("verbose") causes somewhat more
- X# restrained messages to be emitted. -d implies -v. Inputfile is an
- X# IBPAG source file (see below on its format). Outputfile is an LR
- X# parser with appropriate tables, generated from the specs contained
- X# in inputfile.
- X#
- X# For more information how to set up IBPAG files and incorporate
- X# them into larger Icon programs, see the section just below, "How to
- X# use IBPAG."
- X#
- X############################################################################
- X#
- X#
- X# How to use IBPAG
- X# ________________
- X#
- X#
- X# I. Prerequisites
- X#
- X# The only prerequisites to using IBPAG are a knowledge of how to
- X# write a basic LR-parsable context-free grammar and a familiarity
- X# with the Icon programming language. IBPAG source files use
- X# essentially the same syntax as Icon source files. In a couple of
- X# cases, though the semantics of are quite different. IBPAG is
- X# essentially just a preprocessor. IBPAG-specific constructs,
- X# therefore, should be seen, not as Icon expressions, but as
- X# directives to the preprocessor, specifying how to construct the
- X# actual output code.
- X#
- X# In the sections that follow, I will discuss how to set up an
- X# IBPAG file, i.e. how to declare a start symbol and write rules.
- X# The issue of the tokenizer will then be addressed, along with a
- X# treatment of epsilon, and two reserved tokens: $ and error.
- X# Finally the overall compatibility of IBPAG and Icon code will be
- X# discussed, as well as how to start the IBPAG parser from the Icon
- X# code. If you still have problems and/or questions, try reading the
- X# UNIX manual page on YACC. IBPAG and YACC have some superficial
- X# similarities that may be helpful in sorting out why IBPAG works the
- X# way it does. Alternatively write to me, Richard Goerwitz, via
- X# e-mail at goer@midway.uchicago.edu. I'll be glad to field
- X# questions and provide anyone with the latest version I have
- X# on-hand. Doubtless there will be many bugs to find, and I'll be
- X# happy to fix any that get reported. At some point, the entire
- X# table generating process (in maketbls.icn) will need to be ripped
- X# out, and replaced with one of the new, faster LR(1) (or perhaps
- X# LALR) algorithms.
- X#
- X#
- X# II. The start symbol
- X#
- X# All context-free grammars have a start-symbol, i.e. a terminal
- X# that can be considered the "goal" of a parse. Because cycles (and
- X# bugs) in the grammar can make it hard for IBPAG to determine what
- X# the start symbol is, it must be declared. This is done with a
- X# start-symbol declaration, which has the following form:
- X#
- X# <start-symbol-declaration> ::= start_symbol identifier
- X#
- X# If no start-symbol declaration appears in an IBPAG source file,
- X# IBPAG assumes S as the start symbol.
- X#
- X#
- X# III. Rule declarations
- X#
- X# The heart of the IBPAG source file is the rule declaration.
- X# Basically, rule declarations tell IBPAG the structure of the grammar
- X# it is to assume when generating a parser. They also tell it what
- X# rules to associate with what Icon code. Rule declarations look a
- X# lot like procedure declarations, except that they begin with "rule"
- X# instead of "procedure" and must specify a priority and
- X# associativity. For those with formalistic leanings, the format of
- X# the rule declaration is as follows (opt = optional):
- X#
- X# <rule-declaration> ::= <rule-header> ... end
- X# <rule-header> ::= rule <priority> <associativity> \
- X# <left-hand-symbol> ( <right-hand-symbol-list> )
- X# <priority> ::= integer-literal (opt)
- X# <associativity> ::= left | right | none (opt)
- X# <left-hand-side> ::= identifier
- X# <right-hand-side> ::= <symbol-spec> | <symbol-spec>,<right-hand-side>
- X# <symbol-spec> ::= <symbol> | <symbol> \| <symbol-spec>
- X# <symbol> ::= identifier | string-literal
- X#
- X# The "..." indicates material that is precisely the same as for a
- X# standard Icon procedure declaration. The only difference between a
- X# rule and a procedure (syntactically, that is) is in the header.
- X# Superficially, rule and procedure headers look a lot alike. The
- X# rule headers, though, have extra elements (optional priority and
- X# associativity). They also interpret arguments in the option list
- X# quite differently than Icon does. In particular, these arguments
- X# are either strings, identifiers, or any combination of these
- X# separated by a bar. Strings correspond to terminals, identifiers
- X# to nonterminals. The bar will be discussed below.
- X#
- X# For those with a more practical bent, here is a sample rule
- X# definition:
- X#
- X# rule 1 none S(NP, VP)
- X# return ["S", arg1, arg2]
- X# end
- X#
- X# The 1 denotes the priority (here very low), while "none" specifies
- X# the associativity (can be right, left, or none). S is the
- X# left-hand side of the rule, and NP and VP are the (nonterminal)
- X# symbols that make up its right-hand side. Normally, the priority
- X# is not needed (the default is 1). In fact, it is best to try to
- X# write grammars without any priorities or associativity rules. The
- X# default associativity is none.
- X#
- X# Please note the body of the rule itself. The code there is
- X# triggered whenever an S is found (i.e. when a reduction via the
- X# above S-rule occurs during the parse). For those who know YACC,
- X# arg1 is equivalent to $1 and the return expression is equivalent to
- X# $$ =. Essentially, the above rule definition says that an S is
- X# composed of an NP and a VP, and that if an S is encountered, the
- X# parser is to create a list having three members, the first being
- X# the string "S", the second being the result that was returned when
- X# the NP was found, and the third being the result returned when the
- X# VP was encountered. Note that the above rule implies the existence
- X# of at least two other rules having NP and VP, respectively, as
- X# their left-hand sides. For example:
- X#
- X# rule NP("det", "noun")
- X# return ["NP", arg1, arg2]
- X# end
- X#
- X# rule VP("v", NP)
- X# return ["VP", arg1, arg2]
- X# end
- X#
- X# If two or more right-hand sides differ in just a single symbol,
- X# it is acceptable to combine them into one declaration using a
- X# vertical slash or bar. Although this bar looks like the usual Icon
- X# alternation operator, in this context IBPAG uses it as a macro
- X# device. In the following case, two rules are generated, each with
- X# one of the alternatives, "that" or "which," in its RHS. The same
- X# code is triggered by a reduction via either rule:
- X#
- X# rule RELCL("that"|"which", S)
- X# return ["RELCL", arg1, arg2]
- X# end
- X#
- X#
- X# IV. The tokenizer
- X#
- X# Note that in the above examples, "v," "det," "noun," "that", and
- X# "which" are considered terminal symbols. The user must, in fact,
- X# create a lexical analyzer called iparse_tokens that returns these
- X# symbols. The precise form in which iparse_tokens must function is
- X# as follows:
- X#
- X# iparse_tokens: file -> TOK records
- X# (stream) -> Ts
- X# Where stream is an open file, and Ts are TOK records.
- X# As a side-effect, iparse_tokens should increment the global
- X# variable line_number (once right away, then once for each
- X# newline encountered, or anything equivalent thereto)
- X#
- X# The -> notation above implies that iparse_tokens is a pure
- X# function. I just can't think of a good notation for generators.
- X# Iparse_tokens in fact *must be* a generator. TOK records contain
- X# the terminal symbol names in their first field. If the literal
- X# string value of the symbols is to be used by any of the rule code,
- X# it should be stored in the second field of the TOK record. A
- X# typical value returned might be:
- X#
- X# TOK("v", "eat")
- X#
- X# YACC/LEX's heterogenous yylval/return (TOKEN) strategy is really
- X# quite inelegant, and although my system isn't perfect, it avoids
- X# the global variable, and hence makes running simultaneous parsers a
- X# bit simpler undertaking (the only globals to worry about in IBPAG
- X# are line_number, errors, and fail_on_error, all of which can, if
- X# desired, be ignored).
- X#
- X# If you want line numbers to be output with any error messages
- X# that the parser might report, then make sure that the tokenizer
- X# keeps track of how many lines it has read by incrementing
- X# line_number. The parser itself automatically initializes this
- X# (global) variable to 0, so all you'll normally have to do is
- X# increment it by one for each newline the tokenizer encounters. If
- X# you leave line_number alone, the parser simply does not print out
- X# line numbers for errors it reports (which is somewhat less helpful,
- X# but workable nonetheless).
- X#
- X#
- X# V. Epsilon
- X#
- X# If needed, epsilon (i.e. the empty string) can be specified by
- X# an empty Icon string (e.g. rule REL(""|"that")). "" is a perfectly
- X# legal token. It is absolutely vital, however, to note that
- X# *epsilon is never pushed onto the value stack*. Hence in the above
- X# instance, if arg1 were to be accessed, an error would result if a
- X# reduction via REL -> epsilon (rather than by REL -> "that")
- X# occurred! For this reason it is wise to keep epsilon isolated
- X# (e.g. rule 10 REL("") and rule 11 REL("that"); note that giving
- X# both rules an associativity probably isn't necessary here, although
- X# giving the second rule the higher priority often *will* be). Note
- X# also that the tokenizer must never return an empty string as a
- X# token (i.e. TOK("")). Not only will the parser refuse to place it
- X# on the value stack, but it will not even recognize it as a valid
- X# lookahead symbol (since, by definition, it doesn't contain
- X# anything).
- X#
- X#
- X# VI. Reserved tokens ($ and error)
- X#
- X# Besides epsilon, the only string (qua symbol) the tokenizer
- X# cannot return at will is a TOK with either "error" or "$" as its
- X# first field (e.g. TOK("$")). The $ symbol denotes end-of-input,
- X# and *must* be returned by the tokenizer when the input stream ends
- X# (or when, for some other reason, the token stream has stopped). I
- X# could have cut the code so that failure or &null signalled the same
- X# thing, but I prefer to use failure of the tokenizer to signal an
- X# unexpected end of input, and I don't like the inconsistency of
- X# having &null be a valid token. The bottom line is that the
- X# tokenizer must return TOK("$") on end-of-input. You'll get used to
- X# it :-).
- X#
- X# The "error" terminal is reserved for error productions. If a
- X# syntax error occurs, and the token "error" exists in the grammar,
- X# the parser will attempt to back up as one state to see if any of
- X# them permits "error" as a valid lookahead symbol. If so, the
- X# parser jumps to the state that would have resulted if the parser
- X# had encountered "error" at that point. What this does is allow us
- X# to write error-recovery rules:
- X#
- X# global error_count
- X# rule EXPR("(", EXPR, ")")
- X# return arg1
- X# end
- X# rule REGEXP("(", REGEXP, "error")
- X# write(&errout, "unmatched \"(\"; resynchronizing")
- X# # Pretend we got a right parenthesis, and continue
- X# # processing.
- X# return arg1
- X# end
- X#
- X# So what happens if we get an expression without an enclosing right
- X# parenthesis? We get an error message, an increment of the error
- X# count, and a resynchronized parser that's ready to go (and possibly
- X# find more errors). This mechanism is very much like the mechanism
- X# YACC uses.
- X#
- X# Note that IBPAG will only pop one state off of the stack in
- X# efforts to find a state for which "error" is a legitimate lookahead
- X# token; any more than this, and we could easily back so far out the
- X# current production that we would end up causing confusion about
- X# just where the error occurred! If you write error productions,
- X# make sure that the error terminal is at most one token removed from
- X# the spot where you expect errors to occur.
- X#
- X#
- X# VII. IBPAG-Icon code compatibility (plus gotchas)
- X#
- X# Regular Icon procedures may be freely intermingled with IBPAG
- X# rules, so you can include iparse_tokens() right in the same file as
- X# your rule definitions. Note that rules, other than the header, in
- X# fact, behave somewhat like procedures in the sense that when a rule
- X# is triggered, the parser executes the code for that rule just the
- X# way code for a procedure is executed when it is invoked. The main
- X# difference between Icon procedures and rules is that the rules can
- X# only return one value or fail. If you want a rule to be able to
- X# produce more than one value, then have it return a coexpression.
- X# Failure in a rule signals that the entire RHS of the production
- X# should be discarded, and the stack should be returned to the state
- X# it was in before that RHS's first symbol was shifted onto the
- X# stack. Failure is really just another way of providing the user
- X# more control over error handling and recovery. If you have nothing
- X# useful to return, but do *not* want to signal an error, BE SURE TO
- X# RETURN A NULL DESCRIPTOR ("return &null")!
- X#
- X#
- X# VII. The parser itself
- X#
- X# In the output file, the parser that IBPAG constructs is called
- X# iparse(). You should invoke it somewhere to get the parser
- X# started. It takes a single argument: A stream (i.e. an open file).
- X# You could pass it a string, theoretically, instead of a file.
- X# Iparse() passes its first argument to iparse_tokens(), so be sure
- X# iparse_tokens() understands that it is working on a string instead
- X# of a file if you choose to do things this way. IBPAG's output is
- X# compressed somewhat, but the parser and error handler are human-
- X# readable. Please look at them and modify them as needed. Note
- X# especially the debugging facilities that are included with them
- X# (e.g. dump_lists()).
- X#
- X#
- X# VIII. Miscellaneous
- X#
- X# As it encounters errors, the parser, iparse(), increments the
- X# global variable errors by one. Normally one botched production
- X# will cause a series of cascading errors. As a result, the error
- X# handler only increments "errors" for the first in any unbroken
- X# series of errors. When (and if) the parser manages to
- X# resynchronize itself, and the cascade terminates, the errors
- X# variable is once again incremented normally, once for each error
- X# encountered.
- X#
- X# If no error processing is desired at all, you may set the
- X# global variable fail_on_error to something other than &null. This
- X# tells iparser's error handler simply to fail if there are any
- X# syntax errors. Note that "error" productions (described above, in
- X# the section on reserved tokens) will still work if they have been
- X# used in the grammar. So in fact, setting fail_on_errors will cause
- X# failure to occur for all errors that cannot be handled by
- X# user-defined error productions. What this does is bypass the
- X# built-in error detection and resynchronization routines. This may
- X# be useful if an IBPAG file is part of a much larger Icon program
- X# that needs to have errors registered internally, and not spit out
- X# separately to stderr by a renegade parser :-).
- X#
- X############################################################################
- X#
- X# Below is a sample IBPAG source file. It's kind of a hack, but
- X# I'm sure it will adequately illustrate the basic principles:
- X#
- X# link radcon
- X# global ID_tbl
- X#
- X# procedure main()
- X# iparse(&input)
- X# end
- X#
- X# rule 0 left S(stream)
- X# return
- X# end
- X#
- X# rule 1 left stream(calc, stream)
- X# return
- X# end
- X#
- X# rule 1 left stream(calc)
- X# return
- X# end
- X#
- X# rule 2 none calc("CR")
- X# return
- X# end
- X#
- X# rule 3 left calc(expr, "CR")
- X# # Radcon can't handle reals, so this doesn't really work.
- X# line_number +:= 1
- X# return write(radcon(arg1, \(\ID_tbl)["ibase"] | 10,
- X# \(\ID_tbl)["obase"] | 10))
- X# end
- X#
- X# rule 4 none expr("ID")
- X# if not (return \ID_tbl[arg1])
- X# then write(&errout, "uninitialized variable, line ", line_number)
- X# fail
- X# end
- X#
- X# rule 6 right expr("ID", "=", expr)
- X# initial ID_tbl := table()
- X# ID_tbl[arg1] := arg3
- X# return arg3
- X# end
- X#
- X# rule 10 left expr(expr, "+", expr)
- X# return arg1 + arg3
- X# end
- X#
- X# rule 10 left expr(expr, "-", expr)
- X# return arg1 - arg3
- X# end
- X#
- X# rule 11 left expr(expr, "*", expr)
- X# return arg1 * arg3
- X# end
- X#
- X# rule 11 left expr(expr, "/", expr)
- X# return arg1 / arg3
- X# end
- X#
- X# rule 11 left expr(expr, "%", expr)
- X# return arg1 % arg3
- X# end
- X#
- X# rule 12 left expr(expr, "^", expr)
- X# return arg1 ^ arg3
- X# end
- X#
- X# rule 50 none expr("(", expr, ")")
- X# return arg2
- X# end
- X#
- X# rule 75 right expr("-", expr)
- X# return -arg2
- X# end
- X#
- X# rule 75 right expr("+", expr)
- X# return +arg2
- X# end
- X#
- X# rule 100 none expr("NUM")
- X# if find(".", arg1)
- X# then return real(arg1)
- X# else return integer(arg1)
- X# end
- X#
- X# #
- X# # iparse_tokens: file -> TOK records (a generator)
- X# # stream -> tokens
- X# #
- X# # Where file is an open input stream, and tokens are TOK
- X# # records holding both the token type and actual token text.
- X# #
- X# # TOK records contain two parts, a preterminal symbol (the first
- X# # "sym" field), and the actual text of the token ("str"). The
- X# # parser above only pays attention to the sym field, although the
- X# # user might well want to use the actual text.
- X# #
- X# procedure iparse_tokens(stream)
- X#
- X# local token, c, whitespace, operators
- X# #global line_number
- X# whitespace := '\r\t \n'
- X# operators := '+-*/^()='
- X#
- X# token := ""
- X# every c := !(!&input || "\n") do {
- X# if not any(whitespace ++ operators, c) then {
- X# token ||:= c
- X# } else {
- X# if integer(token)
- X# then suspend TOK("NUM", "" ~== token)
- X# else suspend TOK("ID", "" ~== token)
- X# if any(operators, c) then
- X# suspend TOK(c)
- X# else {
- X# if c == "\n" then {
- X# line_number +:= 1
- X# suspend TOK("CR"|"CR")
- X# }
- X# }
- X# token := ""
- X# }
- X# }
- X# if integer(token)
- X# then suspend TOK("NUM", "" ~== token)
- X# else suspend TOK("ID", "" ~== token)
- X# suspend TOK("CR"|"$")
- X#
- X# end
- X#
- X############################################################################
- X#
- X# Links: options.icn
- X#
- X# See also: maketbls.icn, preproc.icn
- X#
- X############################################################################
- X
- Xlink options, structs, ximage
- X
- Xglobal DEBUG, VERBOSE
- X# For other record declarations, see maketbls.icn.
- Xrecord RE(state, procname, sym, size)
- Xrecord SH(state)
- Xrecord AC()
- X
- Xprocedure main(a)
- X
- X local f, f2, lname, opttbl, usage, tlst, sym, seen,
- X act, elt, size, r, i, k
- X # global ptbl, alst, glst
- X
- X usage := "usage: ibpag [-d] [-v] [-f input-file]"
- X
- X opttbl := options(a, "dvf:") | stop(usage)
- X if DEBUG := \opttbl["d"]
- X then VERBOSE := 1
- X else VERBOSE := opttbl["v"]
- X f := \opttbl["f"] | &input
- X f2 := &output
- X
- X ptbl := table()
- X makeptbl(f, f2)
- X
- X # set up (global) alst and glst (the action and goto lists)
- X CONST_TABLE()
- X
- X #
- X # Set up tlst (a set of all tokens used in the grammar), then
- X # turn the ACT records into RE, SH, or AC records.
- X #
- X tlst := set()
- X # squeeze a list of rules out of the action list; find the
- X # terminals in their right-hand sides
- X every sym := !\(\(!(!alst)).by_rule).RHS do
- X \sym.terminal & insert(tlst, sym.str)
- X insert(tlst, "$")
- X
- X #
- X # Clobber all of the ACT records, and turn them into RE, SH, or AC
- X # records. Get rid of structurally identical, structures by
- X # making them all point to the same structure.
- X #
- X if \VERBOSE then
- X write(&errout, "Merging duplicate structures in action list...")
- X seen := set()
- X every i := 1 to *alst do {
- X if /alst[i] then next
- X if *alst[i] = 0 then {
- X alst[i] := &null
- X next
- X }
- X every k := key(alst[i]) do {
- X if alst[i][k].str == "reduce" then {
- X r := alst[i][k].by_rule
- X size := *r.RHS
- X # Don't count epsilon moves when calculating sizes!
- X every elt := !r.RHS do
- X if elt.str == "" & \elt.terminal then size -:= 1
- X act := RE(alst[i][k].state, r.procname, r.LHS, size)
- X }
- X # Don't need to store all this info for shifts or accepts.
- X else if alst[i][k].str == "shift"
- X then act := SH(alst[i][k].state)
- X else act := AC()
- X if alst[i][k] := equiv(act, !seen)
- X then next
- X else {
- X alst[i][k] := act
- X insert(seen, act)
- X }
- X }
- X # Simple optimization; if alst[i] is structured identically to
- X # a previously seen table, use the previously seen one! Note
- X # that equiv is more general than Equiv (it can accept sets
- X # and tables). See the IPL for equiv(). Equiv() is in
- X # debug.src.
- X if alst[i] := equiv(alst[i], !seen)
- X then next else insert(seen, alst[i])
- X }
- X
- X write(f2, "")
- X write(f2, "############################################################################")
- X write(f2, "#")
- X write(f2, "# The following code has been inserted by IBPAG (Icon-based")
- X write(f2, "# Parser Generator). It contains a stack-based LR parser, an")
- X write(f2, "# error handler, and a debugging tool. The user must supply a")
- X write(f2, "# tokenizing routine called iparse_tokens().")
- X write(f2, "#")
- X write(f2, "# For a description of IBPAG source-file format, see the")
- X write(f2, "# documentation prepended to ibpag.icn.")
- X write(f2, "#")
- X write(f2, "############################################################################")
- X write(f2, "#")
- X write(f2, "# See also: ibpag.icn, maketbls.icn, preproc.icn, itokens.icn")
- X write(f2, "#")
- X write(f2, "############################################################################")
- X write(f2, "")
- X write(f2, "# uncomment for the compiler")
- X write(f2, "# invocable \"symbol\", \"TOK\", \"RE\", \"SH\", \"AC\"")
- X write(f2, "")
- X write(f2, "# I use ximage for debugging; remove it if you don't need it.")
- X write(f2, "link codeobj#, ximage")
- X write(f2, "")
- X write(f2, "record symbol(str, terminal)")
- X write(f2, "record TOK(sym, str)")
- X write(f2, "record RE(state, procname, sym, size)")
- X write(f2, "record SH(state)")
- X write(f2, "record AC()")
- X write(f2, "")
- X write(f2, "global line_number, errors, fail_on_error")
- X write(f2, "")
- X write(f2, "#")
- X write(f2, "# iparse: file -> ?")
- X write(f2, "# stream -> ?")
- X write(f2, "#")
- X write(f2, "# Where stream is an open file, and ? represents the user-defined")
- X write(f2, "# result of a completed parse of file, from the current location")
- X write(f2, "# up to the point where the parser executes an \"accept\" action.")
- X write(f2, "#")
- X write(f2, "# The second to fifth arguments are used on recursive calls from")
- X write(f2, "# the error handler, iparse_error. Ignore them, unless you are")
- X write(f2, "# sure of what you are doing!")
- X write(f2, "#")
- X write(f2, "procedure iparse(stream, state_stack, value_stack, next_token, err_state)")
- X write(f2, "")
- X write(f2, " local start_symbol, token, act, last_symbol, arglist")
- X write(f2, " static alst, glst")
- X write(f2, " #global line_number, errors")
- X write(f2, " initial {")
- X every lname := "alst" |"glst" do {
- X encode(variable(lname)) ? {
- X writes(f2, "\t", lname, " := decode(\"")
- X if write(f2, move(47), "_") then {
- X while write(f2, "\t ",move(60), "_")
- X write(f2, "\t ", tab(0), "\")")
- X }
- X else write(f2, tab(0), "\")")
- X }
- X }
- X write(f2, "\t#")
- X write(f2, "\t# Uncomment the following if you want a look at the state and goto")
- X write(f2, "\t# tables. If you aren't planning on looking at them, find the")
- X write(f2, "\t# procedure definition for dump_lists below, and delete it. Why")
- X write(f2, "\t# keep it around if it's not being used?")
- X write(f2, "\t#")
- X write(f2, "\t# dump_lists(&errout, alst, glst)")
- X write(f2, " }")
- X write(f2, " #")
- X write(f2, " # New, not recursive, invocation; reset stacks, line number and")
- X write(f2, " # error count.")
- X write(f2, " #")
- X write(f2, " start_symbol := ", image(start_symbol))
- X write(f2, " /err_state := 0")
- X write(f2, " /state_stack := [1] & line_number := 0 & errors := 0")
- X write(f2, " /value_stack := []")
- X write(f2, " /next_token := create iparse_tokens(stream)")
- X write(f2, "")
- X write(f2, " # &trace := -1")
- X write(f2, "")
- X write(f2, " while token := @next_token do {")
- X write(f2, "\trepeat {")
- X write(f2, "\t act := \\alst[state_stack[1]][token.sym] | {")
- X write(f2, "\t\t#")
- X write(f2, "\t\t# You can replace this error handler with whatever you")
- X write(f2, "\t\t# like, and have it do whatever you like.")
- X write(f2, "\t\t#")
- X write(f2, "\t\treturn iparse_error(alst, state_stack, value_stack,")
- X write(f2, "\t\t\t\t token, next_token, err_state + 1)")
- X write(f2, "\t }")
- X write(f2, "")
- X write(f2, "\t # If we're not working on an error production, assume that")
- X write(f2, "\t # we have a fully (re)synchronized parser, & clear err_state. ")
- X write(f2, "\t #")
- X write(f2, "\t (\\last_symbol | token.sym) == \"error\" | { err_state := 0 }")
- X write(f2, "\t last_symbol := token.sym")
- X write(f2, "")
- X write(f2, "\t case type(act) of {")
- X write(f2, "\t \"SH\" : {")
- X write(f2, "\t\t # push the next state onto the state stack")
- X write(f2, "\t \t push(state_stack, act.state)")
- X write(f2, "\t\t # push the current token's literal value onto the")
- X write(f2, "\t\t # value stack")
- X write(f2, "\t\t push(value_stack, token.str)")
- X write(f2, "\t\t # break out of enclosing repeat loop")
- X write(f2, "\t\t break")
- X write(f2, "\t }")
- X write(f2, "\t \"RE\" : {")
- X write(f2, "\t \t arglist := []")
- X write(f2, "\t\t #")
- X write(f2, "\t\t # Pop as many elements off of the token stack as")
- X write(f2, "\t\t # there are symbols in the right-hand side of the")
- X write(f2, "\t\t # rule. Push these elements onto an argument list.")
- X write(f2, "\t\t #")
- X write(f2, "\t\t every 1 to act.size do {")
- X write(f2, "\t\t pop(state_stack)")
- X write(f2, "\t\t push(arglist, pop(value_stack))")
- X write(f2, "\t\t }")
- X write(f2, "\t\t #")
- X write(f2, "\t\t # Check to goto list to see what state we should")
- SHAR_EOF
- true || echo 'restore of ibpag.icn failed'
- fi
- echo 'End of part 1'
- echo 'File ibpag.icn is continued in part 2'
- echo 2 > _shar_seq_.tmp
- exit 0
- --
-
- -Richard L. Goerwitz goer%midway@uchicago.bitnet
- goer@midway.uchicago.edu rutgers!oddjob!ellis!goer
-