- Path: sparky!uunet!elroy.jpl.nasa.gov!ames!network.ucsd.edu!rutgers!cmcl2!acf3!schonber
- From: schonber@acf3.NYU.EDU (Ed Schonberg)
- Newsgroups: comp.lang.ada
- Subject: Re: GNU-NYU Ada project
- Message-ID: <61990007@acf3.NYU.EDU>
- Date: 24 Nov 92 04:03:00 GMT
- References: <1992Nov19.184504.22639@coe.montana.edu>
- Sender: notes@cmcl2.nyu.edu (Notes Person)
- Organization: New York University
- Lines: 66
- Nntp-Posting-Host: acf3.nyu.edu
-
- A few comments on the GNAT progress report.
-
- The figure of 1,000,000 lines per minute on a 16 MHz PC is correct. It
- is the performance of a syntax checker, i.e. the program does not
- build a parse tree, but does diagnose syntax errors (and recovers from
- them). This performance is due to several ingredients:
-
- a) the program is written in assembly language, with global register
- assignments that code generators are not able to duplicate.
-
- b) The program to be parsed is fully in memory, so the lexical scanner
- does not have to check for, nor perform, any I/O.
-
- c) The parser is not table driven.
-
- A parser that builds a syntax tree will have a more modest performance
- (currently our Ada-written parser is processing about 20,000
- lines/min). At this speed, the bulk of the compilation time is still spent
- on code generation.
-
- The parser is a top-down parser with backtracking. Robert Dewar chose
- this model first out of a personal preference, and second in order to
- compare its performance with that of an excellent LALR parser generator,
- built by Philippe Charles, and which does substantial table compression
- and has excellent error recovery. The top-down parser turns out to be
- faster by more than an order of magnitude (even after the relative
- speed of the lexical scanner is taken into account) and the error
- recovery is of comparable quality (i.e. no cascaded errors, and almost
- no spurious errors on the whole ACVC suite).
-
- Why are top-down parsers considered inferior? This is probably a result
- of the negative remarks made about them by Aho, Ullman and Sethi, who
- are (understandably) very fond of the admittedly elegant algorithms
- involved in LALR parsing. It is also due to the fact that top-down
- parsers require a closer understanding of the grammar of the language
- being parsed (not such a strange requirement if you want to write a
- compiler!) while table-driven parsers can be produced mechanically with
- a minimum of effort. Hand-written parsers clearly require greater craft.
- The Dragon book also shows that in some cases a TD parser will not even
- recognize the language you thought you were parsing, but this pathology
- is of purely theoretical import, and does not apply to existing
- programming languages, to the best of my knowledge (and everyone's
- experience).
-
- It is also not the case that LALR parsers can be equipped with
- obviously superior error recovery: a top-down parser has a better idea
- of the state of the parse, and can more easily find a resynchronization
- point. This is certainly our experience.
-
- Finally, for a language such as Ada, with a relatively high density of
- keywords, the parse is most of the time deterministic (i.e. requires no
- backtracking, or a minimal look-ahead), and only a handful of instructions
- are performed per token. The recognizing automaton can often be coded
- with gotos (a natural programming technique for FSA) with obvious
- performance gains.
-
- During language design, when the grammar changes rapidly, parser
- generators are of course indispensable. Once the grammar is frozen,
- a top-down parser is a reasonable choice. The difficult part is not
- parsing anyway, it's code generation and optimization!
-
- Ed Schonberg
- GNAT project
- New York University
-
- schonberg@nyu.edu
-