- Path: sparky!uunet!elroy.jpl.nasa.gov!ames!network.ucsd.edu!rutgers!cmcl2!acf3!schonber
- From: schonber@acf3.NYU.EDU (Ed Schonberg)
- Newsgroups: comp.lang.ada
- Subject: Re: GNU-NYU Ada project
- Message-ID: <61990007@acf3.NYU.EDU>
- Date: 24 Nov 92 04:03:00 GMT
- References: <1992Nov19.184504.22639@coe.montana.edu>
- Sender: notes@cmcl2.nyu.edu (Notes Person)
- Organization: New York University
- Lines: 66
- Nntp-Posting-Host: acf3.nyu.edu
-
- A few comments on the GNAT progress report.
-
- The figure of 1,000,000 lines per minute on a 16 MHz PC is correct. It
- is the performance of a syntax checker, i.e. the program does not
- build a parse tree, but does diagnose syntax errors (and recovers from
- them). This performance is due to several ingredients:
-
- a) the program is written in assembly language, with global register
- assignments that code generators are not able to duplicate.
-
- b) The program to be parsed is fully in memory, so the lexical scanner
- does not have to check for, nor perform, any I/O.
-
- c) The parser is not table driven.
-
- A parser that builds a syntax tree will have a more modest performance
- (currently our Ada-written parser is processing about 20,000
- lines/min). At this speed, the bulk of the compilation time is still spent
- on code generation.
-
- The parser is a top-down parser with backtracking. Robert Dewar chose
- this model first out of a personal preference, and second in order to
- compare its performance with that of an excellent LALR parser generator,
- built by Philippe Charles, and which does substantial table compression
- and has excellent error recovery. The top-down parser turns out to be
- faster by more than an order of magnitude (even after the relative
- speed of the lexical scanner is taken into account) and the error
- recovery is of comparable quality (i.e. no cascaded errors, and almost
- no spurious errors on the whole ACVC suite).
-
- Why are top-down parsers considered inferior? This is probably a result
- of the negative remarks made about them by Aho, Ullman and Sethi, who
- are (understandably) very fond of the admittedly elegant algorithms
- involved in LALR parsing. It is also due to the fact that top-down
- parsers require a closer understanding of the grammar of the language
- being parsed (not such a strange requirement if you want to write a
- compiler!) while table-driven parsers can be produced mechanically with
- a minimum of effort. Hand-written parsers clearly require greater craft.
- The Dragon book also shows that in some cases a TD parser will not even
- recognize the language you thought you were parsing, but this pathology
- is of purely theoretical import, and does not apply to existing
- programming languages, to the best of my knowledge (and everyone's
- experience).
-
- It is also not the case that LALR parsers can be equipped with
- obviously superior error recovery: a top-down parser has a better idea
- of the state of the parse, and can more easily find a resynchronization
- point. This is certainly our experience.
-
- Finally, for a language such as Ada, with a relatively high density of
- keywords, the parse is most of the time deterministic (i.e. requires no
- backtracking, or a minimal look-ahead), and only a handful of instructions
- are performed per token. The recognizing automaton can often be coded
- with gotos (a natural programming technique for FSA) with obvious
- performance gains.
-
- During language design, when the grammar changes rapidly, parser
- generators are of course indispensable. Once the grammar is frozen,
- a top-down parser is a reasonable choice. The difficult part is not
- parsing anyway, it's code generation and optimization!
-
- Ed Schonberg
- GNAT project
- New York University
-
- schonberg@nyu.edu
-