NetNews Usenet Archive 1992 #31

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #31 / NN_1992_31.iso / spool / comp / compiler / 2059 < prev next >

Wrap

Text File | 1992-12-22 | 8.8 KB | 164 lines

Newsgroups: comp.compilers Path: sparky!uunet!world!iecc!compilers-sender From: Dave Gillespie <daveg@thymus.synaptics.com> Subject: Re: Extension Languages Reply-To: Dave Gillespie <daveg@thymus.synaptics.com> Organization: Compilers Central Date: Sat, 19 Dec 1992 01:52:26 GMT Approved: compilers@iecc.cambridge.ma.us Message-ID: <92-12-091@comp.compilers> References: <92-12-087@comp.compilers> <92-12-064@comp.compilers> Keywords: design Sender: compilers-sender@iecc.cambridge.ma.us Lines: 149 Stavros Macrakis writes: > Any of these notations are "easy for the machine", although some of the > tables will be larger for infix notation. and Peter Ludemann writes: > 20-30% of the CPU time is spent in scanning for the next non-blank --- > something a parser for any language needs to do (this scanner loop is in > tightly coded assembler, by the way). As the standard method of handling > infix languages is to push things onto a stack, the cost of interpreting > an infix language should be the same as for prefix and postfix languages. These are valid points, but they're not quite what I was trying to get at in my original message. I guess I wasn't very clear about it. Let's try to figure out what we really mean by the terms we're using. "Post/pre/infix" can, as purely syntactic terms, be taken at two different levels. There's the actual placement of the operator in an expression involving two operands, which is admittedly a fairly minor issue, but there's also the more general linguistic philosophy that tends to accompany the three methods. I was trying to address the philosophies as well as the actual notations, since that's what the original poster seemed to be asking about. If a language is designed so that all functions or operators take a fixed number of arguments (or more generally, a self-delimiting number of arguments), and if every function/operator always evaluates its arguments except when explicit quoting is used, then it's possible to use a postfix syntax which allows for extremely easy evaluation: Each argument pushes its value onto the stack, so that when the operator is seen the arguments are already sitting there in the proper order, ready to be taken off and used. Forth and PostScript are good examples of such languages. Some languages want to allow functions which take variable numbers of arguments (without requiring explicit argument counts or terminator arguments), or they want to provide implicit quoting of certain arguments (where, for example, Lisp's "(setq x y)" quotes the first argument so that it can operate on the variable itself rather than on the variable's current value). Both of these can be accomplished by adding an explicit notation to delimit the argument lists, such as Lisp's parentheses. Experience seems to show that prefix notation works best with languages at this level, where pursuit of simplicity is still a major concern. We tend to think of these languages when we refer to "prefix languages." (Traditional "f(x,y)" function calls without any operators, like Mathematica's FullForm, also fit this description.) Both notations have the property that the natural intermediate form for storing a scanned-in program closely resembles the source form in structure. "Prefix" languages make the fine structure more explicit, so they tend to be better suited when programs want to operate on other programs as data. In an embedded language like Emacs Lisp, this has the advantage that many of the interpreter's algorithms can double as primitives available to the user, and also that much of the interpreter can be written in its own language rather than the (usually less concise) implementation language. Since we like our embedded languages to be small and unobtrusive, this is a big win. Also, since embedded languages have to cope with changing environments, the ability to operate on code is useful in its own right. If simplicity of the interpreter isn't so great a concern, a design may opt for a more complex language that more closely matches the concepts human programmers like to work with. This produces languages which tend to have a variety of infix, prefix, and postfix operators, plus various syntactic elements like commas, semicolons, braces, etc. When we refer to an extension language being "infix," I think we're usually thinking about these sorts of languages. The problem with "infix" languages is that because of their mixture of human-style implicit assumptions and redundant syntax, they tend to be unsuitable for direct use as intermediate languages, while also making the need for an intermediate language more important because the language is less tied to the constraints that make an implementation run fast. So except for the simplest BASIC interpreters, most "infix" languages compile to an intermediate form that's closer to a prefix or postfix language in nature, even in so-called "interpretive" systems. Stavros brings up Lisp M-expressions as an example, although I can't help noticing that the early Lispers gave up on M-expressions because S-expressions (that is, prefix notation) turned out to be easy enough to use directly. So infix systems generally parse the input into an intermediate form that is fairly far removed from the source language that humans work with. As long as only the compiler or interpreter ever see the intermediate form, this doesn't really matter. But in a language like Lisp where the user works with the intermediate form directly, there is a lot to be gained from having the intermediate form "look" just like the source form. It's a lot less confusing, as well as making the parser a whole lot simpler. Both Stavros and Peter point out that a more complicated parser doesn't necessarily have to be much slower than a simple parser, but it *does* tend to be a lot larger, and compactness of the interpreter is important for embedded languages. Stavros continues: > 1) You seem to assume that postfix syntax translates into difficulty of > accessing the operator. But the syntax doesn't tell you anything about > the internal representation. But my assumption was based on the desire to have the internal representation be close to the syntax. When I said postfix was slightly slower for Lisp-like languages, I meant that, given a direct mapping of syntactic order onto internal Lisp lists, postfix has a slight extra cost because the argument list must be scanned once to reach the operator before you can know how to evaluate the arguments. It was only meant to be a minor point, though. A system that stored argument vectors instead of lists, for example, would not have this problem. I'm pretty confident that prefix is more natural than postfix, all else being the same, as least to English readers. Because a Lisp operator can affect the way the arguments are interpreted, people reading left to right want to see the operator first. That seems like a much more compelling argument against postfix, unless you're going to go the whole postfix route and have no parentheses to delimit the arguments. > 2) Many machine-oriented languages are postfix, precisely because they are > EASIER to interpret. The problem comes only with non-strict operators > like "if", which such languages avoid by using explicit gotos. Or by quoting, as with PostScript's "/" and "{ }" notations. Postfix is indeed great if its utter minimalism is enough for you, but if you want to write large programs, or do anything other than simply interpret a stored program, it gets clumsy. (Even PostScript was designed to be written mainly by other programs, where the human-friendly structure goes in the C program that generates the PostScript rather than in the PostScript itself.) > I've also seen languages which are Lisp-like in essence, but with C-like > parsers added on top to make them more palatable. In other words, they > have a parser which reads "a+b" into the list "(+ a b)". I've even used this technique myself. But it does complicate the parser, and I can understand that such creature comforts might not seem worthwhile in systems (like Emacs) which don't envision major applications being written in the extension language (my own crazed efforts notwithstanding ;-). > Yes, there are Lisp-like languages which provide an infix syntax, or even > Lisps which happen to have an infix reader as well. But they are not > Lisp-like because they represent the program as a tree! After all, many > language processors represent the program as a tree at some point.... Right, they are Lisp-like because they represent the program as a *data structure of the language* which is a tree. A true Lisp fanatic might argue that your last point merely shows that other languages must convert themselves to Lisp in order to get anything done! -- Dave -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.