home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.compilers
- Path: sparky!uunet!world!iecc!compilers-sender
- From: Dave Gillespie <daveg@thymus.synaptics.com>
- Subject: Re: Extension Languages
- Reply-To: Dave Gillespie <daveg@thymus.synaptics.com>
- Organization: Compilers Central
- Date: Sat, 19 Dec 1992 01:52:26 GMT
- Approved: compilers@iecc.cambridge.ma.us
- Message-ID: <92-12-091@comp.compilers>
- References: <92-12-087@comp.compilers> <92-12-064@comp.compilers>
- Keywords: design
- Sender: compilers-sender@iecc.cambridge.ma.us
- Lines: 149
-
-
- Stavros Macrakis writes:
- > Any of these notations are "easy for the machine", although some of the
- > tables will be larger for infix notation.
-
- and Peter Ludemann writes:
- > 20-30% of the CPU time is spent in scanning for the next non-blank ---
- > something a parser for any language needs to do (this scanner loop is in
- > tightly coded assembler, by the way). As the standard method of handling
- > infix languages is to push things onto a stack, the cost of interpreting
- > an infix language should be the same as for prefix and postfix languages.
-
- These are valid points, but they're not quite what I was trying to get at
- in my original message. I guess I wasn't very clear about it. Let's try
- to figure out what we really mean by the terms we're using.
-
- "Post/pre/infix" can, as purely syntactic terms, be taken at two different
- levels. There's the actual placement of the operator in an expression
- involving two operands, which is admittedly a fairly minor issue, but
- there's also the more general linguistic philosophy that tends to
- accompany the three methods. I was trying to address the philosophies as
- well as the actual notations, since that's what the original poster seemed
- to be asking about.
-
- If a language is designed so that all functions or operators take a fixed
- number of arguments (or more generally, a self-delimiting number of
- arguments), and if every function/operator always evaluates its arguments
- except when explicit quoting is used, then it's possible to use a postfix
- syntax which allows for extremely easy evaluation: Each argument pushes
- its value onto the stack, so that when the operator is seen the arguments
- are already sitting there in the proper order, ready to be taken off and
- used. Forth and PostScript are good examples of such languages.
-
- Some languages want to allow functions which take variable numbers of
- arguments (without requiring explicit argument counts or terminator
- arguments), or they want to provide implicit quoting of certain arguments
- (where, for example, Lisp's "(setq x y)" quotes the first argument so that
- it can operate on the variable itself rather than on the variable's
- current value). Both of these can be accomplished by adding an explicit
- notation to delimit the argument lists, such as Lisp's parentheses.
- Experience seems to show that prefix notation works best with languages at
- this level, where pursuit of simplicity is still a major concern. We tend
- to think of these languages when we refer to "prefix languages."
- (Traditional "f(x,y)" function calls without any operators, like
- Mathematica's FullForm, also fit this description.)
-
- Both notations have the property that the natural intermediate form for
- storing a scanned-in program closely resembles the source form in
- structure.
-
- "Prefix" languages make the fine structure more explicit, so they tend to
- be better suited when programs want to operate on other programs as data.
- In an embedded language like Emacs Lisp, this has the advantage that many
- of the interpreter's algorithms can double as primitives available to the
- user, and also that much of the interpreter can be written in its own
- language rather than the (usually less concise) implementation language.
- Since we like our embedded languages to be small and unobtrusive, this is
- a big win. Also, since embedded languages have to cope with changing
- environments, the ability to operate on code is useful in its own right.
-
- If simplicity of the interpreter isn't so great a concern, a design may
- opt for a more complex language that more closely matches the concepts
- human programmers like to work with. This produces languages which tend
- to have a variety of infix, prefix, and postfix operators, plus various
- syntactic elements like commas, semicolons, braces, etc. When we refer to
- an extension language being "infix," I think we're usually thinking about
- these sorts of languages.
-
- The problem with "infix" languages is that because of their mixture of
- human-style implicit assumptions and redundant syntax, they tend to be
- unsuitable for direct use as intermediate languages, while also making the
- need for an intermediate language more important because the language is
- less tied to the constraints that make an implementation run fast. So
- except for the simplest BASIC interpreters, most "infix" languages compile
- to an intermediate form that's closer to a prefix or postfix language in
- nature, even in so-called "interpretive" systems. Stavros brings up Lisp
- M-expressions as an example, although I can't help noticing that the early
- Lispers gave up on M-expressions because S-expressions (that is, prefix
- notation) turned out to be easy enough to use directly.
-
- So infix systems generally parse the input into an intermediate form that
- is fairly far removed from the source language that humans work with. As
- long as only the compiler or interpreter ever see the intermediate form,
- this doesn't really matter. But in a language like Lisp where the user
- works with the intermediate form directly, there is a lot to be gained
- from having the intermediate form "look" just like the source form. It's
- a lot less confusing, as well as making the parser a whole lot simpler.
- Both Stavros and Peter point out that a more complicated parser doesn't
- necessarily have to be much slower than a simple parser, but it *does*
- tend to be a lot larger, and compactness of the interpreter is important
- for embedded languages.
-
- Stavros continues:
- > 1) You seem to assume that postfix syntax translates into difficulty of
- > accessing the operator. But the syntax doesn't tell you anything about
- > the internal representation.
-
- But my assumption was based on the desire to have the internal
- representation be close to the syntax. When I said postfix was slightly
- slower for Lisp-like languages, I meant that, given a direct mapping of
- syntactic order onto internal Lisp lists, postfix has a slight extra cost
- because the argument list must be scanned once to reach the operator
- before you can know how to evaluate the arguments. It was only meant to
- be a minor point, though. A system that stored argument vectors instead
- of lists, for example, would not have this problem.
-
- I'm pretty confident that prefix is more natural than postfix, all else
- being the same, as least to English readers. Because a Lisp operator can
- affect the way the arguments are interpreted, people reading left to right
- want to see the operator first. That seems like a much more compelling
- argument against postfix, unless you're going to go the whole postfix
- route and have no parentheses to delimit the arguments.
-
- > 2) Many machine-oriented languages are postfix, precisely because they are
- > EASIER to interpret. The problem comes only with non-strict operators
- > like "if", which such languages avoid by using explicit gotos.
-
- Or by quoting, as with PostScript's "/" and "{ }" notations. Postfix is
- indeed great if its utter minimalism is enough for you, but if you want to
- write large programs, or do anything other than simply interpret a stored
- program, it gets clumsy. (Even PostScript was designed to be written
- mainly by other programs, where the human-friendly structure goes in the C
- program that generates the PostScript rather than in the PostScript
- itself.)
-
- > I've also seen languages which are Lisp-like in essence, but with C-like
- > parsers added on top to make them more palatable. In other words, they
- > have a parser which reads "a+b" into the list "(+ a b)".
-
- I've even used this technique myself. But it does complicate the parser,
- and I can understand that such creature comforts might not seem worthwhile
- in systems (like Emacs) which don't envision major applications being
- written in the extension language (my own crazed efforts notwithstanding
- ;-).
-
- > Yes, there are Lisp-like languages which provide an infix syntax, or even
- > Lisps which happen to have an infix reader as well. But they are not
- > Lisp-like because they represent the program as a tree! After all, many
- > language processors represent the program as a tree at some point....
-
- Right, they are Lisp-like because they represent the program as a *data
- structure of the language* which is a tree. A true Lisp fanatic might
- argue that your last point merely shows that other languages must convert
- themselves to Lisp in order to get anything done!
-
- -- Dave
- --
- Send compilers articles to compilers@iecc.cambridge.ma.us or
- {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
-