-
- <HTML>
- <HEAD>
- <TITLE>Parse::Yapp - Perl extension for generating and using LALR parsers.</TITLE>
- <LINK REL="stylesheet" HREF="../../../Active.css" TYPE="text/css">
- <LINK REV="made" HREF="mailto:">
- </HEAD>
-
- <BODY>
- <TABLE BORDER=0 CELLPADDING=0 CELLSPACING=0 WIDTH=100%>
- <TR><TD CLASS=block VALIGN=MIDDLE WIDTH=100% BGCOLOR="#cccccc">
- <STRONG><P CLASS=block> Parse::Yapp - Perl extension for generating and using LALR parsers.</P></STRONG>
- </TD></TR>
- </TABLE>
-
- <A NAME="__index__"></A>
- <!-- INDEX BEGIN -->
-
- <UL>
-
- <LI><A HREF="#name">NAME</A></LI><LI><A HREF="#supportedplatforms">SUPPORTED PLATFORMS</A></LI>
-
- <LI><A HREF="#synopsis">SYNOPSIS</A></LI>
- <LI><A HREF="#description">DESCRIPTION</A></LI>
- <UL>
-
- <LI><A HREF="#the grammar file">The Grammar file</A></LI>
- </UL>
-
- <LI><A HREF="#bugs and suggestions">BUGS AND SUGGESTIONS</A></LI>
- <LI><A HREF="#author">AUTHOR</A></LI>
- <LI><A HREF="#see also">SEE ALSO</A></LI>
- <LI><A HREF="#copyright">COPYRIGHT</A></LI>
- </UL>
- <!-- INDEX END -->
-
- <HR>
- <P>
- <H1><A NAME="name">NAME</A></H1>
- <P>Parse::Yapp - Perl extension for generating and using LALR parsers.</P>
- <P>
- <HR>
- <H1><A NAME="supportedplatforms">SUPPORTED PLATFORMS</A></H1>
- <UL>
- <LI>Linux</LI>
- <LI>Solaris</LI>
- <LI>Windows</LI>
- </UL>
- <HR>
- <H1><A NAME="synopsis">SYNOPSIS</A></H1>
- <PRE>
- yapp -m MyParser grammar_file.yp</PRE>
- <PRE>
- ...</PRE>
- <PRE>
- use MyParser;</PRE>
- <PRE>
- $parser=new MyParser();
- $value=$parser->YYParse(yylex => \&lexer_sub, yyerror => \&error_sub);</PRE>
- <PRE>
- $nberr=$parser->YYNberr();</PRE>
- <PRE>
- $parser->YYData->{DATA}= [ 'Anything', 'You Want' ];</PRE>
- <PRE>
- $data=$parser->YYData->{DATA}[0];</PRE>
- <P>
- <HR>
- <H1><A NAME="description">DESCRIPTION</A></H1>
- <P>Parse::Yapp (Yet Another Perl Parser compiler) is a collection of modules
- that lets you generate and use yacc-like, thread-safe (reentrant) parsers with
- a Perl object-oriented interface.</P>
- <P>The script yapp is a front end to the Parse::Yapp module and lets you
- easily create a Perl OO parser from an input grammar file.</P>
- <P>
- <H2><A NAME="the grammar file">The Grammar file</A></H2>
- <DL>
- <DT><STRONG><A NAME="item_Comments"><CODE>Comments</CODE></A></STRONG><BR>
- <DD>
- Throughout your grammar files, comments are either Perl style, introduced by <EM>#</EM>
- and running to the end of the line, or C style, enclosed between <EM>/*</EM> and <EM>*/</EM>.
- <P></P>
- <DT><STRONG><A NAME="item_Tokens_and_string_literals"><CODE>Tokens and string literals</CODE></A></STRONG><BR>
- <DD>
- Throughout the grammar files, two kinds of symbols may appear:
- <EM>Non-terminal</EM> symbols, also called <EM>left-hand-side</EM> symbols,
- which are the names of your rules, and <EM>Terminal</EM> symbols, also
- called <EM>Tokens</EM>.
- <P>Tokens are the symbols your lexer function will pass to your parser
- (see below). They come in two flavours: symbolic tokens and string
- literals.</P>
- <P>Non-terminals and symbolic tokens share the same identifier syntax:</P>
- <PRE>
- [A-Za-z][A-Za-z0-9_]*</PRE>
- <P>String literals are enclosed in single quotes and can contain almost
- anything. They are written to your parser file double-quoted, with any
- special character kept as is: '"', '$' and '@' are automatically
- escaped with '\', which makes writing them more natural. On the other hand,
- if you need a single quote inside your literal, just escape it with '\'.</P>
- <P>You cannot have a literal <EM>'error'</EM> in your grammar, as it would
- confuse the driver with the <EM>error</EM> token. Use a symbolic token instead.
- Using it anyway produces a warning telling you that you should have written
- it <EM>error</EM>, and it is treated as if it were the <EM>error</EM> token.</P>
- <P></P>
- <DT><STRONG><A NAME="item_Grammar_file_syntax"><CODE>Grammar file syntax</CODE></A></STRONG><BR>
- <DD>
- It is very close to yacc's (in fact, <EM>Parse::Yapp</EM> should compile
- a clean <EM>yacc</EM> grammar without any modification, whereas the opposite
- is not true).
- <P>It is divided into three sections separated by <CODE>%%</CODE>:</P>
- <PRE>
- header section
- %%
- rules section
- %%
- footer section</PRE>
- <DL>
- <DT><STRONG><A NAME="item_The_Header_Section_section_may_contain%3A"><STRONG>The Header Section</STRONG> may contain:</A></STRONG><BR>
- <DD>
- <LI>
- One or more code blocks enclosed inside <CODE>%{</CODE> and <CODE>%}</CODE>, just like in
- yacc. They may contain any valid Perl code and will be copied verbatim
- at the very beginning of the parser module. They are not as useful as
- they are in yacc, but you may use them, for example, to declare global
- variables, though you will see later that such global variables can be
- avoided to make reentrant parser modules.
- <P></P>
- <LI>
- Precedence declarations, introduced by <CODE>%left</CODE>, <CODE>%right</CODE> and <CODE>%nonassoc</CODE>
- specifying associativity, followed by the list of tokens or literals
- having the same precedence and associativity.
- Precedences declared later have higher levels
- (see the yacc or bison manuals for a full explanation of how they work,
- as they are implemented exactly the same way in Parse::Yapp).
- <P></P>
- <LI>
- <CODE>%start</CODE> followed by a rule's left hand side, declaring this rule to
- be the starting rule of your grammar. The default if <CODE>%start</CODE> is not
- declared is the first rule in your grammar section.
- <P></P>
- <LI>
- <CODE>%token</CODE> followed by a list of symbols, forcing them to be recognized
- as tokens, generating a syntax error if used in the left hand side of
- a rule declaration.
- Note that in Parse::Yapp, you <EM>don't</EM> need to declare tokens as in yacc: any
- symbol not appearing as a left hand side of a rule is considered to be
- a token.
- Other yacc declarations or constructs such as <CODE>%type</CODE> and <CODE>%union</CODE> are
- parsed but (almost) ignored.
- <P></P>
- <LI>
- <CODE>%expect</CODE> followed by a number, which suppresses warnings about the number of
- shift/reduce conflicts when that number matches the number of conflicts found, a la bison.
- <P></P>
- <DT><STRONG><A NAME="item_The_Rule_Section_contains_your_grammar_rules%3A"><STRONG>The Rule Section</STRONG> contains your grammar rules:</A></STRONG><BR>
- <DD>
- A rule is made of a left-hand-side symbol, followed by a <CODE>':'</CODE> and one
- or more right hand sides separated by <CODE>'|'</CODE> and terminated by a <CODE>';'</CODE>:
- <PRE>
- exp: exp '+' exp
- | exp '-' exp
- ;</PRE>
- <P>A right hand side may be empty:</P>
- <PRE>
- input: #empty
- | input line
- ;</PRE>
- <P>(If you have more than one empty rhs, Parse::Yapp will issue a warning,
- as this is usually a mistake, and you will certainly get a reduce/reduce
- conflict.)</P>
- <P>A rhs may be followed by an optional <CODE>%prec</CODE> directive, followed
- by a token, giving the rule an explicit precedence (see yacc manuals
- for its precise meaning), and by an optional semantic action code block (see
- below).</P>
- <PRE>
- exp: '-' exp %prec NEG { -$_[2] }
- | exp '+' exp { $_[1] + $_[3] }
- | NUM
- ;</PRE>
- <P>Note that in Parse::Yapp, a lhs <EM>cannot</EM> appear more than once as
- a rule name (This differs from yacc).</P>
- <P></P>
- <DT><STRONG><A NAME="item_The_footer_section"><CODE>The footer section</CODE></A></STRONG><BR>
- <DD>
- may contain any valid Perl code and will be appended at the very end
- of your parser module. Here you can write your lexer, error reporting
- subs and anything relevant to your parser.
- <P></P>
- <DT><STRONG><A NAME="item_Semantic_actions"><CODE>Semantic actions</CODE></A></STRONG><BR>
- <DD>
- Semantic actions are run every time a <EM>reduction</EM> occurs in the
- parsing flow and they must return a semantic value.
- <P>They are (usually, but see below <A HREF="#item_In_rule_actions"><CODE>In rule actions</CODE></A>) written at
- the very end of the rhs, enclosed with <CODE>{ }</CODE>, and are copied verbatim
- to your parser file, inside of the rules table.</P>
- <P>Be aware that matching braces in Perl is much more difficult than
- in C: inside strings they don't need to match. While in C it is
- very easy to detect the beginning of a string construct, or a
- single character, it is much more difficult in Perl, as there
- are so many ways of writing such literals. So there is no check
- for that today. If you need a brace in a string, escape it (<CODE>\{</CODE> or
- <CODE>\}</CODE>); that should work. Or (weird) write a comment containing the matching brace. Sorry.</P>
- <PRE>
- {
- "{ My string block }".
- "\{ My other string block \}".
- qq/ My unmatched brace \} /.
- #Force the match: {
- q/ My last brace } /
- }</PRE>
- <P>All of these constructs should work.</P>
- <P>In Parse::Yapp, semantic actions are called like normal Perl sub calls,
- with their arguments passed in <CODE>@_</CODE>, and their semantic values are
- their return values.</P>
- <P>$_[1] to $_[n] are the parameters just as $1 to $n in yacc, while
- $_[0] is the parser object itself.</P>
- <P>Having $_[0] being the parser object itself allows you to call
- parser methods. That's how the yacc macros are implemented:</P>
- <PRE>
- yyerrok is done by calling $_[0]->YYErrok
- YYERROR is done by calling $_[0]->YYError
- YYACCEPT is done by calling $_[0]->YYAccept
- YYABORT is done by calling $_[0]->YYAbort</PRE>
- <P>All those methods explicitly return <EM>undef</EM>, for convenience.</P>
- <PRE>
- YYRECOVERING is done by calling $_[0]->YYRecovering</PRE>
- <P>Three methods are useful in an error recovery sub:</P>
- <PRE>
- $_[0]->YYCurtok
- $_[0]->YYCurval
- $_[0]->YYExpect</PRE>
- <P>They return, respectively, the current input token that made the parse fail,
- its semantic value (both can also be used to modify those values, but
- know what you are doing!) and a list of the tokens the parser
- expected when the failure occurred.</P>
- <P>Note that if <CODE>$_[0]->YYCurtok</CODE> is declared as a <CODE>%nonassoc</CODE> token,
- it can be included in the <CODE>$_[0]->YYExpect</CODE> list whenever the input
- tries to use it in an associative way. This is not a bug: the token
- IS expected to report an error if encountered.</P>
- <P>To detect such a thing in your error reporting sub, the following
- example should do the trick:</P>
- <PRE>
- grep { $_[0]->YYCurtok eq $_ } $_[0]->YYExpect
- and do {
- #Non-associative token used in an associative expression
- };</PRE>
- <P>Accessing semantic values on the left of your reducing rule is done
- through the method</P>
- <PRE>
- $_[0]->YYSemval( index )</PRE>
- <P>where index is an integer. An index in <EM>1 .. n</EM> returns the same values
- as <EM>$_[1] .. $_[n]</EM>, but an index in <EM>-n .. 0</EM> returns values on the left of the rule
- being reduced (it is related to <EM>$-n .. $0 .. $n</EM> in yacc, but you
- cannot use <EM>$_[0]</EM> or <EM>$_[-n]</EM> constructs in Parse::Yapp for obvious reasons).</P>
- <P>There is also a provision for user data area in the parser object,
- accessed by the method:</P>
- <PRE>
- $_[0]->YYData</PRE>
- <P>which returns a reference to an anonymous hash, letting you have
- all of your parsing data held inside the object (see the Calc.yp
- or ParseYapp.yp files in the distribution for some examples).
- That's how you can make your parser module reentrant: all of your
- module states and variables are held inside the parser object.</P>
- <P>Note: unfortunately, method calls in Perl have a lot of overhead,
- and when YYData is used, it may be called a huge number
- of times. If you are not a *real* purist and efficiency
- is your concern, you may directly access the user space
- in the object, $parser->{USER}, which is a reference to an
- anonymous hash, and then benchmark.</P>
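- <P>A minimal sketch of such a benchmark, using the core Benchmark module.
- The helper name <CODE>compare_user_access</CODE> is hypothetical, and <CODE>$parser</CODE> is
- assumed to be a parser object created from a module generated by yapp:</P>
- <PRE>
- use strict;
- use warnings;
- use Benchmark qw(cmpthese);
-
- # Hypothetical helper: compare both ways of reaching the user data area.
- # $count is the number of iterations run for each variant.
- sub compare_user_access {
-     my ($parser, $count) = @_;
-     return cmpthese($count, {
-         method => sub { my $data = $parser->YYData },   # method call
-         direct => sub { my $data = $parser->{USER} },   # direct hash access
-     });
- }</PRE>
- <P>Calling <CODE>compare_user_access($parser, 1_000_000)</CODE> prints a comparison
- table; the actual figures depend on your Perl and your machine.</P>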
- <P>If no action is specified for a rule, the equivalent of a default
- action is run, which returns the first parameter:</P>
- <PRE>
- { $_[1] }</PRE>
- <P></P>
- <DT><STRONG><A NAME="item_In_rule_actions"><CODE>In rule actions</CODE></A></STRONG><BR>
- <DD>
- It is also possible to embed semantic actions inside of a rule:
- <PRE>
- typedef: TYPE { $type = $_[1] } identlist { ... } ;</PRE>
- <P>When Parse::Yapp's parser encounters such an embedded action, it modifies
- the grammar as if you wrote (although @x-1 is not a legal lhs value):</P>
- <PRE>
- @x-1: /* empty */ { $type = $_[1] };
- typedef: TYPE @x-1 identlist { ... } ;</PRE>
- <P>where <EM>x</EM> is a sequential number incremented for each "in rule" action,
- and <EM>-1</EM> represents the "dot position" in the rule where the action arises.</P>
- <P>In such actions, you can use <EM>$_[1]..$_[n]</EM> variables, which are the
- semantic values on the left of your action.</P>
- <P>Be aware that the way Parse::Yapp modifies your grammar because of
- <EM>in rule actions</EM> can produce, in some cases, spurious conflicts
- that wouldn't happen otherwise.</P>
- <P></P>
- <DT><STRONG><A NAME="item_Generating_the_Parser_Module"><CODE>Generating the Parser Module</CODE></A></STRONG><BR>
- <DD>
- Now that your grammar file is written, you can use yapp on it
- to generate your parser module:
- <PRE>
- yapp -v Calc.yp</PRE>
- <P>will create two files: <EM>Calc.pm</EM>, your parser module, and <EM>Calc.output</EM>,
- a verbose output of your parser rules, conflicts, warnings, states
- and summary.</P>
- <P>What you are missing now is a lexer routine.</P>
- <P></P>
- <DT><STRONG><A NAME="item_The_Lexer_sub"><CODE>The Lexer sub</CODE></A></STRONG><BR>
- <DD>
- is called each time the parser needs to read the next token.
- <P>It is called with only one argument, the parser object itself,
- so you can access its methods, especially the</P>
- <PRE>
- $_[0]->YYData</PRE>
- <P>data area.</P>
- <P>It is its duty to return the next token and value to the parser.
- They <CODE>must</CODE> be returned as a list of two values, the first one
- being the token known by the parser (symbolic or literal), and the
- second one being anything you want (usually the text of the next
- token, or the literal value), from a simple scalar value to any
- complex reference, as the parsing driver never uses it except to pass it to
- semantic actions:</P>
- <PRE>
- ( 'NUMBER', $num )
- or
- ( '>=', '>=' )
- or
- ( 'ARRAY', [ @values ] )</PRE>
- <P>When the lexer reaches the end of input, it must return the <CODE>''</CODE>
- empty token with an undef value:</P>
- <PRE>
- ( '', undef )</PRE>
- <P>Note that your lexer should <EM>never</EM> return <CODE>'error'</CODE> as token
- value: for the driver, this is the error token used for error
- recovery and would lead to odd reactions.</P>
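- <P>As an illustration of these rules, here is a minimal sketch of a lexer sub.
- It assumes the text to parse has been stored beforehand in
- <CODE>$parser->YYData->{INPUT}</CODE>; the <CODE>INPUT</CODE> key and the <CODE>NUM</CODE> token name are
- choices made for this example only, not part of Parse::Yapp:</P>
- <PRE>
- use strict;
- use warnings;
-
- # Hypothetical lexer: consumes the string held in $parser->YYData->{INPUT}.
- sub Lexer {
-     my ($parser) = @_;
-     my $input = \$parser->YYData->{INPUT};
-     $$input = '' unless defined $$input;
-
-     # Skip leading blanks, then signal end of input once nothing is left.
-     $$input =~ s/^[ \t]+//;
-     return ('', undef) if $$input eq '';
-
-     # A number is returned as the symbolic token NUM, with its text as value.
-     return ('NUM', $1) if $$input =~ s/^(\d+(?:\.\d+)?)//;
-
-     # Any other character (newline included) is returned as a literal token.
-     $$input =~ s/^(.)//s;
-     return ($1, $1);
- }</PRE>
- <P>Because all of its state lives in the parser object, this sub can be shared
- by several parser objects at the same time.</P>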
- <P>Now that your lexer is written, you may need to output
- meaningful error messages instead of the default, which is to
- print 'Parse error.' on STDERR.</P>
- <P>So you will need an Error reporting sub.</P>
- <P></P>
- <DT><STRONG><A NAME="item_Error_reporting_routine"><CODE>Error reporting routine</CODE></A></STRONG><BR>
- <DD>
- <P>If you want one, write it knowing that it is passed the parser object
- as its parameter, so you can share information with the lexer
- routine quite easily.</P>
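- <P>A minimal sketch of an error reporting sub built on the <CODE>YYCurtok</CODE>,
- <CODE>YYCurval</CODE> and <CODE>YYExpect</CODE> methods described earlier. The sub name and the
- wording of the message are assumptions made for this example:</P>
- <PRE>
- use strict;
- use warnings;
-
- # Hypothetical error reporting sub; the message wording is an assumption.
- sub ErrorReport {
-     my ($parser) = @_;
-     my $token    = $parser->YYCurtok;   # token that made the parse fail
-     my $value    = $parser->YYCurval;   # its semantic value
-     my @expected = $parser->YYExpect;   # tokens the parser expected
-
-     my $found = (defined $token and $token ne '')
-               ? "token '$token'"
-               : 'end of input';
-     $found .= " (value '$value')" if defined $value;
-
-     print STDERR "Syntax error: found $found, expected one of: ",
-                  join(', ', map { "'$_'" } @expected), "\n";
- }</PRE>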
- <P></P>
- <DT><STRONG><A NAME="item_Parsing"><CODE>Parsing</CODE></A></STRONG><BR>
- <DD>
- Now you've got everything to do the parsing.
- <P>First, use the parser module:</P>
- <PRE>
- use Calc;</PRE>
- <P>Then create the parser object:</P>
- <PRE>
- $parser=new Calc;</PRE>
- <P>Now, call the YYParse method, telling it where to find the lexer
- and error report subs:</P>
- <PRE>
- $result=$parser->YYParse(yylex => \&Lexer,
- yyerror => \&ErrorReport);</PRE>
- <P>(assuming Lexer and ErrorReport subs have been written in your current
- package)</P>
- <P>The order in which parameters appear is unimportant.</P>
- <P>Et voila.</P>
- <P>The YYParse method will do the parse, then return the last semantic
- value returned, or undef if error recovery cannot recover.</P>
- <P>If you need to be sure the parse has been successful (in case your
- last returned semantic value <EM>is</EM> undef) make a call to:</P>
- <PRE>
- $parser->YYNberr()</PRE>
- <P>which returns the total number of times the error reporting sub has been called.</P>
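- <P>The two calls can be combined in a small wrapper. This is only a sketch:
- the helper name <CODE>parse_checked</CODE> is hypothetical, and it works with any parser
- object that provides the <CODE>YYParse</CODE> and <CODE>YYNberr</CODE> methods:</P>
- <PRE>
- use strict;
- use warnings;
-
- # Hypothetical helper: run the parse and report whether any error was seen.
- # Returns a list: (true if the error sub was never called, parse result).
- sub parse_checked {
-     my ($parser, $lexer, $error_sub) = @_;
-     my $result = $parser->YYParse(yylex => $lexer, yyerror => $error_sub);
-     return ($parser->YYNberr() == 0, $result);
- }</PRE>
- <P>With the Calc example, it could be called as
- <CODE>my ($ok, $result) = parse_checked(Calc->new, \&Lexer, \&ErrorReport);</CODE>
- so that an undef result with a true <CODE>$ok</CODE> is known to be a legitimate value.</P>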
- <P></P>
- <DT><STRONG><A NAME="item_Error_Recovery"><CODE>Error Recovery</CODE></A></STRONG><BR>
- <DD>
- in Parse::Yapp is implemented the same way it is in yacc.
- <P></P>
- <DT><STRONG><A NAME="item_Debugging_Parser"><CODE>Debugging Parser</CODE></A></STRONG><BR>
- <DD>
- To debug your parser, you can call the YYParse method with a debug parameter:
- <PRE>
- $parser->YYParse( ... , yydebug => value, ... )</PRE>
- <P>where value is a bitfield, each bit representing a specific debug output:</P>
- <PRE>
- Bit Value    Outputs
- 0x01         Token reading (useful for Lexer debugging)
- 0x02         States information
- 0x04         Driver actions (shifts, reduces, accept...)
- 0x08         Parse Stack dump
- 0x10         Error Recovery tracing</PRE>
- <P>To have full debugging output, use</P>
- <PRE>
- yydebug => 0x1F</PRE>
- <P>Debugging output is sent to STDERR, and be aware that it can produce
- <CODE>huge</CODE> outputs.</P>
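- <P>Since the value is a bitfield, individual outputs can be combined with a
- bitwise or. A short sketch follows; the constant names are hypothetical and
- simply label the bit values listed in the table:</P>
- <PRE>
- use strict;
- use warnings;
-
- # Hypothetical names for the documented yydebug bits.
- use constant {
-     DEBUG_TOKENS   => 0x01,   # token reading
-     DEBUG_STATES   => 0x02,   # states information
-     DEBUG_ACTIONS  => 0x04,   # driver actions
-     DEBUG_STACK    => 0x08,   # parse stack dump
-     DEBUG_RECOVERY => 0x10,   # error recovery tracing
- };
-
- # Trace only the lexer and the driver actions.
- my $yydebug = DEBUG_TOKENS | DEBUG_ACTIONS;
-
- # Then pass it along with the other parameters:
- # $parser->YYParse(yylex => \&Lexer, yyerror => \&ErrorReport, yydebug => $yydebug);</PRE>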
- <P></P>
- <DT><STRONG><A NAME="item_Standalone_Parsers"><CODE>Standalone Parsers</CODE></A></STRONG><BR>
- <DD>
- By default, the parser modules generated will need the Parse::Yapp
- module installed on the system to run. They use the Parse::Yapp::Driver
- which can be safely shared between parsers in the same script.
- <P>If you'd prefer to have a standalone module generated, use
- the <CODE>-s</CODE> switch with yapp: this will automagically copy the driver
- code into your module so you can use/distribute it without the need
- of the Parse::Yapp module, making it really a <CODE>Standalone Parser</CODE>.</P>
- <P>If you do so, please remember to include Parse::Yapp's copyright notice
- in your main module copyright, so others can know about the Parse::Yapp module.</P>
- <P></P>
- <DT><STRONG><A NAME="item_Source_file_line_numbers"><CODE>Source file line numbers</CODE></A></STRONG><BR>
- <DD>
- by default will be included in the generated parser module, which will help
- to find the guilty line in your source file in case of a syntax error.
- You can disable this feature by compiling your grammar with yapp using
- the <CODE>-n</CODE> switch.
- <P></P></DL>
- </DL>
- <P>
- <HR>
- <H1><A NAME="bugs and suggestions">BUGS AND SUGGESTIONS</A></H1>
- <P>If you find any bug, think of anything that could improve Parse::Yapp
- or have any questions related to it, feel free to contact the author.</P>
- <P>
- <HR>
- <H1><A NAME="author">AUTHOR</A></H1>
- <P>Francois Desarmenien <A HREF="mailto:desar@club-internet.fr">desar@club-internet.fr</A></P>
- <P>
- <HR>
- <H1><A NAME="see also">SEE ALSO</A></H1>
- <P><CODE>yapp(1)</CODE> <CODE>perl(1)</CODE> <CODE>yacc(1)</CODE> bison(1).</P>
- <P>
- <HR>
- <H1><A NAME="copyright">COPYRIGHT</A></H1>
- <P>The Parse::Yapp module and its related modules and shell scripts are copyright
- (c) 1998-1999 Francois Desarmenien, France. All rights reserved.</P>
- <P>You may use and distribute them under the terms of either
- the GNU General Public License or the Artistic License,
- as specified in the Perl README file.</P>
- <P>If you use the "standalone parser" option so people don't need to install
- Parse::Yapp on their systems in order to run your software, this copyright
- notice should be included in your software copyright too, and the copyright
- notice in the embedded driver should be left untouched.</P>
-
- </BODY>
-
- </HTML>
-