- <HTML>
- <HEAD>
- <TITLE>perlcompile - Introduction to the Perl Compiler-Translator</TITLE>
- <LINK REL="stylesheet" HREF="../../Active.css" TYPE="text/css">
- <LINK REV="made" HREF="mailto:">
- </HEAD>
-
- <BODY>
- <TABLE BORDER=0 CELLPADDING=0 CELLSPACING=0 WIDTH=100%>
- <TR><TD CLASS=block VALIGN=MIDDLE WIDTH=100% BGCOLOR="#cccccc">
- <STRONG><P CLASS=block> perlcompile - Introduction to the Perl Compiler-Translator</P></STRONG>
- </TD></TR>
- </TABLE>
-
- <A NAME="__index__"></A>
- <!-- INDEX BEGIN -->
-
- <UL>
-
- <LI><A HREF="#name">NAME</A></LI>
- <LI><A HREF="#description">DESCRIPTION</A></LI>
- <UL>
-
- <LI><A HREF="#layout">Layout</A></LI>
- </UL>
-
- <LI><A HREF="#using the back ends">Using The Back Ends</A></LI>
- <UL>
-
- <LI><A HREF="#the cross referencing back end">The Cross Referencing Back End</A></LI>
- <LI><A HREF="#the decompiling back end">The Decompiling Back End</A></LI>
- <LI><A HREF="#the lint back end">The Lint Back End</A></LI>
- <LI><A HREF="#the simple c back end">The Simple C Back End</A></LI>
- <LI><A HREF="#the bytecode back end">The Bytecode Back End</A></LI>
- <LI><A HREF="#the optimized c back end">The Optimized C Back End</A></LI>
- </UL>
-
- <LI><A HREF="#known problems">KNOWN PROBLEMS</A></LI>
- <LI><A HREF="#author">AUTHOR</A></LI>
- </UL>
- <!-- INDEX END -->
-
- <HR>
- <P>
- <H1><A NAME="name">NAME</A></H1>
- <P>perlcompile - Introduction to the Perl Compiler-Translator</P>
- <P>
- <HR>
- <H1><A NAME="description">DESCRIPTION</A></H1>
- <P>Perl has always had a compiler: your source is compiled into an
- internal form (a parse tree) which is then optimized before being
- run. Since version 5.005, Perl has shipped with a module
- capable of inspecting the optimized parse tree (<A HREF="#item_B"><CODE>B</CODE></A>), and this has
- been used to write many useful utilities, including a module that lets
- you turn your Perl into C source code that can be compiled into a
- native executable.</P>
- <P>The <A HREF="#item_B"><CODE>B</CODE></A> module provides access to the parse tree, and other modules
- (``back ends'') do things with the tree. Some write it out as
- bytecode, C source code, or a semi-human-readable text. Another
- traverses the parse tree to build a cross-reference of which
- subroutines, formats, and variables are used where. Another checks
- your code for dubious constructs. Yet another back end dumps the
- parse tree back out as Perl source, acting as a source code beautifier
- or deobfuscator.</P>
- <P>Because its original purpose was to be a way to produce C code
- corresponding to a Perl program, and in turn a native executable, the
- <A HREF="#item_B"><CODE>B</CODE></A> module and its associated back ends are known as ``the
- compiler'', even though they don't really compile anything.
- Different parts of the compiler are more accurately a ``translator'',
- or an ``inspector'', but people want Perl to have a ``compiler
- option'' not an ``inspector gadget''. What can you do?</P>
- <P>This document covers the use of the Perl compiler: which modules
- it comprises, how to use the most important of the back end modules,
- what problems there are, and how to work around them.</P>
- <P>
- <H2><A NAME="layout">Layout</A></H2>
- <P>The compiler back ends are in the <CODE>B::</CODE> hierarchy, and the front-end
- (the module that you, the user of the compiler, will sometimes
- interact with) is the O module. Some back ends (e.g., <A HREF="#item_B%3A%3AC"><CODE>B::C</CODE></A>) have
- programs (e.g., <EM>perlcc</EM>) to hide the modules' complexity.</P>
- <P>Here are the important back ends to know about, with their status
- expressed as a number from 0 (outline for later implementation) to
- 10 (if there's a bug in it, we're very surprised):</P>
- <DL>
- <DT><STRONG><A NAME="item_B%3A%3ABytecode">B::Bytecode</A></STRONG><BR>
- <DD>
- Stores the parse tree in a machine-independent format, suitable
- for later reloading through the ByteLoader module. Status: 5 (some
- things work, some things don't, some things are untested).
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3AC">B::C</A></STRONG><BR>
- <DD>
- Creates a C source file containing code to rebuild the parse tree
- and resume the interpreter. Status: 6 (many things work adequately,
- including programs using Tk).
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3ACC">B::CC</A></STRONG><BR>
- <DD>
- Creates a C source file corresponding to the run time code path in
- the parse tree. This is the closest to a Perl-to-C translator there
- is, but the code it generates is almost incomprehensible because it
- translates the parse tree into a giant switch structure that
- manipulates Perl structures. The eventual goal is to reduce (given
- sufficient type information in the Perl program) some of the
- Perl data structure manipulations into manipulations of C-level
- ints, floats, etc. Status: 5 (some things work, including
- uncomplicated Tk examples).
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3ALint">B::Lint</A></STRONG><BR>
- <DD>
- Complains if it finds dubious constructs in your source code. Status:
- 6 (it works adequately, but only has a very limited number of areas
- that it checks).
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3ADeparse">B::Deparse</A></STRONG><BR>
- <DD>
- Recreates the Perl source, making an attempt to format it coherently.
- Status: 8 (it works nicely, but a few obscure things are missing).
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3AXref">B::Xref</A></STRONG><BR>
- <DD>
- Reports on the declaration and use of subroutines and variables.
- Status: 8 (it works nicely, but still has a few lingering bugs).
- <P></P></DL>
- <P>
- <HR>
- <H1><A NAME="using the back ends">Using The Back Ends</A></H1>
- <P>The following sections describe how to use the various compiler back
- ends. They're presented roughly in order of maturity, so that the
- most stable and proven back ends are described first, and the most
- experimental and incomplete back ends are described last.</P>
- <P>The O module automatically enables the <STRONG>-c</STRONG> flag to Perl, which
- prevents Perl from executing your code once it has been compiled.
- This is why all the back ends print:</P>
- <PRE>
- myperlprogram syntax OK</PRE>
- <P>before producing any other output.</P>
- <P>
- <H2><A NAME="the cross referencing back end">The Cross Referencing Back End</A></H2>
- <P>The cross referencing back end (B::Xref) produces a report on your program,
- breaking down declarations and uses of subroutines and variables (and
- formats) by file and subroutine. For instance, here's part of the
- report from the <EM>pod2man</EM> program that comes with Perl:</P>
- <PRE>
- Subroutine clear_noremap
-   Package (lexical)
-     $ready_to_print   i1069, 1079
-   Package main
-     $&                1086
-     $.                1086
-     $0                1086
-     $1                1087
-     $2                1085, 1085
-     $3                1085, 1085
-     $ARGV             1086
-     %HTML_Escapes     1085, 1085</PRE>
- <P>This shows the variables used in the subroutine <CODE>clear_noremap</CODE>. The
- variable <CODE>$ready_to_print</CODE> is a <A HREF="../../lib/Pod/perlfunc.html#item_my"><CODE>my()</CODE></A> (lexical) variable,
- <STRONG>i</STRONG>ntroduced (first declared with <A HREF="../../lib/Pod/perlfunc.html#item_my"><CODE>my()</CODE></A>) on line 1069, and used on
- line 1079. The variable <CODE>$&</CODE> from the main package is used on 1086,
- and so on.</P>
- <P>A line number may be prefixed by a single letter:</P>
- <DL>
- <DT><STRONG><A NAME="item_i">i</A></STRONG><BR>
- <DD>
- Lexical variable introduced (declared with <A HREF="../../lib/Pod/perlfunc.html#item_my"><CODE>my()</CODE></A>) for the first time.
- <P></P>
- <DT><STRONG><A NAME="item_%26">&</A></STRONG><BR>
- <DD>
- Subroutine or method call.
- <P></P>
- <DT><STRONG><A NAME="item_s">s</A></STRONG><BR>
- <DD>
- Subroutine defined.
- <P></P>
- <DT><STRONG><A NAME="item_r">r</A></STRONG><BR>
- <DD>
- Format defined.
- <P></P></DL>
- <P>The most useful option the cross referencer has is to save the report
- to a separate file. For instance, to save the report on
- <EM>myperlprogram</EM> to the file <EM>report</EM>:</P>
- <PRE>
- $ perl -MO=Xref,-oreport myperlprogram</PRE>
- <P>
- <H2><A NAME="the decompiling back end">The Decompiling Back End</A></H2>
- <P>The Deparse back end turns your Perl source back into Perl source. It
- can reformat along the way, making it useful as a de-obfuscator. The
- most basic way to use it is:</P>
- <PRE>
- $ perl -MO=Deparse myperlprogram</PRE>
- <P>You'll notice immediately that Perl has no idea of how to paragraph
- your code. You'll have to separate chunks of code from each other
- with newlines by hand. However, watch what it will do with
- one-liners:</P>
- <PRE>
- $ perl -MO=Deparse -e '$op=shift||die "usage: $0
- code [...]";chomp(@ARGV=&lt;&gt;)unless@ARGV; for(@ARGV){$was=$_;eval$op;
- die$@ if$@; rename$was,$_ unless$was eq $_}'
- -e syntax OK
- $op = shift @ARGV || die("usage: $0 code [...]");
- chomp(@ARGV = &lt;ARGV&gt;) unless @ARGV;
- foreach $_ (@ARGV) {
-     $was = $_;
-     eval $op;
-     die $@ if $@;
-     rename $was, $_ unless $was eq $_;
- }</PRE>
- <P>(this is the <EM>rename</EM> program that comes in the <EM>eg/</EM> directory
- of the Perl source distribution).</P>
- <P>The decompiler has several options for the code it generates. For
- instance, you can set the size of each indent from 4 (as above) to
- 2 with:</P>
- <PRE>
- $ perl -MO=Deparse,-si2 myperlprogram</PRE>
- <P>The <STRONG>-p</STRONG> option adds parentheses where normally they are omitted:</P>
- <PRE>
- $ perl -MO=Deparse -e 'print "Hello, world\n"'
- -e syntax OK
- print "Hello, world\n";
- $ perl -MO=Deparse,-p -e 'print "Hello, world\n"'
- -e syntax OK
- print("Hello, world\n");</PRE>
- <P>See <A HREF="../../lib/B/Deparse.html">the B::Deparse manpage</A> for more information on the formatting options.</P>
- <P>
- <H2><A NAME="the lint back end">The Lint Back End</A></H2>
- <P>The lint back end (B::Lint) inspects programs for poor style. One
- programmer's bad style is another programmer's useful tool, so options
- let you select what is complained about.</P>
- <P>To run the style checker across your source code:</P>
- <PRE>
- $ perl -MO=Lint myperlprogram</PRE>
- <P>To disable context checks and undefined subroutines:</P>
- <PRE>
- $ perl -MO=Lint,no-context,no-undefined-subs myperlprogram</PRE>
- <P>See <A HREF="../../lib/B/Lint.html">the B::Lint manpage</A> for information on the options.</P>
- <P>
- <H2><A NAME="the simple c back end">The Simple C Back End</A></H2>
- <P>This module saves the internal compiled state of your Perl program
- to a C source file, which can be turned into a native executable
- for that particular platform using a C compiler. The resulting
- program links against the Perl interpreter library, so it
- will not save you disk space (unless you build Perl with a shared
- library) or program size. It may, however, save you startup time.</P>
- <P>The <CODE>perlcc</CODE> tool generates such executables by default.</P>
- <PRE>
- perlcc myperlprogram.pl</PRE>
- <P>
- <H2><A NAME="the bytecode back end">The Bytecode Back End</A></H2>
- <P>This back end is only useful if you also have a way to load and
- execute the bytecode that it produces. The ByteLoader module provides
- this functionality.</P>
- <P>To turn a Perl program into executable byte code, you can use <CODE>perlcc</CODE>
- with the <CODE>-b</CODE> switch:</P>
- <PRE>
- perlcc -b myperlprogram.pl</PRE>
- <P>The byte code is machine independent, so once you have a compiled
- module or program, it is as portable as Perl source (assuming that
- the user of the module or program has a modern-enough Perl interpreter
- to decode the byte code).</P>
- <P>See <STRONG>B::Bytecode</STRONG> for information on options to control the
- optimization and nature of the code generated by the Bytecode module.</P>
- <P>
- <H2><A NAME="the optimized c back end">The Optimized C Back End</A></H2>
- <P>The optimized C back end will turn your Perl program's run time
- code-path into an equivalent (but optimized) C program that manipulates
- the Perl data structures directly. The program will still link against
- the Perl interpreter library, to allow for eval(), <CODE>s///e</CODE>,
- <A HREF="../../lib/Pod/perlfunc.html#item_require"><CODE>require</CODE></A>, etc.</P>
- <P>The <CODE>perlcc</CODE> tool generates such executables when using the <CODE>-opt</CODE>
- switch. To compile a Perl program (ending in <CODE>.pl</CODE>
- or <CODE>.p</CODE>):</P>
- <PRE>
- perlcc -opt myperlprogram.pl</PRE>
- <P>To produce a shared library from a Perl module (ending in <CODE>.pm</CODE>):</P>
- <PRE>
- perlcc -opt Myperlmodule.pm</PRE>
- <P>For more information, see <EM>perlcc</EM> and <A HREF="../../lib/B/CC.html">the B::CC manpage</A>.</P>
- <P>The compiler suite consists of the following modules:</P>
- <DL>
- <DT><STRONG><A NAME="item_B">B</A></STRONG><BR>
- <DD>
- This module is the introspective (``reflective'' in Java terms)
- module, which allows a Perl program to inspect its innards. The
- back end modules all use this module to gain access to the compiled
- parse tree. You, the user of a back end module, will not need to
- interact with B.
- <P></P>
- <DT><STRONG><A NAME="item_O">O</A></STRONG><BR>
- <DD>
- This module is the front-end to the compiler's back ends. Normally
- called something like this:
- <PRE>
- $ perl -MO=Deparse myperlprogram</PRE>
- <P>This is like saying <CODE>use O 'Deparse'</CODE> in your Perl program.</P>
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3AAsmdata">B::Asmdata</A></STRONG><BR>
- <DD>
- This module is used by the B::Assembler module, which is in turn used
- by the B::Bytecode module, which stores a parse-tree as
- bytecode for later loading. It's not a back end itself, but rather a
- component of a back end.
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3AAssembler">B::Assembler</A></STRONG><BR>
- <DD>
- This module turns a parse-tree into data suitable for storing
- and later decoding back into a parse-tree. It's not a back end
- itself, but rather a component of a back end. It's used by the
- <EM>assemble</EM> program that produces bytecode.
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3ABblock">B::Bblock</A></STRONG><BR>
- <DD>
- This module is used by the B::CC back end. It walks ``basic blocks''.
- A basic block is a series of operations which is known to execute from
- start to finish, with no possibility of branching or halting.
- <P></P>
- <DT><STRONG>B::Bytecode</STRONG><BR>
- <DD>
- This module is a back end that generates bytecode from a
- program's parse tree. This bytecode is written to a file, from where
- it can later be reconstructed back into a parse tree. The goal is to
- do the expensive program compilation once, save the interpreter's
- state into a file, and then restore the state from the file when the
- program is to be executed. See <A HREF="#the bytecode back end">The Bytecode Back End</A>
- for details about usage.
- <P></P>
- <DT><STRONG>B::C</STRONG><BR>
- <DD>
- This module writes out C code corresponding to the parse tree and
- other interpreter internal structures. You compile the corresponding
- C file, and get an executable file that restores the internal
- structures, after which the Perl interpreter begins running the
- program. See <A HREF="#the simple c back end">The Simple C Back End</A> for details about usage.
- <P></P>
- <DT><STRONG>B::CC</STRONG><BR>
- <DD>
- This module writes out C code corresponding to your program's
- operations. Unlike the B::C module, which merely stores the
- interpreter and its state in a C program, the B::CC module makes a
- C program that does not involve the interpreter. As a consequence,
- programs translated into C by B::CC can execute faster than normal
- interpreted programs. See <A HREF="#the optimized c back end">The Optimized C Back End</A> for
- details about usage.
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3ADebug">B::Debug</A></STRONG><BR>
- <DD>
- This module dumps the Perl parse tree in verbose detail to STDOUT.
- It's useful for people who are writing their own back end, or who
- are learning about the Perl internals. It's not useful to the
- average programmer.
- <P></P>
- <DT><STRONG>B::Deparse</STRONG><BR>
- <DD>
- This module produces Perl source code from the compiled parse tree.
- It is useful in debugging and deconstructing other people's code,
- and also as a pretty-printer for your own source. See
- <A HREF="#the decompiling back end">The Decompiling Back End</A> for details about usage.
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3ADisassembler">B::Disassembler</A></STRONG><BR>
- <DD>
- This module turns bytecode back into a parse tree. It's not a back
- end itself, but rather a component of a back end. It's used by the
- <EM>disassemble</EM> program that comes with the bytecode.
- <P></P>
- <DT><STRONG>B::Lint</STRONG><BR>
- <DD>
- This module inspects the compiled form of your source code for things
- which, while some people frown on them, aren't necessarily bad enough
- to justify a warning. For instance, use of an array in scalar context
- without explicitly saying <A HREF="../../lib/Pod/perlfunc.html#item_scalar"><CODE>scalar(@array)</CODE></A> is something that Lint
- can identify. See <A HREF="#the lint back end">The Lint Back End</A> for details about usage.
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3AShowlex">B::Showlex</A></STRONG><BR>
- <DD>
- This module prints out the <A HREF="../../lib/Pod/perlfunc.html#item_my"><CODE>my()</CODE></A> variables used in a function or a
- file. To get a list of the <A HREF="../../lib/Pod/perlfunc.html#item_my"><CODE>my()</CODE></A> variables used in the subroutine
- <CODE>mysub()</CODE> defined in the file myperlprogram:
- <PRE>
- $ perl -MO=Showlex,mysub myperlprogram</PRE>
- <P>To get a list of the <A HREF="../../lib/Pod/perlfunc.html#item_my"><CODE>my()</CODE></A> variables used in the file myperlprogram:</P>
- <PRE>
- $ perl -MO=Showlex myperlprogram</PRE>
- <P>[BROKEN]</P>
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3AStackobj">B::Stackobj</A></STRONG><BR>
- <DD>
- This module is used by the B::CC module. It's not a back end itself,
- but rather a component of a back end.
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3AStash">B::Stash</A></STRONG><BR>
- <DD>
- This module is used by the <EM>perlcc</EM> program, which compiles a module
- into an executable. B::Stash prints the symbol tables in use by a
- program, and is used to prevent B::CC from producing C code for the
- B::* and O modules. It's not a back end itself, but rather a
- component of a back end.
- <P></P>
- <DT><STRONG><A NAME="item_B%3A%3ATerse">B::Terse</A></STRONG><BR>
- <DD>
- This module prints the contents of the parse tree, but without as much
- information as B::Debug. For comparison, <CODE>print "Hello, world."</CODE>
- produced 96 lines of output from B::Debug, but only 6 from B::Terse.
- <P>This module is useful for people who are writing their own back end,
- or who are learning about the Perl internals. It's not useful to the
- average programmer.</P>
- <P></P>
- <DT><STRONG>B::Xref</STRONG><BR>
- <DD>
- This module prints a report on where the variables, subroutines, and
- formats are defined and used within a program and the modules it
- loads. See <A HREF="#the cross referencing back end">The Cross Referencing Back End</A> for details about
- usage.
- <P></P></DL>
- <P>
- <HR>
- <H1><A NAME="known problems">KNOWN PROBLEMS</A></H1>
- <P>The simple C back end currently only saves typeglobs with alphanumeric
- names.</P>
- <P>The optimized C back end outputs code for more modules than it should
- (e.g., DirHandle). It also has little hope of properly handling
- <A HREF="../../lib/Pod/perlfunc.html#item_goto"><CODE>goto LABEL</CODE></A> outside the running subroutine (<CODE>goto &sub</CODE> is ok).
- <A HREF="../../lib/Pod/perlfunc.html#item_goto"><CODE>goto LABEL</CODE></A> currently does not work at all in this back end.
- It also creates a huge initialization function that gives
- C compilers headaches. Splitting the initialization function gives
- better results. Other problems include: unsigned math does not
- work correctly; some opcodes are handled incorrectly by the default
- opcode handling mechanism.</P>
- <P>BEGIN{} blocks are executed while compiling your code. Any external
- state that is initialized in BEGIN{}, such as open files or
- database connections, does not behave properly. To work around
- this, Perl has an INIT{} block that corresponds to code being executed
- before your program begins running but after your program has finished
- being compiled. Execution order: BEGIN{}, (possible save of state
- through compiler back-end), INIT{}, program runs, END{}.</P>
- <P>
- <HR>
- <H1><A NAME="author">AUTHOR</A></H1>
- <P>This document was originally written by Nathan Torkington, and is now
- maintained by the perl5-porters mailing list
- <EM><A HREF="mailto:perl5-porters@perl.org">perl5-porters@perl.org</A></EM>.</P>
-
- </BODY>
-
- </HTML>
-