home *** CD-ROM | disk | FTP | other *** search
Text File | 2002-06-19 | 37.1 KB | 1,018 lines |
- =head1 NAME
-
- perldebguts - Guts of Perl debugging
-
- =head1 DESCRIPTION
-
- This is not the perldebug(1) manpage, which tells you how to use
- the debugger. This manpage describes low-level details concerning
- the debugger's internals, which range from difficult to impossible
- to understand for anyone who isn't incredibly intimate with Perl's guts.
- Caveat lector.
-
- =head1 Debugger Internals
-
- Perl has special debugging hooks at compile-time and run-time used
- to create debugging environments. These hooks are not to be confused
- with the I<perl -Dxxx> command described in L<perlrun>, which is
- usable only if a special Perl is built per the instructions in the
- F<INSTALL> podpage in the Perl source tree.
-
- For example, whenever you call Perl's built-in C<caller> function
- from the package C<DB>, the arguments that the corresponding stack
- frame was called with are copied to the C<@DB::args> array. These
- mechanisms are enabled by calling Perl with the B<-d> switch.
- Specifically, the following additional features are enabled
- (cf. L<perlvar/$^P>):
-
- =over 4
-
- =item *
-
- Perl inserts the contents of C<$ENV{PERL5DB}> (or C<BEGIN {require
- 'perl5db.pl'}> if not present) before the first line of your program.
-
- =item *
-
- Each array C<@{"_<$filename"}> holds the lines of $filename for a
- file compiled by Perl. The same is also true for C<eval>ed strings
- that contain subroutines, or which are currently being executed.
- The $filename for C<eval>ed strings looks like C<(eval 34)>.
- Code assertions in regexes look like C<(re_eval 19)>.
-
- Values in this array are magical in numeric context: they compare
- equal to zero only if the line is not breakable.
-
- =item *
-
- Each hash C<%{"_<$filename"}> contains breakpoints and actions keyed
- by line number. Individual entries (as opposed to the whole hash)
- are settable. Perl only cares about Boolean true here, although
- the values used by F<perl5db.pl> have the form
- C<"$break_condition\0$action">.
-
- The same holds for evaluated strings that contain subroutines, or
- which are currently being executed. The $filename for C<eval>ed strings
- looks like C<(eval 34)> or C<(re_eval 19)>.
-
- =item *
-
- Each scalar C<${"_<$filename"}> contains C<"_<$filename">. This is
- also the case for evaluated strings that contain subroutines, or
- which are currently being executed. The $filename for C<eval>ed
- strings looks like C<(eval 34)> or C<(re_eval 19)>.
-
- =item *
-
- After each C<require>d file is compiled, but before it is executed,
- C<DB::postponed(*{"_<$filename"})> is called if the subroutine
- C<DB::postponed> exists. Here, the $filename is the expanded name of
- the C<require>d file, as found in the values of %INC.
-
- =item *
-
- After each subroutine C<subname> is compiled, the existence of
- C<$DB::postponed{subname}> is checked. If this key exists,
- C<DB::postponed(subname)> is called if the C<DB::postponed> subroutine
- also exists.
-
- =item *
-
- A hash C<%DB::sub> is maintained, whose keys are subroutine names
- and whose values have the form C<filename:startline-endline>.
- C<filename> has the form C<(eval 34)> for subroutines defined inside
- C<eval>s, or C<(re_eval 19)> for those within regex code assertions.
-
- =item *
-
- When the execution of your program reaches a point that can hold a
- breakpoint, the C<DB::DB()> subroutine is called if any of the variables
- C<$DB::trace>, C<$DB::single>, or C<$DB::signal> is true. These variables
- are not C<local>izable. This feature is disabled when executing
- inside C<DB::DB()>, including functions called from it
- unless C<< $^D & (1<<30) >> is true.
-
- =item *
-
- When execution of the program reaches a subroutine call, a call to
- C<&DB::sub>(I<args>) is made instead, with C<$DB::sub> holding the
- name of the called subroutine. (This doesn't happen if the subroutine
- was compiled in the C<DB> package.)
-
- =back
-
- Note that if C<&DB::sub> needs external data for it to work, no
- subroutine call is possible without it. As an example, the standard
- debugger's C<&DB::sub> depends on the C<$DB::deep> variable
- (it defines how many levels of recursion deep into the debugger you can go
- before a mandatory break). If C<$DB::deep> is not defined, subroutine
- calls are not possible, even though C<&DB::sub> exists.
-
- =head2 Writing Your Own Debugger
-
- =head3 Environment Variables
-
- The C<PERL5DB> environment variable can be used to define a debugger.
- For example, the minimal "working" debugger (it actually doesn't do anything)
- consists of one line:
-
- sub DB::DB {}
-
- It can easily be defined like this:
-
- $ PERL5DB="sub DB::DB {}" perl -d your-script
-
- Another brief debugger, slightly more useful, can be created
- with only the line:
-
- sub DB::DB {print ++$i; scalar <STDIN>}
-
- This debugger prints a number which increments for each statement
- encountered and waits for you to hit a newline before continuing
- to the next statement.
-
- The following debugger is actually useful:
-
- {
- package DB;
- sub DB {}
- sub sub {print ++$i, " $sub\n"; &$sub}
- }
-
- It prints the sequence number of each subroutine call and the name of the
- called subroutine. Note that C<&DB::sub> is being compiled into the
- package C<DB> through the use of the C<package> directive.
-
- When it starts, the debugger reads your rc file (F<./.perldb> or
- F<~/.perldb> under Unix), which can set important options.
- (A subroutine (C<&afterinit>) can be defined here as well; it is executed
- after the debugger completes its own initialization.)
-
- After the rc file is read, the debugger reads the PERLDB_OPTS
- environment variable and uses it to set debugger options. The
- contents of this variable are treated as if they were the argument
- of an C<o ...> debugger command (q.v. in L<perldebug/Options>).
-
- =head3 Debugger internal variables
- In addition to the file and subroutine-related variables mentioned above,
- the debugger also maintains various magical internal variables.
-
- =over 4
-
- =item *
-
- C<@DB::dbline> is an alias for C<@{"::_<current_file"}>, which
- holds the lines of the currently-selected file (compiled by Perl), either
- explicitly chosen with the debugger's C<f> command, or implicitly by flow
- of execution.
-
- Values in this array are magical in numeric context: they compare
- equal to zero only if the line is not breakable.
-
- =item *
-
- C<%DB::dbline>, is an alias for C<%{"::_<current_file"}>, which
- contains breakpoints and actions keyed by line number in
- the currently-selected file, either explicitly chosen with the
- debugger's C<f> command, or implicitly by flow of execution.
-
- As previously noted, individual entries (as opposed to the whole hash)
- are settable. Perl only cares about Boolean true here, although
- the values used by F<perl5db.pl> have the form
- C<"$break_condition\0$action">.
-
- =back
-
- =head3 Debugger customization functions
-
- Some functions are provided to simplify customization.
-
- =over 4
-
- =item *
-
- See L<perldebug/"Options"> for description of options parsed by
- C<DB::parse_options(string)> parses debugger options; see
- L<pperldebug/Options> for a description of options recognized.
-
- =item *
-
- C<DB::dump_trace(skip[,count])> skips the specified number of frames
- and returns a list containing information about the calling frames (all
- of them, if C<count> is missing). Each entry is reference to a hash
- with keys C<context> (either C<.>, C<$>, or C<@>), C<sub> (subroutine
- name, or info about C<eval>), C<args> (C<undef> or a reference to
- an array), C<file>, and C<line>.
-
- =item *
-
- C<DB::print_trace(FH, skip[, count[, short]])> prints
- formatted info about caller frames. The last two functions may be
- convenient as arguments to C<< < >>, C<< << >> commands.
-
- =back
-
- Note that any variables and functions that are not documented in
- this manpages (or in L<perldebug>) are considered for internal
- use only, and as such are subject to change without notice.
-
- =head1 Frame Listing Output Examples
-
- The C<frame> option can be used to control the output of frame
- information. For example, contrast this expression trace:
-
- $ perl -de 42
- Stack dump during die enabled outside of evals.
-
- Loading DB routines from perl5db.pl patch level 0.94
- Emacs support available.
-
- Enter h or `h h' for help.
-
- main::(-e:1): 0
- DB<1> sub foo { 14 }
-
- DB<2> sub bar { 3 }
-
- DB<3> t print foo() * bar()
- main::((eval 172):3): print foo() + bar();
- main::foo((eval 168):2):
- main::bar((eval 170):2):
- 42
-
- with this one, once the C<o>ption C<frame=2> has been set:
-
- DB<4> o f=2
- frame = '2'
- DB<5> t print foo() * bar()
- 3: foo() * bar()
- entering main::foo
- 2: sub foo { 14 };
- exited main::foo
- entering main::bar
- 2: sub bar { 3 };
- exited main::bar
- 42
-
- By way of demonstration, we present below a laborious listing
- resulting from setting your C<PERLDB_OPTS> environment variable to
- the value C<f=n N>, and running I<perl -d -V> from the command line.
- Examples use various values of C<n> are shown to give you a feel
- for the difference between settings. Long those it may be, this
- is not a complete listing, but only excerpts.
-
- =over 4
-
- =item 1
-
- entering main::BEGIN
- entering Config::BEGIN
- Package lib/Exporter.pm.
- Package lib/Carp.pm.
- Package lib/Config.pm.
- entering Config::TIEHASH
- entering Exporter::import
- entering Exporter::export
- entering Config::myconfig
- entering Config::FETCH
- entering Config::FETCH
- entering Config::FETCH
- entering Config::FETCH
-
- =item 2
-
- entering main::BEGIN
- entering Config::BEGIN
- Package lib/Exporter.pm.
- Package lib/Carp.pm.
- exited Config::BEGIN
- Package lib/Config.pm.
- entering Config::TIEHASH
- exited Config::TIEHASH
- entering Exporter::import
- entering Exporter::export
- exited Exporter::export
- exited Exporter::import
- exited main::BEGIN
- entering Config::myconfig
- entering Config::FETCH
- exited Config::FETCH
- entering Config::FETCH
- exited Config::FETCH
- entering Config::FETCH
-
- =item 4
-
- in $=main::BEGIN() from /dev/null:0
- in $=Config::BEGIN() from lib/Config.pm:2
- Package lib/Exporter.pm.
- Package lib/Carp.pm.
- Package lib/Config.pm.
- in $=Config::TIEHASH('Config') from lib/Config.pm:644
- in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from li
- in @=Config::myconfig() from /dev/null:0
- in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'osname') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'osvers') from lib/Config.pm:574
-
- =item 6
-
- in $=main::BEGIN() from /dev/null:0
- in $=Config::BEGIN() from lib/Config.pm:2
- Package lib/Exporter.pm.
- Package lib/Carp.pm.
- out $=Config::BEGIN() from lib/Config.pm:0
- Package lib/Config.pm.
- in $=Config::TIEHASH('Config') from lib/Config.pm:644
- out $=Config::TIEHASH('Config') from lib/Config.pm:644
- in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
- out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/
- out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- out $=main::BEGIN() from /dev/null:0
- in @=Config::myconfig() from /dev/null:0
- in $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
- out $=Config::FETCH(ref(Config), 'package') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
- out $=Config::FETCH(ref(Config), 'baserev') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
- out $=Config::FETCH(ref(Config), 'PERL_VERSION') from lib/Config.pm:574
- in $=Config::FETCH(ref(Config), 'PERL_SUBVERSION') from lib/Config.pm:574
-
- =item 14
-
- in $=main::BEGIN() from /dev/null:0
- in $=Config::BEGIN() from lib/Config.pm:2
- Package lib/Exporter.pm.
- Package lib/Carp.pm.
- out $=Config::BEGIN() from lib/Config.pm:0
- Package lib/Config.pm.
- in $=Config::TIEHASH('Config') from lib/Config.pm:644
- out $=Config::TIEHASH('Config') from lib/Config.pm:644
- in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
- out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/E
- out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- out $=main::BEGIN() from /dev/null:0
- in @=Config::myconfig() from /dev/null:0
- in $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
- out $=Config::FETCH('Config=HASH(0x1aa444)', 'package') from lib/Config.pm:574
- in $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
- out $=Config::FETCH('Config=HASH(0x1aa444)', 'baserev') from lib/Config.pm:574
-
- =item 30
-
- in $=CODE(0x15eca4)() from /dev/null:0
- in $=CODE(0x182528)() from lib/Config.pm:2
- Package lib/Exporter.pm.
- out $=CODE(0x182528)() from lib/Config.pm:0
- scalar context return from CODE(0x182528): undef
- Package lib/Config.pm.
- in $=Config::TIEHASH('Config') from lib/Config.pm:628
- out $=Config::TIEHASH('Config') from lib/Config.pm:628
- scalar context return from Config::TIEHASH: empty hash
- in $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- in $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
- out $=Exporter::export('Config', 'main', 'myconfig', 'config_vars') from lib/Exporter.pm:171
- scalar context return from Exporter::export: ''
- out $=Exporter::import('Config', 'myconfig', 'config_vars') from /dev/null:0
- scalar context return from Exporter::import: ''
-
- =back
-
- In all cases shown above, the line indentation shows the call tree.
- If bit 2 of C<frame> is set, a line is printed on exit from a
- subroutine as well. If bit 4 is set, the arguments are printed
- along with the caller info. If bit 8 is set, the arguments are
- printed even if they are tied or references. If bit 16 is set, the
- return value is printed, too.
-
- When a package is compiled, a line like this
-
- Package lib/Carp.pm.
-
- is printed with proper indentation.
-
- =head1 Debugging regular expressions
-
- There are two ways to enable debugging output for regular expressions.
-
- If your perl is compiled with C<-DDEBUGGING>, you may use the
- B<-Dr> flag on the command line.
-
- Otherwise, one can C<use re 'debug'>, which has effects at
- compile time and run time. It is not lexically scoped.
-
- =head2 Compile-time output
-
- The debugging output at compile time looks like this:
-
- Compiling REx `[bc]d(ef*g)+h[ij]k$'
- size 45 Got 364 bytes for offset annotations.
- first at 1
- rarest char g at 0
- rarest char d at 0
- 1: ANYOF[bc](12)
- 12: EXACT <d>(14)
- 14: CURLYX[0] {1,32767}(28)
- 16: OPEN1(18)
- 18: EXACT <e>(20)
- 20: STAR(23)
- 21: EXACT <f>(0)
- 23: EXACT <g>(25)
- 25: CLOSE1(27)
- 27: WHILEM[1/1](0)
- 28: NOTHING(29)
- 29: EXACT <h>(31)
- 31: ANYOF[ij](42)
- 42: EXACT <k>(44)
- 44: EOL(45)
- 45: END(0)
- anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating)
- stclass `ANYOF[bc]' minlen 7
- Offsets: [45]
- 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
- 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
- 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
- 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
- Omitting $` $& $' support.
-
- The first line shows the pre-compiled form of the regex. The second
- shows the size of the compiled form (in arbitrary units, usually
- 4-byte words) and the total number of bytes allocated for the
- offset/length table, usually 4+C<size>*8. The next line shows the
- label I<id> of the first node that does a match.
-
- The
-
- anchored `de' at 1 floating `gh' at 3..2147483647 (checking floating)
- stclass `ANYOF[bc]' minlen 7
-
- line (split into two lines above) contains optimizer
- information. In the example shown, the optimizer found that the match
- should contain a substring C<de> at offset 1, plus substring C<gh>
- at some offset between 3 and infinity. Moreover, when checking for
- these substrings (to abandon impossible matches quickly), Perl will check
- for the substring C<gh> before checking for the substring C<de>. The
- optimizer may also use the knowledge that the match starts (at the
- C<first> I<id>) with a character class, and no string
- shorter than 7 characters can possibly match.
-
- The fields of interest which may appear in this line are
-
- =over 4
-
- =item C<anchored> I<STRING> C<at> I<POS>
-
- =item C<floating> I<STRING> C<at> I<POS1..POS2>
-
- See above.
-
- =item C<matching floating/anchored>
-
- Which substring to check first.
-
- =item C<minlen>
-
- The minimal length of the match.
-
- =item C<stclass> I<TYPE>
-
- Type of first matching node.
-
- =item C<noscan>
-
- Don't scan for the found substrings.
-
- =item C<isall>
-
- Means that the optimizer information is all that the regular
- expression contains, and thus one does not need to enter the regex engine at
- all.
-
- =item C<GPOS>
-
- Set if the pattern contains C<\G>.
-
- =item C<plus>
-
- Set if the pattern starts with a repeated char (as in C<x+y>).
-
- =item C<implicit>
-
- Set if the pattern starts with C<.*>.
-
- =item C<with eval>
-
- Set if the pattern contain eval-groups, such as C<(?{ code })> and
- C<(??{ code })>.
-
- =item C<anchored(TYPE)>
-
- If the pattern may match only at a handful of places, (with C<TYPE>
- being C<BOL>, C<MBOL>, or C<GPOS>. See the table below.
-
- =back
-
- If a substring is known to match at end-of-line only, it may be
- followed by C<$>, as in C<floating `k'$>.
-
- The optimizer-specific information is used to avoid entering (a slow) regex
- engine on strings that will not definitely match. If the C<isall> flag
- is set, a call to the regex engine may be avoided even when the optimizer
- found an appropriate place for the match.
-
- Above the optimizer section is the list of I<nodes> of the compiled
- form of the regex. Each line has format
-
- C< >I<id>: I<TYPE> I<OPTIONAL-INFO> (I<next-id>)
-
- =head2 Types of nodes
-
- Here are the possible types, with short descriptions:
-
- # TYPE arg-description [num-args] [longjump-len] DESCRIPTION
-
- # Exit points
- END no End of program.
- SUCCEED no Return from a subroutine, basically.
-
- # Anchors:
- BOL no Match "" at beginning of line.
- MBOL no Same, assuming multiline.
- SBOL no Same, assuming singleline.
- EOS no Match "" at end of string.
- EOL no Match "" at end of line.
- MEOL no Same, assuming multiline.
- SEOL no Same, assuming singleline.
- BOUND no Match "" at any word boundary
- BOUNDL no Match "" at any word boundary
- NBOUND no Match "" at any word non-boundary
- NBOUNDL no Match "" at any word non-boundary
- GPOS no Matches where last m//g left off.
-
- # [Special] alternatives
- ANY no Match any one character (except newline).
- SANY no Match any one character.
- ANYOF sv Match character in (or not in) this class.
- ALNUM no Match any alphanumeric character
- ALNUML no Match any alphanumeric char in locale
- NALNUM no Match any non-alphanumeric character
- NALNUML no Match any non-alphanumeric char in locale
- SPACE no Match any whitespace character
- SPACEL no Match any whitespace char in locale
- NSPACE no Match any non-whitespace character
- NSPACEL no Match any non-whitespace char in locale
- DIGIT no Match any numeric character
- NDIGIT no Match any non-numeric character
-
- # BRANCH The set of branches constituting a single choice are hooked
- # together with their "next" pointers, since precedence prevents
- # anything being concatenated to any individual branch. The
- # "next" pointer of the last BRANCH in a choice points to the
- # thing following the whole choice. This is also where the
- # final "next" pointer of each individual branch points; each
- # branch starts with the operand node of a BRANCH node.
- #
- BRANCH node Match this alternative, or the next...
-
- # BACK Normal "next" pointers all implicitly point forward; BACK
- # exists to make loop structures possible.
- # not used
- BACK no Match "", "next" ptr points backward.
-
- # Literals
- EXACT sv Match this string (preceded by length).
- EXACTF sv Match this string, folded (prec. by length).
- EXACTFL sv Match this string, folded in locale (w/len).
-
- # Do nothing
- NOTHING no Match empty string.
- # A variant of above which delimits a group, thus stops optimizations
- TAIL no Match empty string. Can jump here from outside.
-
- # STAR,PLUS '?', and complex '*' and '+', are implemented as circular
- # BRANCH structures using BACK. Simple cases (one character
- # per match) are implemented with STAR and PLUS for speed
- # and to minimize recursive plunges.
- #
- STAR node Match this (simple) thing 0 or more times.
- PLUS node Match this (simple) thing 1 or more times.
-
- CURLY sv 2 Match this simple thing {n,m} times.
- CURLYN no 2 Match next-after-this simple thing
- # {n,m} times, set parens.
- CURLYM no 2 Match this medium-complex thing {n,m} times.
- CURLYX sv 2 Match this complex thing {n,m} times.
-
- # This terminator creates a loop structure for CURLYX
- WHILEM no Do curly processing and see if rest matches.
-
- # OPEN,CLOSE,GROUPP ...are numbered at compile time.
- OPEN num 1 Mark this point in input as start of #n.
- CLOSE num 1 Analogous to OPEN.
-
- REF num 1 Match some already matched string
- REFF num 1 Match already matched string, folded
- REFFL num 1 Match already matched string, folded in loc.
-
- # grouping assertions
- IFMATCH off 1 2 Succeeds if the following matches.
- UNLESSM off 1 2 Fails if the following matches.
- SUSPEND off 1 1 "Independent" sub-regex.
- IFTHEN off 1 1 Switch, should be preceded by switcher .
- GROUPP num 1 Whether the group matched.
-
- # Support for long regex
- LONGJMP off 1 1 Jump far away.
- BRANCHJ off 1 1 BRANCH with long offset.
-
- # The heavy worker
- EVAL evl 1 Execute some Perl code.
-
- # Modifiers
- MINMOD no Next operator is not greedy.
- LOGICAL no Next opcode should set the flag only.
-
- # This is not used yet
- RENUM off 1 1 Group with independently numbered parens.
-
- # This is not really a node, but an optimized away piece of a "long" node.
- # To simplify debugging output, we mark it as if it were a node
- OPTIMIZED off Placeholder for dump.
-
- =for unprinted-credits
- Next section M-J. Dominus (mjd-perl-patch+@plover.com) 20010421
-
- Following the optimizer information is a dump of the offset/length
- table, here split across several lines:
-
- Offsets: [45]
- 1[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 5[1]
- 0[0] 12[1] 0[0] 6[1] 0[0] 7[1] 0[0] 9[1] 8[1] 0[0] 10[1] 0[0]
- 11[1] 0[0] 12[0] 12[0] 13[1] 0[0] 14[4] 0[0] 0[0] 0[0] 0[0]
- 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 18[1] 0[0] 19[1] 20[0]
-
- The first line here indicates that the offset/length table contains 45
- entries. Each entry is a pair of integers, denoted by C<offset[length]>.
- Entries are numbered starting with 1, so entry #1 here is C<1[4]> and
- entry #12 is C<5[1]>. C<1[4]> indicates that the node labeled C<1:>
- (the C<1: ANYOF[bc]>) begins at character position 1 in the
- pre-compiled form of the regex, and has a length of 4 characters.
- C<5[1]> in position 12
- indicates that the node labeled C<12:>
- (the C<< 12: EXACT <d> >>) begins at character position 5 in the
- pre-compiled form of the regex, and has a length of 1 character.
- C<12[1]> in position 14
- indicates that the node labeled C<14:>
- (the C<< 14: CURLYX[0] {1,32767} >>) begins at character position 12 in the
- pre-compiled form of the regex, and has a length of 1 character---that
- is, it corresponds to the C<+> symbol in the precompiled regex.
-
- C<0[0]> items indicate that there is no corresponding node.
-
- =head2 Run-time output
-
- First of all, when doing a match, one may get no run-time output even
- if debugging is enabled. This means that the regex engine was never
- entered and that all of the job was therefore done by the optimizer.
-
- If the regex engine was entered, the output may look like this:
-
- Matching `[bc]d(ef*g)+h[ij]k$' against `abcdefg__gh__'
- Setting an EVAL scope, savestack=3
- 2 <ab> <cdefg__gh_> | 1: ANYOF
- 3 <abc> <defg__gh_> | 11: EXACT <d>
- 4 <abcd> <efg__gh_> | 13: CURLYX {1,32767}
- 4 <abcd> <efg__gh_> | 26: WHILEM
- 0 out of 1..32767 cc=effff31c
- 4 <abcd> <efg__gh_> | 15: OPEN1
- 4 <abcd> <efg__gh_> | 17: EXACT <e>
- 5 <abcde> <fg__gh_> | 19: STAR
- EXACT <f> can match 1 times out of 32767...
- Setting an EVAL scope, savestack=3
- 6 <bcdef> <g__gh__> | 22: EXACT <g>
- 7 <bcdefg> <__gh__> | 24: CLOSE1
- 7 <bcdefg> <__gh__> | 26: WHILEM
- 1 out of 1..32767 cc=effff31c
- Setting an EVAL scope, savestack=12
- 7 <bcdefg> <__gh__> | 15: OPEN1
- 7 <bcdefg> <__gh__> | 17: EXACT <e>
- restoring \1 to 4(4)..7
- failed, try continuation...
- 7 <bcdefg> <__gh__> | 27: NOTHING
- 7 <bcdefg> <__gh__> | 28: EXACT <h>
- failed...
- failed...
-
- The most significant information in the output is about the particular I<node>
- of the compiled regex that is currently being tested against the target string.
- The format of these lines is
-
- C< >I<STRING-OFFSET> <I<PRE-STRING>> <I<POST-STRING>> |I<ID>: I<TYPE>
-
- The I<TYPE> info is indented with respect to the backtracking level.
- Other incidental information appears interspersed within.
-
- =head1 Debugging Perl memory usage
-
- Perl is a profligate wastrel when it comes to memory use. There
- is a saying that to estimate memory usage of Perl, assume a reasonable
- algorithm for memory allocation, multiply that estimate by 10, and
- while you still may miss the mark, at least you won't be quite so
- astonished. This is not absolutely true, but may provide a good
- grasp of what happens.
-
- Assume that an integer cannot take less than 20 bytes of memory, a
- float cannot take less than 24 bytes, a string cannot take less
- than 32 bytes (all these examples assume 32-bit architectures, the
- result are quite a bit worse on 64-bit architectures). If a variable
- is accessed in two of three different ways (which require an integer,
- a float, or a string), the memory footprint may increase yet another
- 20 bytes. A sloppy malloc(3) implementation can inflate these
- numbers dramatically.
-
- On the opposite end of the scale, a declaration like
-
- sub foo;
-
- may take up to 500 bytes of memory, depending on which release of Perl
- you're running.
-
- Anecdotal estimates of source-to-compiled code bloat suggest an
- eightfold increase. This means that the compiled form of reasonable
- (normally commented, properly indented etc.) code will take
- about eight times more space in memory than the code took
- on disk.
-
- There are two Perl-specific ways to analyze memory usage:
- $ENV{PERL_DEBUG_MSTATS} and B<-DL> command-line switch. The first
- is available only if Perl is compiled with Perl's malloc(); the
- second only if Perl was built with C<-DDEBUGGING>. See the
- instructions for how to do this in the F<INSTALL> podpage at
- the top level of the Perl source tree.
-
- =head2 Using C<$ENV{PERL_DEBUG_MSTATS}>
-
- If your perl is using Perl's malloc() and was compiled with the
- necessary switches (this is the default), then it will print memory
- usage statistics after compiling your code when C<< $ENV{PERL_DEBUG_MSTATS}
- > 1 >>, and before termination of the program when C<<
- $ENV{PERL_DEBUG_MSTATS} >= 1 >>. The report format is similar to
- the following example:
-
- $ PERL_DEBUG_MSTATS=2 perl -e "require Carp"
- Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)
- 14216 free: 130 117 28 7 9 0 2 2 1 0 0
- 437 61 36 0 5
- 60924 used: 125 137 161 55 7 8 6 16 2 0 1
- 74 109 304 84 20
- Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048.
- Memory allocation statistics after execution: (buckets 4(4)..8188(8192)
- 30888 free: 245 78 85 13 6 2 1 3 2 0 1
- 315 162 39 42 11
- 175816 used: 265 176 1112 111 26 22 11 27 2 1 1
- 196 178 1066 798 39
- Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.
-
- It is possible to ask for such a statistic at arbitrary points in
- your execution using the mstat() function out of the standard
- Devel::Peek module.
-
- Here is some explanation of that format:
-
- =over 4
-
- =item C<buckets SMALLEST(APPROX)..GREATEST(APPROX)>
-
- Perl's malloc() uses bucketed allocations. Every request is rounded
- up to the closest bucket size available, and a bucket is taken from
- the pool of buckets of that size.
-
- The line above describes the limits of buckets currently in use.
- Each bucket has two sizes: memory footprint and the maximal size
- of user data that can fit into this bucket. Suppose in the above
- example that the smallest bucket were size 4. The biggest bucket
- would have usable size 8188, and the memory footprint would be 8192.
-
- In a Perl built for debugging, some buckets may have negative usable
- size. This means that these buckets cannot (and will not) be used.
- For larger buckets, the memory footprint may be one page greater
- than a power of 2. If so, case the corresponding power of two is
- printed in the C<APPROX> field above.
-
- =item Free/Used
-
- The 1 or 2 rows of numbers following that correspond to the number
- of buckets of each size between C<SMALLEST> and C<GREATEST>. In
- the first row, the sizes (memory footprints) of buckets are powers
- of two--or possibly one page greater. In the second row, if present,
- the memory footprints of the buckets are between the memory footprints
- of two buckets "above".
-
- For example, suppose under the previous example, the memory footprints
- were
-
- free: 8 16 32 64 128 256 512 1024 2048 4096 8192
- 4 12 24 48 80
-
- With non-C<DEBUGGING> perl, the buckets starting from C<128> have
- a 4-byte overhead, and thus an 8192-long bucket may take up to
- 8188-byte allocations.
-
- =item C<Total sbrk(): SBRKed/SBRKs:CONTINUOUS>
-
- The first two fields give the total amount of memory perl sbrk(2)ed
- (ess-broken? :-) and number of sbrk(2)s used. The third number is
- what perl thinks about continuity of returned chunks. So long as
- this number is positive, malloc() will assume that it is probable
- that sbrk(2) will provide continuous memory.
-
- Memory allocated by external libraries is not counted.
-
- =item C<pad: 0>
-
- The amount of sbrk(2)ed memory needed to keep buckets aligned.
-
- =item C<heads: 2192>
-
- Although memory overhead of bigger buckets is kept inside the bucket, for
- smaller buckets, it is kept in separate areas. This field gives the
- total size of these areas.
-
- =item C<chain: 0>
-
- malloc() may want to subdivide a bigger bucket into smaller buckets.
- If only a part of the deceased bucket is left unsubdivided, the rest
- is kept as an element of a linked list. This field gives the total
- size of these chunks.
-
- =item C<tail: 6144>
-
- To minimize the number of sbrk(2)s, malloc() asks for more memory. This
- field gives the size of the yet unused part, which is sbrk(2)ed, but
- never touched.
-
- =back
-
- =head2 Example of using B<-DL> switch
-
- Below we show how to analyse memory usage by
-
- do 'lib/auto/POSIX/autosplit.ix';
-
- The file in question contains a header and 146 lines similar to
-
- sub getcwd;
-
- B<WARNING>: The discussion below supposes 32-bit architecture. In
- newer releases of Perl, memory usage of the constructs discussed
- here is greatly improved, but the story discussed below is a real-life
- story. This story is mercilessly terse, and assumes rather more than cursory
- knowledge of Perl internals. Type space to continue, `q' to quit.
- (Actually, you just want to skip to the next section.)
-
- Here is the itemized list of Perl allocations performed during parsing
- of this file:
-
- !!! "after" at test.pl line 3.
- Id subtot 4 8 12 16 20 24 28 32 36 40 48 56 64 72 80 80+
- 0 02 13752 . . . . 294 . . . . . . . . . . 4
- 0 54 5545 . . 8 124 16 . . . 1 1 . . . . . 3
- 5 05 32 . . . . . . . 1 . . . . . . . .
- 6 02 7152 . . . . . . . . . . 149 . . . . .
- 7 02 3600 . . . . . 150 . . . . . . . . . .
- 7 03 64 . -1 . 1 . . 2 . . . . . . . . .
- 7 04 7056 . . . . . . . . . . . . . . . 7
- 7 17 38404 . . . . . . . 1 . . 442 149 . . 147 .
- 9 03 2078 17 249 32 . . . . 2 . . . . . . . .
-
-
- To see this list, insert two C<warn('!...')> statements around the call:
-
- warn('!');
- do 'lib/auto/POSIX/autosplit.ix';
- warn('!!! "after"');
-
- and run it with Perl's B<-DL> option. The first warn() will print
- memory allocation info before parsing the file and will memorize
- the statistics at this point (we ignore what it prints). The second
- warn() prints increments with respect to these memorized data. This
- is the printout shown above.
-
- Different I<Id>s on the left correspond to different subsystems of
- the perl interpreter. They are just the first argument given to
- the perl memory allocation API named New(). To find what C<9 03>
- means, just B<grep> the perl source for C<903>. You'll find it in
- F<util.c>, function savepvn(). (I know, you wonder why we told you
- to B<grep> and then gave away the answer. That's because grepping
- the source is good for the soul.) This function is used to store
- a copy of an existing chunk of memory. Using a C debugger, one can
- see that the function was called either directly from gv_init() or
- via sv_magic(), and that gv_init() is called from gv_fetchpv()--which
- was itself called from newSUB(). Please stop to catch your breath now.
-
- B<NOTE>: To reach this point in the debugger and skip the calls to
- savepvn() during the compilation of the main program, you should
- set a C breakpoint
- in Perl_warn(), continue until this point is reached, and I<then> set
- a C breakpoint in Perl_savepvn(). Note that you may need to skip a
- handful of Perl_savepvn() calls that do not correspond to mass production
- of CVs (there are more C<903> allocations than 146 similar lines of
- F<lib/auto/POSIX/autosplit.ix>). Note also that C<Perl_> prefixes are
- added by macroization code in perl header files to avoid conflicts
- with external libraries.
-
- Anyway, we see that C<903> ids correspond to creation of globs, twice
- per glob - for glob name, and glob stringification magic.
-
- Here are explanations for other I<Id>s above:
-
- =over 4
-
- =item C<717>
-
- Creates bigger C<XPV*> structures. In the case above, it
- creates 3 C<AV>s per subroutine, one for a list of lexical variable
- names, one for a scratchpad (which contains lexical variables and
- C<targets>), and one for the array of scratchpads needed for
- recursion.
-
- It also creates a C<GV> and a C<CV> per subroutine, all called from
- start_subparse().
-
- =item C<002>
-
- Creates a C array corresponding to the C<AV> of scratchpads and the
- scratchpad itself. The first fake entry of this scratchpad is
- created though the subroutine itself is not defined yet.
-
- It also creates C arrays to keep data for the stash. This is one HV,
- but it grows; thus, there are 4 big allocations: the big chunks are not
- freed, but are kept as additional arenas for C<SV> allocations.
-
- =item C<054>
-
- Creates a C<HEK> for the name of the glob for the subroutine. This
- name is a key in a I<stash>.
-
- Big allocations with this I<Id> correspond to allocations of new
- arenas to keep C<HE>.
-
- =item C<602>
-
- Creates a C<GP> for the glob for the subroutine.
-
- =item C<702>
-
- Creates the C<MAGIC> for the glob for the subroutine.
-
- =item C<704>
-
- Creates I<arenas> which keep SVs.
-
- =back
-
- =head2 B<-DL> details
-
- If Perl is run with B<-DL> option, then warn()s that start with `!'
- behave specially. They print a list of I<categories> of memory
- allocations, and statistics of allocations of different sizes for
- these categories.
-
- If warn() string starts with
-
- =over 4
-
- =item C<!!!>
-
- print changed categories only, print the differences in counts of allocations.
-
- =item C<!!>
-
- print grown categories only; print the absolute values of counts, and totals.
-
- =item C<!>
-
- print nonempty categories, print the absolute values of counts and totals.
-
- =back
-
- =head2 Limitations of B<-DL> statistics
-
- If an extension or external library does not use the Perl API to
- allocate memory, such allocations are not counted.
-
- =head1 SEE ALSO
-
- L<perldebug>,
- L<perlguts>,
- L<perlrun>
- L<re>,
- and
- L<Devel::DProf>.
-