Text File | 2002-07-13 | 88.1 KB | 2,359 lines |
- =head1 NAME
-
- perlhack - How to hack at the Perl internals
-
- =head1 DESCRIPTION
-
- This document attempts to explain how Perl development takes place,
- and ends with some suggestions for people wanting to become bona fide
- porters.
-
- The perl5-porters mailing list is where the Perl standard distribution
- is maintained and developed. The list can get anywhere from 10 to 150
- messages a day, depending on the heatedness of the debate. Most days
- there are two or three patches, extensions, features, or bugs being
- discussed at a time.
-
- A searchable archive of the list is at either:
-
- http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
-
- or
-
- http://archive.develooper.com/perl5-porters@perl.org/
-
- List subscribers (the porters themselves) come in several flavours.
- Some are quiet curious lurkers, who rarely pitch in and instead watch
- the ongoing development to ensure they're forewarned of new changes or
- features in Perl. Some are representatives of vendors, who are there
- to make sure that Perl continues to compile and work on their
- platforms. Some patch any reported bug that they know how to fix,
- some are actively patching their pet area (threads, Win32, the regexp
- engine), while others seem to do nothing but complain. In other
- words, it's your usual mix of technical people.
-
- Over this group of porters presides Larry Wall. He has the final word
- in what does and does not change in the Perl language. Various
- releases of Perl are shepherded by a ``pumpking'', a porter
- responsible for gathering patches and deciding, on a patch-by-patch,
- feature-by-feature basis, what will and will not go into the release.
- For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of
- Perl, Jarkko Hietaniemi is the pumpking for the 5.8 release, and
- Hugo van der Sanden will be the pumpking for the 5.10 release.
-
- In addition, various people are pumpkings for different things. For
- instance, Andy Dougherty and Jarkko Hietaniemi share the I<Configure>
- pumpkin.
-
- Larry sees Perl development along the lines of the US government:
- there's the Legislature (the porters), the Executive branch (the
- pumpkings), and the Supreme Court (Larry). The legislature can
- discuss and submit patches to the executive branch all they like, but
- the executive branch is free to veto them. Rarely, the Supreme Court
- will side with the executive branch over the legislature, or the
- legislature over the executive branch. Mostly, however, the
- legislature and the executive branch are supposed to get along and
- work out their differences without impeachment or court cases.
-
- You might sometimes see reference to Rule 1 and Rule 2. Larry's power
- as Supreme Court is expressed in The Rules:
-
- =over 4
-
- =item 1
-
- Larry is always by definition right about how Perl should behave.
- This means he has final veto power on the core functionality.
-
- =item 2
-
- Larry is allowed to change his mind about any matter at a later date,
- regardless of whether he previously invoked Rule 1.
-
- =back
-
- Got that? Larry is always right, even when he was wrong. It's rare
- to see either Rule exercised, but they are often alluded to.
-
- New features and extensions to the language are contentious, because
- the criteria used by the pumpkings, Larry, and other porters to decide
- which features should be implemented and incorporated are not codified
- in a few small design goals as with some other languages. Instead,
- the heuristics are flexible and often difficult to fathom. Here is
- one person's list, roughly in decreasing order of importance, of
- heuristics that new features have to be weighed against:
-
- =over 4
-
- =item Does the concept match the general goals of Perl?
-
- These haven't been written anywhere in stone, but one approximation
- is:
-
- 1. Keep it fast, simple, and useful.
- 2. Keep features/concepts as orthogonal as possible.
- 3. No arbitrary limits (platforms, data sizes, cultures).
- 4. Keep it open and exciting to use/patch/advocate Perl everywhere.
- 5. Either assimilate new technologies, or build bridges to them.
-
- =item Where is the implementation?
-
- All the talk in the world is useless without an implementation. In
- almost every case, the person or people who argue for a new feature
- will be expected to be the ones who implement it. Porters capable
- of coding new features have their own agendas, and are not available
- to implement your (possibly good) idea.
-
- =item Backwards compatibility
-
- It's a cardinal sin to break existing Perl programs. New warnings are
- contentious--some say that a program that emits warnings is not
- broken, while others say it is. Adding keywords has the potential to
- break programs, as does changing the meaning of existing token
- sequences or functions.
-
- =item Could it be a module instead?
-
- Perl 5 has extension mechanisms, modules and XS, specifically to avoid
- the need to keep changing the Perl interpreter. You can write modules
- that export functions, you can give those functions prototypes so they
- can be called like built-in functions, you can even write XS code to
- mess with the runtime data structures of the Perl interpreter if you
- want to implement really complicated things. If it can be done in a
- module instead of in the core, it's highly unlikely to be added.
-
- =item Is the feature generic enough?
-
- Is this something that only the submitter wants added to the language,
- or would it be broadly useful? Sometimes, instead of adding a feature
- with a tight focus, the porters might decide to wait until someone
- implements the more generalized feature. For instance, instead of
- implementing a ``delayed evaluation'' feature, the porters are waiting
- for a macro system that would permit delayed evaluation and much more.
-
- =item Does it potentially introduce new bugs?
-
- Radical rewrites of large chunks of the Perl interpreter have the
- potential to introduce new bugs. The smaller and more localized the
- change, the better.
-
- =item Does it preclude other desirable features?
-
- A patch is likely to be rejected if it closes off future avenues of
- development. For instance, a patch that placed a true and final
- interpretation on prototypes is likely to be rejected because there
- are still options for the future of prototypes that haven't been
- addressed.
-
- =item Is the implementation robust?
-
- Good patches (tight code, complete, correct) stand more chance of
- going in. Sloppy or incorrect patches might be placed on the back
- burner until the pumpking has time to fix them, or might be discarded
- altogether without further notice.
-
- =item Is the implementation generic enough to be portable?
-
- The worst patches make use of system-specific features. It's highly
- unlikely that nonportable additions to the Perl language will be
- accepted.
-
- =item Is the implementation tested?
-
- Patches which change behaviour (fixing bugs or introducing new features)
- must include regression tests to verify that everything works as expected.
- Without tests provided by the original author, how can anyone else changing
- perl in the future be sure that they haven't unwittingly broken the behaviour
- the patch implements? And without tests, how can the patch's author be
- confident that his/her hard work put into the patch won't be accidentally
- thrown away by someone in the future?
-
- =item Is there enough documentation?
-
- Patches without documentation are probably ill thought out or
- incomplete. Nothing can be added without documentation, so submitting
- a patch for the appropriate manpages as well as the source code is
- always a good idea.
-
- =item Is there another way to do it?
-
- Larry said ``Although the Perl Slogan is I<There's More Than One Way
- to Do It>, I hesitate to make 10 ways to do something''. This is a
- tricky heuristic to navigate, though--one man's essential addition is
- another man's pointless cruft.
-
- =item Does it create too much work?
-
- Work for the pumpking, work for Perl programmers, work for module
- authors, ... Perl is supposed to be easy.
-
- =item Patches speak louder than words
-
- Working code is always preferred to pie-in-the-sky ideas. A patch to
- add a feature stands a much higher chance of making it to the language
- than does a random feature request, no matter how fervently argued the
- request might be. This ties into ``Will it be useful?'', as the fact
- that someone took the time to make the patch demonstrates a strong
- desire for the feature.
-
- =back
-
- If you're on the list, you might hear the word ``core'' bandied
- around. It refers to the standard distribution. ``Hacking on the
- core'' means you're changing the C source code to the Perl
- interpreter. ``A core module'' is one that ships with Perl.
-
- =head2 Keeping in sync
-
- The source code to the Perl interpreter, in its different versions, is
- kept in a repository managed by a revision control system (currently
- Perforce; see http://perforce.com/ ). The
- pumpkings and a few others have access to the repository to check in
- changes. Periodically the pumpking for the development version of Perl
- will release a new version, so the rest of the porters can see what's
- changed. The current state of the main trunk of the repository, and
- patches that describe the individual changes that have happened since
- the last public release, are available at these locations:
-
- http://public.activestate.com/gsar/APC/
- ftp://ftp.linux.activestate.com/pub/staff/gsar/APC/
-
- If you're looking for a particular change, or a change that affected
- a particular set of files, you may find the B<Perl Repository Browser>
- useful:
-
- http://public.activestate.com/cgi-bin/perlbrowse
-
- You may also want to subscribe to the perl5-changes mailing list to
- receive a copy of each patch that gets submitted to the maintenance
- and development "branches" of the perl repository. See
- http://lists.perl.org/ for subscription information.
-
- If you are a member of the perl5-porters mailing list, it is a good
- idea to keep up with the most recent changes, if only to verify that
- what you would have posted as a bug report has not already been fixed
- in the most recent available perl development branch, also known as
- perl-current, bleading edge perl, bleedperl or bleadperl.
-
- Needless to say, the source code in perl-current is usually in a perpetual
- state of evolution. You should expect it to be very buggy. Do B<not> use
- it for any purpose other than testing and development.
-
- Keeping in sync with the most recent branch can be done in several ways,
- but the most convenient and reliable way is using B<rsync>, available at
- ftp://rsync.samba.org/pub/rsync/ . (You can also get the most recent
- branch by FTP.)
-
- If you choose to keep in sync using rsync, there are two approaches
- to doing so:
-
- =over 4
-
- =item rsync'ing the source tree
-
- Presuming you are in the directory where your perl source resides
- and you have rsync installed and available, you can `upgrade' to
- the bleadperl using:
-
- # rsync -avz rsync://ftp.linux.activestate.com/perl-current/ .
-
- This takes care of updating every single item in the source tree to
- the latest applied patch level, creating files that are new (to your
- distribution) and setting date/time stamps of existing files to
- reflect the bleadperl status.
-
- Note that this will not delete any files that were in '.' before
- the rsync. Once you are sure that the rsync is running correctly,
- run it with the --delete and the --dry-run options like this:
-
- # rsync -avz --delete --dry-run rsync://ftp.linux.activestate.com/perl-current/ .
-
- This will I<simulate> an rsync run that also deletes files not
- present in the bleadperl master copy. Observe the results from
- this run closely. If you are sure that the actual run would delete
- no files precious to you, you could remove the '--dry-run' option.
-
- You can then check which patch was applied most recently by
- looking in the file B<.patch>, which will show the number of the
- latest patch.
-
- If you have more than one machine to keep in sync, and not all of
- them have access to the WAN (so you are not able to rsync all the
- source trees to the real source), there are some ways to get around
- this problem.
-
- =over 4
-
- =item Using rsync over the LAN
-
- Set up a local rsync server which makes the rsynced source tree
- available to the LAN and sync the other machines against this
- directory.
-
- From http://rsync.samba.org/README.html :
-
- "Rsync uses rsh or ssh for communication. It does not need to be
- setuid and requires no special privileges for installation. It
- does not require an inetd entry or a daemon. You must, however,
- have a working rsh or ssh system. Using ssh is recommended for
- its security features."
-
- =item Pushing over NFS
-
- With the other systems mounted over NFS, you can take an active
- approach and push the just-updated tree to the other, not-yet-synced
- trees. An example would be
-
-     #!/usr/bin/perl -w
-
-     use strict;
-     use File::Copy;
-
-     # Local files listed in MANIFEST, with their mode, size and mtime.
-     my %MF = map {
-         m/(\S+)/;
-         $1 => [ (stat $1)[2, 7, 9] ];    # mode, size, mtime
-     } `cat MANIFEST`;
-
-     # NFS-mounted source tree of each remote host.
-     my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2);
-
-     foreach my $host (keys %remote) {
-         unless (-d $remote{$host}) {
-             print STDERR "Cannot Xsync for host $host\n";
-             next;
-         }
-         foreach my $file (keys %MF) {
-             my $rfile = "$remote{$host}/$file";
-             my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9];
-             defined $size or ($mode, $size, $mtime) = (0, 0, 0);
-             # Skip files whose size and mtime already match.
-             $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next;
-             printf "%4s %-34s %8d %9d %8d %9d\n",
-                 $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime;
-             unlink $rfile;
-             copy ($file, $rfile);
-             utime time, $MF{$file}[2], $rfile;
-             chmod $MF{$file}[0], $rfile;
-         }
-     }
-
- This is not perfect, though. It could be improved by comparing file
- checksums before updating, and not all NFS implementations support
- utime reliably.
-
- =back
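-
- The checksum comparison suggested for the NFS push script could be
- sketched roughly as follows. This is only an illustration: the helper
- names C<file_digest> and C<needs_copy> are invented here, and the
- sketch assumes the Digest::MD5 module is installed (it ships with
- perl 5.8 and later).
-

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the hex MD5 digest of a file, or undef if it cannot be read.
sub file_digest {
    my ($path) = @_;
    open my $fh, '<', $path or return undef;
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# True if either file is unreadable or the two files differ in content.
sub needs_copy {
    my ($local, $remote) = @_;
    my $want = file_digest($local);
    my $have = file_digest($remote);
    return 1 unless defined $want && defined $have;
    return $want ne $have;
}
```

-
- In the push loop, C<needs_copy($file, $rfile) or next;> could stand in
- for the size and mtime comparison, at the cost of reading every file
- on both sides of the NFS mount.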
-
- =item rsync'ing the patches
-
- The source tree is maintained by the pumpking who applies patches to
- the files in the tree. These patches are either created by the
- pumpking himself using C<diff -c> after updating the file manually or
- by applying patches sent in by posters on the perl5-porters list.
- These patches are also saved and rsync'able, so you can apply them
- yourself to the source files.
-
- Presuming you are in a directory where your patches reside, you can
- get them in sync with
-
- # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .
-
- This makes sure the latest available patch is downloaded to your
- patch directory.
-
- It's then up to you to apply these patches, using something like
-
- # last=`ls -t *.gz | sed q`
- # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ .
- # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
- # cd ../perl-current
- # patch -p1 -N <../perl-current-diffs/blead.patch
-
- or, since this is only a hint towards how it works, use CPAN-patchaperl
- from Andreas König to have better control over the patching process.
-
- =back
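-
- The shell commands above pick new patches by file timestamp. As a
- hedged alternative, the sketch below picks them by patch number,
- starting from the number recorded in B<.patch>. The function name
- C<patches_to_apply> is invented for illustration, and the file naming
- scheme (C<E<lt>numberE<gt>.gz>) is an assumption, so check it against
- what the rsync actually delivers.
-

```perl
use strict;
use warnings;

# Given the last applied patch number and a list of file names, return the
# names of the newer patches in ascending numeric order.
# ASSUMPTION: patch files are named "<number>.gz", e.g. "7583.gz".
sub patches_to_apply {
    my ($last_applied, @files) = @_;
    my %number_of;
    for my $file (@files) {
        $number_of{$file} = $1 if $file =~ /^(\d+)\.gz$/;
    }
    return sort { $number_of{$a} <=> $number_of{$b} }
           grep { $number_of{$_} > $last_applied } keys %number_of;
}

# Example with made-up file names; anything that is not a patch is ignored.
my @todo = patches_to_apply(7582, qw(7581.gz 7584.gz 7583.gz 10001.gz README));
print "@todo\n";    # 7583.gz 7584.gz 10001.gz
```

-
- Sorting numerically matters once patch numbers grow by a digit: a
- plain string sort would place 10001 before 7583.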
-
- =head2 Why rsync the source tree
-
- =over 4
-
- =item It's easier to rsync the source tree
-
- Since you don't have to apply the patches yourself, you are sure all
- files in the source tree are in the right state.
-
- =item It's more reliable
-
- While both the rsync-able source and patch areas are automatically
- updated every few minutes, keep in mind that applying patches may
- sometimes mean careful hand-holding, especially if your version of
- the C<patch> program does not understand how to deal with new files,
- files with 8-bit characters, or files without trailing newlines.
-
- =back
-
- =head2 Why rsync the patches
-
- =over 4
-
- =item It's easier to rsync the patches
-
- If you have more than one machine that you want to keep in step with
- bleadperl, it's easier to rsync the patches only once and then apply
- them to all the source trees on the different machines.
-
- If you try to keep pace on 5 different machines, only one of which
- has access to the WAN, rsync'ing all the source trees would then have
- to be done 5 times over NFS. Having rsync'ed the patches only once,
- you can apply them to all the source trees automatically. Need I say
- more ;-)
-
- =item It's a good reference
-
- If you want not only to have the most recent development branch, but
- also to B<fix> bugs or extend features, you will want to dive
- into the sources. If you are a seasoned perl core diver, you don't
- need no manuals, tips, roadmaps, perlguts.pod or other aids to find
- your way around. But if you are a starter, the patches may help you
- in finding where you should start and how to change the bits that
- bug you.
-
- The file B<Changes> is updated on occasions the pumpking sees as his
- own little sync points. On those occasions, he releases a tar-ball of
- the current source tree (e.g. perl@7582.tar.gz), which will be an
- excellent point to start with when choosing to use the 'rsync the
- patches' scheme. Starting with perl@7582, which means a set of source
- files on which the latest applied patch is number 7582, you apply all
- succeeding patches available from then on (7583, 7584, ...).
-
- You can use the patches later as a kind of search archive.
-
- =over 4
-
- =item Finding a start point
-
- If you want to fix/change the behaviour of function/feature Foo, just
- scan the patches for patches that mention Foo either in the subject,
- the comments, or the body of the fix. There is a good chance that the
- patch shows you the files it affects, which are very likely
- to be the starting point of your journey into the guts of perl.
-
- =item Finding how to fix a bug
-
- If you've found I<where> the function/feature Foo misbehaves, but you
- don't know how to fix it (but you do know the change you want to
- make), you can, again, peruse the patches for similar changes and
- see how others applied their fixes.
-
- =item Finding the source of misbehaviour
-
- When you keep in sync with bleadperl, the pumpking would love to
- I<see> that the community efforts really work. So after each of his
- sync points, you should run 'make test' to check if everything is still
- in working order. If it is, you do 'make ok', which will send an OK
- report to perlbug@perl.org. (If you do not have access to a mailer
- from the system on which you just successfully ran 'make test', you can
- do 'make okfile', which creates the file C<perl.ok>, which you can
- then take to your favourite mailer and mail yourself).
-
- But of course, as always, things will not always lead to a success
- path, and one or more tests may fail under 'make test'. Before
- sending in a bug report (using 'make nok' or 'make nokfile'), check
- the mailing list to see whether someone else has already reported the
- bug and, if so, confirm it by replying to that message. If not, you might want to
- trace the source of that misbehaviour B<before> sending in the bug,
- which will help all the other porters in finding the solution.
-
- Here the saved patches come in very handy. You can check the list of
- patches to see which patch changed what file and what change caused
- the misbehaviour. If you note that in the bug report, it saves
- whoever tries to solve it from having to look for that point.
-
- =back
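-
- Scanning saved patches for a term and listing the files they touch
- could be sketched like this. The function names are invented for
- illustration, the sample patch is made up, and the sketch assumes the
- patches carry unified-diff (C<+++ path>) or C<Index: path> headers;
- context diffs or other layouts would need further patterns.
-

```perl
use strict;
use warnings;

# Return the files a patch touches, read from its diff headers.
# ASSUMPTION: headers look like "+++ path" (unified) or "Index: path".
sub files_touched {
    my ($patch_text) = @_;
    my %seen;
    for my $line (split /\n/, $patch_text) {
        if ($line =~ /^(?:\+\+\+|Index:)\s+(\S+)/) {
            $seen{$1} = 1;
        }
    }
    return sort keys %seen;
}

# Return the touched files if the patch mentions $term anywhere, else nothing.
sub files_for_term {
    my ($patch_text, $term) = @_;
    return index($patch_text, $term) >= 0 ? files_touched($patch_text) : ();
}

# A made-up patch, only to show the shape of the input.
my $patch = <<'END';
Subject: fix sort() crash with empty list
--- pp_ctl.c.orig
+++ pp_ctl.c
@@ -1,3 +1,3 @@
-old line
+new line
END

print join(", ", files_for_term($patch, 'sort()')), "\n";    # pp_ctl.c
```

-
- The header match is deliberately naive: an added source line that
- itself begins with C<++> would be mistaken for a header, so treat the
- result as a hint about where to start reading, not as a definitive list.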
-
- If searching the patches is too bothersome, you might consider using
- perl's bugtron to find more information about discussions and
- ramblings on posted bugs.
-
- If you want to get the best of both worlds, rsync both the source
- tree for convenience, reliability and ease and rsync the patches
- for reference.
-
- =back
-
-
- =head2 Perlbug remote interface
-
- There are three (3) remote administrative interfaces for modifying bug
- status, category, etc. In all cases an admin must be first registered
- with the Perlbug database by sending an email request to
- richard@perl.org or bugmongers@perl.org.
-
- The main requirement is the willingness to classify outstanding bugs
- (with the emphasis on closing where possible :). Further
- explanation can be garnered from the web at http://bugs.perl.org/ , or
- by asking on the admin mailing list at: bugmongers@perl.org
-
- For more info on the web see
-
- http://bugs.perl.org/perlbug.cgi?req=spec
-
- =over 4
-
- =item 1 http://bugs.perl.org
-
- Log in via the web (remove B<admin/> if only browsing), where interested
- Cc's, tests, patches, change-ids, etc. may be assigned.
-
- http://bugs.perl.org/admin/index.html
-
-
- =item 2 bugdb@perl.org
-
- Where the subject line is used for commands:
-
- To: bugdb@perl.org
- Subject: -a close bugid1 bugid2 aix install
-
- To: bugdb@perl.org
- Subject: -h
-
-
- =item 3 commands_and_bugids@bugs.perl.org
-
- Where the address itself is the source for the commands:
-
- To: close_bugid1_bugid2_aix@bugs.perl.org
-
- To: help@bugs.perl.org
-
-
- =item notes, patches, tests
-
- For patches and tests, the message body is assigned to the appropriate
- bugs and forwarded to p5p for their attention.
-
- To: test_<bugid1>_aix_close@bugs.perl.org
- Subject: this is a test for the (now closed) aix bug
-
- Test is the body of the mail
-
- =back
-
- =head2 Submitting patches
-
- Always submit patches to I<perl5-porters@perl.org>. If you're
- patching a core module and there's an author listed, send the author a
- copy (see L<Patching a core module>). This lets other porters review
- your patch, which catches a surprising number of errors in patches.
- Either use the diff program (available in source code form from
- ftp://ftp.gnu.org/pub/gnu/ ), or use Johan Vromans' I<makepatch>
- (available from I<CPAN/authors/id/JV/>). Unified diffs are preferred,
- but context diffs are accepted. Do not send RCS-style diffs or diffs
- without context lines. More information is given in the
- I<Porting/patching.pod> file in the Perl source distribution. Please
- patch against the latest B<development> version (e.g., if you're
- fixing a bug in the 5.005 track, patch against the latest 5.005_5x
- version). Only patches that survive the heat of the development
- branch get applied to maintenance versions.
-
- Your patch should update the documentation and test suite. See
- L<Writing a test>.
-
- To report a bug in Perl, use the program I<perlbug> which comes with
- Perl (if you can't get Perl to work, send mail to the address
- I<perlbug@perl.org> or I<perlbug@perl.com>). Reporting bugs through
- I<perlbug> feeds into the automated bug-tracking system, access to
- which is provided through the web at http://bugs.perl.org/ . It
- often pays to check the archives of the perl5-porters mailing list to
- see whether the bug you're reporting has been reported before, and if
- so whether it was considered a bug. See above for the location of
- the searchable archives.
-
- The CPAN testers ( http://testers.cpan.org/ ) are a group of
- volunteers who test CPAN modules on a variety of platforms. Perl
- Smokers ( http://archives.develooper.com/daily-build@perl.org/ )
- automatically test Perl source releases on platforms with various
- configurations. Both efforts welcome volunteers.
-
- It's a good idea to read and lurk for a while before chipping in.
- That way you'll get to see the dynamic of the conversations, learn the
- personalities of the players, and hopefully be better prepared to make
- a useful contribution when you do speak up.
-
- If after all this you still think you want to join the perl5-porters
- mailing list, send mail to I<perl5-porters-subscribe@perl.org>. To
- unsubscribe, send mail to I<perl5-porters-unsubscribe@perl.org>.
-
- To hack on the Perl guts, you'll need to read the following things:
-
- =over 3
-
- =item L<perlguts>
-
- This is of paramount importance, since it's the documentation of what
- goes where in the Perl source. Read it over a couple of times and it
- might start to make sense - don't worry if it doesn't yet, because the
- best way to study it is to read it in conjunction with poking at Perl
- source, and we'll do that later on.
-
- You might also want to look at Gisle Aas's illustrated perlguts -
- there's no guarantee that this will be absolutely up-to-date with the
- latest documentation in the Perl core, but the fundamentals will be
- right. ( http://gisle.aas.no/perl/illguts/ )
-
- =item L<perlxstut> and L<perlxs>
-
- A working knowledge of XSUB programming is incredibly useful for core
- hacking; XSUBs use techniques drawn from the PP code, the portion of the
- guts that actually executes a Perl program. It's a lot gentler to learn
- those techniques from simple examples and explanation than from the core
- itself.
-
- =item L<perlapi>
-
- The documentation for the Perl API explains what some of the internal
- functions do, as well as the many macros used in the source.
-
- =item F<Porting/pumpkin.pod>
-
- This is a collection of words of wisdom for a Perl porter; some of it is
- only useful to the pumpkin holder, but most of it applies to anyone
- wanting to go about Perl development.
-
- =item The perl5-porters FAQ
-
- This should be available from http://simon-cozens.org/writings/p5p-faq ;
- alternatively, you can get the FAQ emailed to you by sending mail to
- C<perl5-porters-faq@perl.org>. It contains hints on reading perl5-porters,
- information on how perl5-porters works and how Perl development in general
- works.
-
- =back
-
- =head2 Finding Your Way Around
-
- Perl maintenance can be split into a number of areas, and certain people
- (pumpkins) will have responsibility for each area. These areas sometimes
- correspond to files or directories in the source kit. Among the areas are:
-
- =over 3
-
- =item Core modules
-
- Modules shipped as part of the Perl core live in the F<lib/> and F<ext/>
- subdirectories: F<lib/> is for the pure-Perl modules, and F<ext/>
- contains the core XS modules.
-
- =item Tests
-
- There are tests for nearly all the modules, built-ins and major bits
- of functionality. Test files all have a .t suffix. Module tests live
- in the F<lib/> and F<ext/> directories next to the module being
- tested. Others live in F<t/>. See L<Writing a test>.
-
- =item Documentation
-
- Documentation maintenance includes looking after everything in the
- F<pod/> directory (as well as contributing new documentation) and
- the documentation to the modules in core.
-
- =item Configure
-
- The configure process is the way we make Perl portable across the
- myriad of operating systems it supports. Responsibility for the
- configure, build and installation process, as well as the overall
- portability of the core code rests with the configure pumpkin - others
- help out with individual operating systems.
-
- The files involved are the operating system directories (F<win32/>,
- F<os2/>, F<vms/> and so on), the shell scripts which generate F<config.h>
- and F<Makefile>, as well as the metaconfig files which generate
- F<Configure>. (metaconfig isn't included in the core distribution.)
-
- =item Interpreter
-
- And of course, there's the core of the Perl interpreter itself. Let's
- have a look at that in a little more detail.
-
- =back
-
- Before we leave looking at the layout, though, don't forget that
- F<MANIFEST> contains not only the file names in the Perl distribution,
- but short descriptions of what's in them, too. For an overview of the
- important files, try this:
-
- perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST
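-
- A slightly longer sketch of the same idea: it splits MANIFEST-style
- lines into file names and descriptions, then lists only the top-level
- C sources and headers. The function name C<parse_manifest> and the
- sample lines are invented for illustration; the sketch assumes each
- line holds a file name, some whitespace, and then the description.
-

```perl
use strict;
use warnings;

# Split MANIFEST-style lines into (file name => description) pairs.
# Blank lines are skipped; a line without a description maps to ''.
sub parse_manifest {
    my (@lines) = @_;
    my %desc;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /^(\S+)\s*(.*)$/;
        $desc{$1} = $2;
    }
    return %desc;
}

# Made-up sample lines in the same layout as MANIFEST.
my %desc = parse_manifest(
    "op.c\t\tOpcode syntax tree code\n",
    "lib/strict.pm\tFor \"use strict\"\n",
    "toke.c\t\tThe tokener\n",
);

# Top-level C sources and headers only, like the one-liner's pattern.
for my $file (sort grep { m{^[^/]+\.[ch]$} } keys %desc) {
    print "$file: $desc{$file}\n";
}
```

-
- To run it against the real file, the lines could be read with
- C<open> from F<MANIFEST> in the top directory of the source kit.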
-
- =head2 Elements of the interpreter
-
- The work of the interpreter has two main stages: compiling the code
- into the internal representation, or bytecode, and then executing it.
- L<perlguts/Compiled code> explains exactly how the compilation stage
- happens.
-
- Here is a short breakdown of perl's operation:
-
- =over 3
-
- =item Startup
-
- The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl).
- This is very high-level code, enough to fit on a single screen, and it
- resembles the code found in L<perlembed>; most of the real action takes
- place in F<perl.c>.
-
- First, F<perlmain.c> allocates some memory and constructs a Perl
- interpreter:
-
-     1 PERL_SYS_INIT3(&argc,&argv,&env);
-     2
-     3 if (!PL_do_undump) {
-     4     my_perl = perl_alloc();
-     5     if (!my_perl)
-     6         exit(1);
-     7     perl_construct(my_perl);
-     8     PL_perl_destruct_level = 0;
-     9 }
-
- Line 1 is a macro, and its definition is dependent on your operating
- system. Line 3 references C<PL_do_undump>, a global variable - all
- global variables in Perl start with C<PL_>. This tells you whether the
- current running program was created with the C<-u> flag to perl and then
- F<undump>, which means it's going to be false in any sane context.
-
- Line 4 calls a function in F<perl.c> to allocate memory for a Perl
- interpreter. It's quite a simple function, and the guts of it look like
- this:
-
- my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));
-
- Here you see an example of Perl's system abstraction, which we'll see
- later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
- own C<malloc> as defined in F<malloc.c> if you selected that option at
- configure time.
-
- Next, in line 7, we construct the interpreter; this sets up all the
- special variables that Perl needs, the stacks, and so on.
-
- Now we pass Perl the command line options, and tell it to go:
-
-     exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
-     if (!exitstatus) {
-         exitstatus = perl_run(my_perl);
-     }
-
-
- C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
- in F<perl.c>, which processes the command line options, sets up any
- statically linked XS modules, opens the program and calls C<yyparse> to
- parse it.
-
- =item Parsing
-
- The aim of this stage is to take the Perl source, and turn it into an op
- tree. We'll see what one of those looks like later. Strictly speaking,
- there are three things going on here.
-
- C<yyparse>, the parser, lives in F<perly.c>, although you're better off
- reading the original YACC input in F<perly.y>. (Yes, Virginia, there
- B<is> a YACC grammar for Perl!) The job of the parser is to take your
- code and `understand' it, splitting it into sentences, deciding which
- operands go with which operators and so on.
-
- The parser is nobly assisted by the lexer, which chunks up your input
- into tokens, and decides what type of thing each token is: a variable
- name, an operator, a bareword, a subroutine, a core function, and so on.
- The main point of entry to the lexer is C<yylex>, and that and its
- associated routines can be found in F<toke.c>. Perl isn't much like
- other computer languages; it's highly context sensitive at times, and
- it can be tricky to work out what sort of token something is, or where
- a token ends. As such, there's a lot of interplay between the
- tokeniser and the parser, which can get pretty frightening if you're
- not used to it.
-
- As the parser understands a Perl program, it builds up a tree of
- operations for the interpreter to perform during execution. The routines
- which construct and link together the various operations are to be found
- in F<op.c>, and will be examined later.
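-
- To give a feel for the context sensitivity the lexer has to cope
- with, here is a toy tokeniser in Perl. It is not how F<toke.c> is
- written; the function name C<toy_tokens> and the token labels are
- invented for illustration. The point it shows is that the same
- character, C</>, is a division operator when an operator is expected
- and the start of a pattern when a term is expected.
-

```perl
use strict;
use warnings;

# Toy tokeniser for numbers, identifiers, a few operators and /patterns/.
# It tracks whether a term or an operator is expected next, which is
# what decides the meaning of "/".
sub toy_tokens {
    my ($src) = @_;
    my @tokens;
    my $expect_term = 1;    # at the start of input we expect a term
    while (length $src) {
        if ($src =~ s/^\s+//) {
            next;
        }
        elsif ($src =~ s/^(\d+|\$?\w+)//) {
            push @tokens, [ TERM => $1 ];
            $expect_term = 0;
        }
        elsif ($expect_term && $src =~ s{^/((?:[^/\\]|\\.)*)/}{}) {
            push @tokens, [ PATTERN => $1 ];
            $expect_term = 0;
        }
        elsif ($src =~ s{^([-+*/=])}{}) {
            push @tokens, [ OP => $1 ];
            $expect_term = 1;
        }
        else {
            die "toy lexer cannot handle: $src\n";
        }
    }
    return @tokens;
}

for my $code ('$x / 2 / $y', '$x = /2/') {
    print "$code  =>  ",
        join(' ', map { "$_->[0]($_->[1])" } toy_tokens($code)), "\n";
}
```

-
- The first input yields two C<OP(/)> tokens, while in the second the
- slashes delimit a single C<PATTERN(2)> token, because a term is
- expected after C<=>.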
-
- =item Optimization
-
- Now the parsing stage is complete, and the finished tree represents
- the operations that the Perl interpreter needs to perform to execute our
- program. Next, Perl does a dry run over the tree looking for
- optimisations: constant expressions such as C<3 + 4> will be computed
- now, and the optimizer will also see if any multiple operations can be
- replaced with a single one. For instance, to fetch the variable C<$foo>,
- instead of grabbing the glob C<*foo> and looking at the scalar
- component, the optimizer fiddles the op tree to use a function which
- directly looks up the scalar in question. The main optimizer is C<peep>
- in F<op.c>, and many ops have their own optimizing functions.
-
- =item Running
-
- Now we're finally ready to go: we have compiled Perl byte code, and all
- that's left to do is run it. The actual execution is done by the
- C<runops_standard> function in F<run.c>; more specifically, it's done by
- these three innocent looking lines:
-
- while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
- PERL_ASYNC_CHECK();
- }
-
- You may be more comfortable with the Perl version of that:
-
- PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};
-
- Well, maybe not. Anyway, each op contains a function pointer, which
- stipulates the function which will actually carry out the operation.
- This function will return the next op in the sequence - this allows for
- things like C<if> which choose the next op dynamically at run time.
- The C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt
- execution if required.
-
- The actual functions called are known as PP code, and they're spread
- between four files: F<pp_hot.c> contains the `hot' code, which is most
- often used and highly optimized, F<pp_sys.c> contains all the
- system-specific functions, F<pp_ctl.c> contains the functions which
- implement control structures (C<if>, C<while> and the like) and F<pp.c>
- contains everything else. These are, if you like, the C code for Perl's
- built-in functions and operators.
-
- =back
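-
- The run loop described under L</Running> is easier to believe once
- you've seen it in miniature. What follows is only a sketch in plain C,
- with no Perl headers: C<ToyOp>, C<ToyInterp>, C<toy_runops> and the
- C<toy_pp_*> functions are invented names, and a single accumulator
- stands in for Perl's real stacks. It does show the one idea that
- matters: each op's function returns the op to run next, so a
- conditional can pick its successor at run time.
-
```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: ToyOp, ToyInterp and the toy_* names are hypothetical
 * and much simpler than Perl's real OP structures. */
typedef struct ToyOp ToyOp;
typedef struct ToyInterp ToyInterp;
typedef ToyOp *(*ToyPP)(ToyInterp *in, ToyOp *op);

struct ToyOp {
    ToyPP  ppaddr;   /* function that carries out this op        */
    ToyOp *next;     /* default successor in execution order     */
    ToyOp *other;    /* alternative successor, used by toy_pp_if */
    long   value;    /* operand for toy_pp_const and toy_pp_add  */
};

struct ToyInterp {
    long acc;        /* one accumulator instead of a real stack  */
    int  ops_run;    /* how many ops the loop dispatched         */
};

static ToyOp *toy_pp_const(ToyInterp *in, ToyOp *op)
{
    in->acc = op->value;
    return op->next;
}

static ToyOp *toy_pp_add(ToyInterp *in, ToyOp *op)
{
    in->acc += op->value;
    return op->next;
}

/* Chooses the next op at run time, as a conditional would. */
static ToyOp *toy_pp_if(ToyInterp *in, ToyOp *op)
{
    return in->acc ? op->next : op->other;
}

/* The run loop: call the current op's function; it hands back the next op. */
static void toy_runops(ToyInterp *in, ToyOp *op)
{
    while (op != NULL) {
        in->ops_run++;
        op = op->ppaddr(in, op);
    }
}

/* Builds "acc = start; if (acc) acc += 10; else acc += 20;" and runs it. */
long toy_run_example(long start)
{
    ToyOp add_true  = { toy_pp_add,   NULL,      NULL,       10 };
    ToyOp add_false = { toy_pp_add,   NULL,      NULL,       20 };
    ToyOp cond      = { toy_pp_if,    &add_true, &add_false, 0 };
    ToyOp init      = { toy_pp_const, &cond,     NULL,       start };
    ToyInterp in    = { 0, 0 };

    toy_runops(&in, &init);
    assert(in.ops_run == 3);   /* const, if, and exactly one add */
    return in.acc;
}
```
-
- Calling C<toy_run_example(1)> takes the C<add_true> branch and
- C<toy_run_example(0)> takes C<add_false>; the loop itself never needs
- to know which.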
-
- =head2 Internal Variable Types
-
- You should by now have had a look at L<perlguts>, which tells you about
- Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do
- that now.
-
- These variables are used not only to represent Perl-space variables, but
- also any constants in the code, as well as some structures completely
- internal to Perl. The symbol table, for instance, is an ordinary Perl
- hash. Your code is represented by an SV as it's read into the parser;
- any program files you call are opened via ordinary Perl filehandles, and
- so on.
-
- The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a
- Perl program. Let's see, for instance, how Perl treats the constant
- C<"hello">.
-
- % perl -MDevel::Peek -e 'Dump("hello")'
- 1 SV = PV(0xa041450) at 0xa04ecbc
- 2 REFCNT = 1
- 3 FLAGS = (POK,READONLY,pPOK)
- 4 PV = 0xa0484e0 "hello"\0
- 5 CUR = 5
- 6 LEN = 6
-
- Reading C<Devel::Peek> output takes a bit of practice, so let's go
- through it line by line.
-
- Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in
- memory. SVs themselves are very simple structures, but they contain a
- pointer to a more complex structure. In this case, it's a PV, a
- structure which holds a string value, at location C<0xa041450>. Line 2
- is the reference count; there are no other references to this data, so
- it's 1.
-
- Line 3 gives the flags for this SV: it's OK to use it as a PV (C<POK>),
- it's a read-only SV (because it's a constant) and the data is a PV
- internally (C<pPOK>). Line 4 shows the contents of the string, starting
- at location C<0xa0484e0>.
-
- Line 5 gives us the current length of the string - note that this does
- B<not> include the null terminator. Line 6 is not the length of the
- string, but the length of the currently allocated buffer; as the string
- grows, Perl automatically extends the available storage via a routine
- called C<SvGROW>.
-
- You can get at any of these quantities from C very easily; just add
- C<Sv> to the name of the field shown in the snippet, and you've got a
- macro which will return the value: C<SvCUR(sv)> returns the current
- length of the string, C<SvREFCNT(sv)> returns the reference count,
- C<SvPV(sv, len)> returns the string itself with its length, and so on.
- More macros to manipulate these properties can be found in L<perlguts>.
-
- Let's take an example of manipulating a PV, from C<sv_catpvn>, in F<sv.c>:
-
- 1 void
- 2 Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len)
- 3 {
- 4 STRLEN tlen;
- 5 char *junk;
-
- 6 junk = SvPV_force(sv, tlen);
- 7 SvGROW(sv, tlen + len + 1);
- 8 if (ptr == junk)
- 9 ptr = SvPVX(sv);
- 10 Move(ptr,SvPVX(sv)+tlen,len,char);
- 11 SvCUR(sv) += len;
- 12 *SvEND(sv) = '\0';
- 13 (void)SvPOK_only_UTF8(sv); /* validate pointer */
- 14 SvTAINT(sv);
- 15 }
-
- This is a function which adds a string, C<ptr>, of length C<len> onto
- the end of the PV stored in C<sv>. The first thing we do in line 6 is
- make sure that the SV B<has> a valid PV, by calling the C<SvPV_force>
- macro to force a PV. As a side effect, C<tlen> gets set to the current
- length of the PV, and the PV itself is returned to C<junk>.
-
- In line 7, we make sure that the SV will have enough room to accommodate
- the old string, the new string and the null terminator. If C<LEN> isn't
- big enough, C<SvGROW> will reallocate space for us.
-
- Now, if C<junk> is the same as the string we're trying to add (that is,
- we're appending the SV to itself), C<SvGROW> may have moved the buffer,
- so in line 9 we grab the string afresh from the SV; C<SvPVX> is the
- address of the PV in the SV.
-
- Line 10 does the actual catenation: the C<Move> macro moves a chunk of
- memory around: we move the string C<ptr> to the end of the PV - that's
- the start of the PV plus its current length. We're moving C<len> bytes
- of type C<char>. After doing so, we need to tell Perl we've extended the
- string, by altering C<CUR> to reflect the new length. C<SvEND> is a
- macro which gives us the end of the string, so that needs to be a
- C<"\0">.
-
- Line 13 manipulates the flags; since we've changed the PV, any IV or NV
- values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
- want to use the old IV of 10. C<SvPOK_only_UTF8> is a special UTF8-aware
- version of C<SvPOK_only>, a macro which turns off the IOK and NOK flags
- and turns on POK. The final C<SvTAINT> is a macro which marks the SV as
- tainted if taint mode is turned on and tainted data is involved.
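-
- If you'd like to play with the steps of C<sv_catpvn> without building
- Perl, here is a rough model in plain C. Everything in it is an
- assumption made for illustration: C<ToySV>, C<toy_grow>,
- C<toy_pv_force> and C<toy_catpvn> are invented names, the SV has only
- an integer and a string slot, and there is no UTF8 or taint handling.
- One deliberate difference from the real function: the "are we
- appending the SV to itself?" test is made before the buffer can be
- reallocated, which keeps the toy within the rules of standard C.
-
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: ToySV and the toy_* functions are hypothetical
 * stand-ins for Perl's SV, SvGROW, SvPV_force and sv_catpvn. */
enum { TOY_IOK = 1, TOY_POK = 2 };

typedef struct {
    char    *pv;     /* string buffer (PV)                   */
    size_t   cur;    /* CUR: length without the trailing NUL */
    size_t   len;    /* LEN: bytes allocated for the buffer  */
    long     iv;     /* cached integer value                 */
    unsigned flags;  /* which of the slots above are valid   */
} ToySV;

/* Make sure at least `want` bytes are allocated (the SvGROW role). */
static void toy_grow(ToySV *sv, size_t want)
{
    if (sv->len < want) {
        char *p = realloc(sv->pv, want);
        assert(p != NULL);
        sv->pv  = p;
        sv->len = want;
    }
}

void toy_set_iv(ToySV *sv, long iv)
{
    sv->iv    = iv;
    sv->flags = TOY_IOK;               /* only the integer is valid now */
}

/* Make sure a string value exists (the SvPV_force role). */
static char *toy_pv_force(ToySV *sv)
{
    if (!(sv->flags & TOY_POK)) {
        char   buf[32];
        size_t n = (size_t)snprintf(buf, sizeof buf, "%ld", sv->iv);
        toy_grow(sv, n + 1);
        memcpy(sv->pv, buf, n + 1);
        sv->cur    = n;
        sv->flags |= TOY_POK;
    }
    return sv->pv;
}

void toy_catpvn(ToySV *sv, const char *ptr, size_t len)
{
    char  *junk = toy_pv_force(sv);
    size_t tlen = sv->cur;
    int    self = (ptr == junk);       /* appending the SV to itself?   */

    toy_grow(sv, tlen + len + 1);      /* old string + new string + NUL */
    if (self)
        ptr = sv->pv;                  /* the buffer may have moved     */
    memmove(sv->pv + tlen, ptr, len);
    sv->cur += len;
    sv->pv[sv->cur] = '\0';
    sv->flags = TOY_POK;               /* the cached integer is stale   */
}
```
-
- Starting from a zeroed C<ToySV>, C<toy_set_iv(&sv, 10)> followed by
- C<toy_catpvn(&sv, "6", 1)> mirrors the C<$a=10; $a.="6";> case from
- the text: the string becomes C<"106"> and only the POK flag survives.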
-
- AVs and HVs are more complicated, but SVs are by far the most common
- variable type being thrown around. Having seen something of how we
- manipulate these, let's go on and look at how the op tree is
- constructed.
-
- =head2 Op Trees
-
- First, what is the op tree, anyway? The op tree is the parsed
- representation of your program, as we saw in our section on parsing, and
- it's the sequence of operations that Perl goes through to execute your
- program, as we saw in L</Running>.
-
- An op is a fundamental operation that Perl can perform: all the built-in
- functions and operators are ops, and there is a series of ops which
- deal with concepts the interpreter needs internally - entering and
- leaving a block, ending a statement, fetching a variable, and so on.
-
- The op tree is connected in two ways: you can imagine that there are two
- "routes" through it, two orders in which you can traverse the tree.
- First, parse order reflects how the parser understood the code, and
- secondly, execution order tells perl what order to perform the
- operations in.
-
- The easiest way to examine the op tree is to stop Perl after it has
- finished parsing, and get it to dump out the tree. This is exactly what
- the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise>
- and L<B::Debug|B::Debug> do.
-
- Let's have a look at how Perl sees C<$a = $b + $c>:
-
- % perl -MO=Terse -e '$a=$b+$c'
-  1  LISTOP (0x8179888) leave
-  2      OP (0x81798b0) enter
-  3      COP (0x8179850) nextstate
-  4      BINOP (0x8179828) sassign
-  5          BINOP (0x8179800) add [1]
-  6              UNOP (0x81796e0) null [15]
-  7                  SVOP (0x80fafe0) gvsv GV (0x80fa4cc) *b
-  8              UNOP (0x81797e0) null [15]
-  9                  SVOP (0x8179700) gvsv GV (0x80efeb0) *c
- 10          UNOP (0x816b4f0) null [15]
- 11              SVOP (0x816dcf0) gvsv GV (0x80fa460) *a
-
- Let's start in the middle, at line 4. This is a BINOP, a binary
- operator, which is at location C<0x8179828>. The specific operator in
- question is C<sassign> - scalar assignment - and you can find the code
- which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a
- binary operator, it has two children: the add operator, providing the
- result of C<$b+$c>, is uppermost on line 5, and the left hand side is on
- line 10.
-
- Line 10 is the null op: this does exactly nothing. What is that doing
- there? If you see the null op, it's a sign that something has been
- optimized away after parsing. As we mentioned in L</Optimization>,
- the optimization stage sometimes converts two operations into one, for
- example when fetching a scalar variable. When this happens, instead of
- rewriting the op tree and cleaning up the dangling pointers, it's easier
- just to replace the redundant operation with the null op. Originally,
- the tree would have looked like this:
-
- 10 UNOP (0x816b4f0) rv2sv [15]
- 11     SVOP (0x816dcf0) gv GV (0x80fa460) *a
-
- That is, fetch the C<a> entry from the main symbol table, and then look
- at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>)
- happens to do both these things.
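-
- The "replace it with a null op" trick can be sketched in a few lines of
- plain C. This is a toy under stated assumptions, not the code in
- F<op.c>: C<ToyOp>, C<ToyType> and C<toy_fuse_rv2sv> are invented
- names, and ops here have at most one child. The point is that the
- parent op stays exactly where it is, so nothing that points at it has
- to be repaired.
-
```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types and names are hypothetical and far
 * simpler than the structures in Perl's op.h. */
typedef enum { TOY_NULL, TOY_GV, TOY_GVSV, TOY_RV2SV } ToyType;

typedef struct ToyOp {
    ToyType       type;
    ToyType       was;    /* original type, kept when an op is changed */
    struct ToyOp *kid;    /* single child, or NULL                     */
} ToyOp;

/* If `op` is rv2sv over gv, fuse the pair: the child becomes gvsv and
 * the parent is turned into a do-nothing null op, left in place so no
 * pointers into the tree need fixing. Returns 1 if anything changed. */
int toy_fuse_rv2sv(ToyOp *op)
{
    if (op == NULL || op->type != TOY_RV2SV)
        return 0;
    if (op->kid == NULL || op->kid->type != TOY_GV)
        return 0;

    op->kid->was  = op->kid->type;
    op->kid->type = TOY_GVSV;     /* one op now does both jobs      */
    op->was       = op->type;     /* remembered, like "(was rv2sv)" */
    op->type      = TOY_NULL;
    return 1;
}
```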
-
- The right hand side, starting at line 5 is similar to what we've just
- seen: we have the C<add> op (C<pp_add> also in F<pp_hot.c>) add together
- two C<gvsv>s.
-
- Now, what's this about?
-
- 1 LISTOP (0x8179888) leave
- 2 OP (0x81798b0) enter
- 3 COP (0x8179850) nextstate
-
- C<enter> and C<leave> are scoping ops, and their job is to perform any
- housekeeping every time you enter and leave a block: lexical variables
- are tidied up, unreferenced variables are destroyed, and so on. Every
- program will have those first three lines: C<leave> is a list, and its
- children are all the statements in the block. Statements are delimited
- by C<nextstate>, so a block is a collection of C<nextstate> ops, each
- followed by the ops to be performed for that statement. C<enter> is a
- single op which functions as a marker.
-
- That's how Perl parsed the program, from top to bottom:
-
-                 Program
-                    |
-                Statement
-                    |
-                    =
-                   / \
-                  /   \
-                 $a   +
-                     / \
-                   $b   $c
-
- However, it's impossible to B<perform> the operations in this order:
- you have to find the values of C<$b> and C<$c> before you add them
- together, for instance. So, the other thread that runs through the op
- tree is the execution order: each op has a field C<op_next> which points
- to the next op to be run, so following these pointers tells us how perl
- executes the code. We can traverse the tree in this order using
- the C<exec> option to C<B::Terse>:
-
- % perl -MO=Terse,exec -e '$a=$b+$c'
- 1 OP (0x8179928) enter
- 2 COP (0x81798c8) nextstate
- 3 SVOP (0x81796c8) gvsv GV (0x80fa4d4) *b
- 4 SVOP (0x8179798) gvsv GV (0x80efeb0) *c
- 5 BINOP (0x8179878) add [1]
- 6 SVOP (0x816dd38) gvsv GV (0x80fa468) *a
- 7 BINOP (0x81798a0) sassign
- 8 LISTOP (0x8179900) leave
-
- This probably makes more sense for a human: enter a block, start a
- statement. Get the values of C<$b> and C<$c>, and add them together.
- Find C<$a>, and assign one to the other. Then leave.
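-
- The two routes through the tree can be modelled in plain C. In the
- sketch below (all names are invented, and the null ops are left out
- for brevity) each op has child and sibling links for parse order, and
- a C<next> link for execution order. The C<next> links are filled in by
- a simple children-first walk; that is an assumption which happens to
- reproduce the listing above for this statement, not a claim about how
- perl threads every kind of op.
-
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: ToyOp and the toy_* functions are hypothetical. */
typedef struct ToyOp {
    const char   *name;
    struct ToyOp *first;     /* first child   (parse order)     */
    struct ToyOp *sibling;   /* next sibling  (parse order)     */
    struct ToyOp *next;      /* next op to run (execution order) */
} ToyOp;

/* Thread `op` and everything below it children-first. `tail` points at
 * the link that should receive the next op in execution order. */
static ToyOp **toy_thread(ToyOp *op, ToyOp **tail)
{
    ToyOp *kid;

    for (kid = op->first; kid != NULL; kid = kid->sibling)
        tail = toy_thread(kid, tail);
    *tail = op;
    op->next = NULL;
    return &op->next;
}

/* Writes the execution order for `$a = $b + $c` into buf. */
void toy_exec_order(char *buf, size_t size)
{
    ToyOp a     = { "gvsv(a)",   NULL,   NULL,   NULL };
    ToyOp c     = { "gvsv(c)",   NULL,   NULL,   NULL };
    ToyOp b     = { "gvsv(b)",   NULL,   &c,     NULL };
    ToyOp add   = { "add",       &b,     &a,     NULL };
    ToyOp asg   = { "sassign",   &add,   NULL,   NULL };
    ToyOp state = { "nextstate", NULL,   &asg,   NULL };
    ToyOp enter = { "enter",     NULL,   &state, NULL };
    ToyOp leave = { "leave",     &enter, NULL,   NULL };
    ToyOp *start = NULL, *op;

    assert(size > 0);
    toy_thread(&leave, &start);

    buf[0] = '\0';
    for (op = start; op != NULL; op = op->next) {
        if (buf[0] != '\0')
            strncat(buf, " ", size - strlen(buf) - 1);
        strncat(buf, op->name, size - strlen(buf) - 1);
    }
}
```
-
- Following C<next> from the first op visits C<enter>, C<nextstate>, the
- two operands, C<add>, the target, C<sassign> and finally C<leave>,
- even though C<leave> is the root of the tree.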
-
- The way Perl builds up these op trees in the parsing process can be
- unravelled by examining F<perly.y>, the YACC grammar. Let's take the
- piece we need to construct the tree for C<$a = $b + $c>:
-
- 1 term : term ASSIGNOP term
- 2 { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
- 3 | term ADDOP term
- 4 { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
-
- If you're not used to reading BNF grammars, this is how it works: You're
- fed certain things by the tokeniser, which generally end up in upper
- case. Here, C<ADDOP> is provided when the tokeniser sees C<+> in your
- code. C<ASSIGNOP> is provided when C<=> is used for assigning. These are
- `terminal symbols', because you can't get any simpler than them.
-
- The grammar, lines one and three of the snippet above, tells you how to
- build up more complex forms. These complex forms, `non-terminal symbols',
- are generally placed in lower case. C<term> here is a non-terminal
- symbol, representing a single expression.
-
- The grammar gives you the following rule: you can make the thing on the
- left of the colon if you see all the things on the right in sequence.
- This is called a "reduction", and the aim of parsing is to completely
- reduce the input. There are several different ways you can perform a
- reduction, separated by vertical bars: so, C<term> followed by C<=>
- followed by C<term> makes a C<term>, and C<term> followed by C<+>
- followed by C<term> can also make a C<term>.
-
- So, if you see two terms with an C<=> or C<+> between them, you can
- turn them into a single expression. When you do this, you execute the
- code in the block on the next line: if you see C<=>, you'll do the code
- in line 2. If you see C<+>, you'll do the code in line 4. It's this code
- which contributes to the op tree.
-
- | term ADDOP term
- { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }
-
- What this does is create a new binary op, and feed it a number of
- variables. The variables refer to the tokens: C<$1> is the first token in
- the input, C<$2> the second, and so on - think regular expression
- backreferences. C<$$> is the op returned from this reduction. So, we
- call C<newBINOP> to create a new binary operator. The first parameter to
- C<newBINOP>, a function in F<op.c>, is the op type. It's an addition
- operator, so we want the type to be C<ADDOP>. We could specify this
- directly, but it's right there as the second token in the input, so we
- use C<$2>. The second parameter is the op's flags: 0 means `nothing
- special'. Then the things to add: the left and right hand side of our
- expression, in scalar context.
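-
- To see tree-building actions of this kind at work, here is a small
- stand-alone sketch in C. Be warned that it swaps in a different
- technique: F<perly.y> is turned into a table-driven parser by YACC,
- whereas this toy is a hand-written recursive descent parser, and every
- name in it (C<ToyNode>, C<toy_new>, C<toy_parse> and friends) is
- invented. It accepts single-letter variables joined by C<+> and C<=>,
- and C<toy_new> plays the part that C<newBINOP> and C<newASSIGNOP>
- play in the real grammar.
-
```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: every name here is hypothetical. */
typedef struct ToyNode {
    char type;                       /* '=', '+', or 'v' for a variable */
    char name;                       /* variable letter when type is 'v' */
    struct ToyNode *left, *right;
} ToyNode;

typedef struct {
    const char *src;                 /* input not yet consumed */
    ToyNode     pool[32];            /* node storage           */
    int         used;
} ToyParser;

static ToyNode *toy_new(ToyParser *p, char type, char name,
                        ToyNode *left, ToyNode *right)
{
    ToyNode *n;
    assert(p->used < 32);
    n = &p->pool[p->used++];
    n->type = type; n->name = name; n->left = left; n->right = right;
    return n;
}

static char toy_peek(ToyParser *p)
{
    while (*p->src == ' ')
        p->src++;
    return *p->src;
}

/* var : 'a'..'z' */
static ToyNode *toy_var(ToyParser *p)
{
    char c = toy_peek(p);
    if (!islower((unsigned char)c))
        return NULL;
    p->src++;
    return toy_new(p, 'v', c, NULL, NULL);
}

/* sum : var ('+' var)*      compare "term ADDOP term" */
static ToyNode *toy_sum(ToyParser *p)
{
    ToyNode *left = toy_var(p);
    while (left != NULL && toy_peek(p) == '+') {
        ToyNode *right;
        p->src++;
        right = toy_var(p);
        if (right == NULL)
            return NULL;
        left = toy_new(p, '+', 0, left, right);
    }
    return left;
}

/* expr : sum ('=' expr)?    compare "term ASSIGNOP term" */
static ToyNode *toy_expr(ToyParser *p)
{
    ToyNode *left = toy_sum(p);
    if (left != NULL && toy_peek(p) == '=') {
        ToyNode *right;
        p->src++;
        right = toy_expr(p);
        if (right == NULL)
            return NULL;
        left = toy_new(p, '=', 0, left, right);
    }
    return left;
}

static void toy_cat(char *buf, size_t size, const char *s)
{
    strncat(buf, s, size - strlen(buf) - 1);
}

/* Render the tree in prefix form, for example "(+ $b $c)". */
static void toy_render(const ToyNode *n, char *buf, size_t size)
{
    char tmp[4];
    if (n->type == 'v') {
        tmp[0] = '$'; tmp[1] = n->name; tmp[2] = '\0';
        toy_cat(buf, size, tmp);
        return;
    }
    tmp[0] = '('; tmp[1] = n->type; tmp[2] = ' '; tmp[3] = '\0';
    toy_cat(buf, size, tmp);
    toy_render(n->left, buf, size);
    toy_cat(buf, size, " ");
    toy_render(n->right, buf, size);
    toy_cat(buf, size, ")");
}

/* Returns 1 and fills buf on success, 0 on a syntax error. */
int toy_parse(const char *src, char *buf, size_t size)
{
    ToyParser p;
    ToyNode  *root;

    assert(size > 0);
    p.src = src;
    p.used = 0;
    buf[0] = '\0';
    root = toy_expr(&p);
    if (root == NULL || toy_peek(&p) != '\0')
        return 0;
    toy_render(root, buf, size);
    return 1;
}
```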
-
- =head2 Stacks
-
- When perl executes something like C<addop>, how does it pass on its
- results to the next op? The answer is, through the use of stacks. Perl
- has a number of stacks to store things it's currently working on, and
- we'll look at the three most important ones here.
-
- =over 3
-
- =item Argument stack
-
- Arguments are passed to PP code and returned from PP code using the
- argument stack, C<ST>. The typical way to handle arguments is to pop
- them off the stack, deal with them how you wish, and then push the result
- back onto the stack. This is how, for instance, the cosine operator
- works:
-
- NV value;
- value = POPn;
- value = Perl_cos(value);
- XPUSHn(value);
-
- We'll see a more tricky example of this when we consider Perl's macros
- below. C<POPn> gives you the NV (floating point value) of the top SV on
- the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and push
- the result back as an NV. The C<X> in C<XPUSHn> means that the stack
- should be extended if necessary - it can't be necessary here, because we
- know there's room for one more item on the stack, since we've just
- removed one! The C<XPUSH*> macros at least guarantee safety.
-
- Alternatively, you can fiddle with the stack directly: C<SP> gives you
- the first element in your portion of the stack, and C<TOP*> gives you
- the top SV/IV/NV/etc. on the stack. So, for instance, to do unary
- negation of an integer:
-
- SETi(-TOPi);
-
- Just set the integer value of the top stack entry to its negation.
-
- Argument stack manipulation in the core is exactly the same as it is in
- XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer
- description of the macros used in stack manipulation.
-
- =item Mark stack
-
- I say `your portion of the stack' above because PP code doesn't
- necessarily get the whole stack to itself: if your function calls
- another function, you'll only want to expose the arguments aimed for the
- called function, and not (necessarily) let it get at your own data. The
- way we do this is to have a `virtual' bottom-of-stack, exposed to each
- function. The mark stack keeps bookmarks to locations in the argument
- stack usable by each function. For instance, when dealing with a tied
- variable, (internally, something with `P' magic) Perl has to call
- methods for accesses to the tied variables. However, we need to separate
- the arguments exposed to the method from the arguments exposed to the
- original function - the store or fetch or whatever it may be. Here's how
- the tied C<push> is implemented; see C<av_push> in F<av.c>:
-
- 1 PUSHMARK(SP);
- 2 EXTEND(SP,2);
- 3 PUSHs(SvTIED_obj((SV*)av, mg));
- 4 PUSHs(val);
- 5 PUTBACK;
- 6 ENTER;
- 7 call_method("PUSH", G_SCALAR|G_DISCARD);
- 8 LEAVE;
- 9 POPSTACK;
-
- The lines which concern the mark stack are the first, fifth and last
- lines: they save away, restore and remove the current position of the
- argument stack.
-
- Let's examine the whole implementation, for practice:
-
- 1 PUSHMARK(SP);
-
- Push the current state of the stack pointer onto the mark stack. This is
- so that when we've finished adding items to the argument stack, Perl
- knows how many things we've added recently.
-
- 2 EXTEND(SP,2);
- 3 PUSHs(SvTIED_obj((SV*)av, mg));
- 4 PUSHs(val);
-
- We're going to add two more items onto the argument stack: when you have
- a tied array, the C<PUSH> subroutine receives the object and the value
- to be pushed, and that's exactly what we have here - the tied object,
- retrieved with C<SvTIED_obj>, and the value, the SV C<val>.
-
- 5 PUTBACK;
-
- Next we tell Perl to make the change to the global stack pointer: C<dSP>
- only gave us a local copy, not a reference to the global.
-
- 6 ENTER;
- 7 call_method("PUSH", G_SCALAR|G_DISCARD);
- 8 LEAVE;
-
- C<ENTER> and C<LEAVE> localise a block of code - they make sure that all
- variables are tidied up, everything that has been localised gets
- its previous value returned, and so on. Think of them as the C<{> and
- C<}> of a Perl block.
-
- To actually do the magic method call, we have to call a subroutine in
- Perl space: C<call_method> takes care of that, and it's described in
- L<perlcall>. We call the C<PUSH> method in scalar context, and we're
- going to discard its return value.
-
- 9 POPSTACK;
-
- Finally, we remove the value we placed on the mark stack, since we
- don't need it any more.
-
- =item Save stack
-
- C doesn't have a concept of local scope, so perl provides one. We've
- seen that C<ENTER> and C<LEAVE> are used as scoping braces; the save
- stack implements the C equivalent of, for example:
-
- {
- local $foo = 42;
- ...
- }
-
- See L<perlguts/Localising Changes> for how to use the save stack.
-
- =back
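-
- The relationship between the argument stack and the mark stack is
- perhaps clearest in a toy. The sketch below is plain C with invented
- names throughout (C<ToyStacks>, C<toy_pushmark>, C<toy_call_sum>);
- values are plain doubles rather than SVs, and the callee is assumed to
- be the one that takes the mark off again. It shows a caller keeping
- its own data on the stack while a callee sees only what was pushed
- after the mark.
-
```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical. */
enum { TOY_STACK_MAX = 32, TOY_MARK_MAX = 8 };

typedef struct {
    double stack[TOY_STACK_MAX];  /* argument stack                  */
    int    sp;                    /* index of the top item, -1 empty */
    int    marks[TOY_MARK_MAX];   /* mark stack: saved values of sp  */
    int    nmarks;
} ToyStacks;

void toy_init(ToyStacks *s)
{
    s->sp = -1;
    s->nmarks = 0;
}

void toy_push(ToyStacks *s, double v)
{
    assert(s->sp + 1 < TOY_STACK_MAX);
    s->stack[++s->sp] = v;
}

/* Remember where the callee's arguments will start (the PUSHMARK role). */
void toy_pushmark(ToyStacks *s)
{
    assert(s->nmarks < TOY_MARK_MAX);
    s->marks[s->nmarks++] = s->sp;
}

/* A callee: takes the most recent mark, sums only the items above it,
 * removes them and leaves the single result in their place. */
void toy_call_sum(ToyStacks *s)
{
    int    mark, i;
    double total = 0.0;

    assert(s->nmarks > 0);
    mark = s->marks[--s->nmarks];
    for (i = mark + 1; i <= s->sp; i++)
        total += s->stack[i];
    s->sp = mark;                 /* drop the arguments    */
    toy_push(s, total);           /* push the return value */
}
```
-
- If the caller pushes C<100>, sets a mark, pushes C<1> and C<2> and then
- calls C<toy_call_sum>, the stack ends up holding C<100> and C<3>: the
- caller's own value was never visible to the callee.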
-
- =head2 Millions of Macros
-
- One thing you'll notice about the Perl source is that it's full of
- macros. Some have called the pervasive use of macros the hardest thing
- to understand; others find it adds to clarity. Let's take an example,
- the code which implements the addition operator:
-
- 1 PP(pp_add)
- 2 {
- 3 dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
- 4 {
- 5 dPOPTOPnnrl_ul;
- 6 SETn( left + right );
- 7 RETURN;
- 8 }
- 9 }
-
- Every line here (apart from the braces, of course) contains a macro. The
- first line sets up the function declaration as Perl expects for PP code;
- line 3 sets up variable declarations for the argument stack and the
- target, the return value of the operation. Finally, it tries to see if
- the addition operation is overloaded; if so, the appropriate subroutine
- is called.
-
- Line 5 is another variable declaration - all variable declarations start
- with C<d> - which pops from the top of the argument stack two NVs (hence
- C<nn>) and puts them into the variables C<right> and C<left>, hence the
- C<rl>. These are the two operands to the addition operator. Next, we
- call C<SETn> to set the NV of the return value to the result of adding
- the two values. This done, we return - the C<RETURN> macro makes sure
- that our return value is properly handled, and we pass the next operator
- to run back to the main run loop.
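-
- If the macro style still looks opaque, it may help to write a
- miniature of it yourself. The following is a sketch only: the
- C<TOY_*> macros are invented, cut-down imitations of C<dSP>, C<POPn>,
- C<TOPn>, C<SETn> and C<PUTBACK>, working on a stack of plain doubles,
- and there is no overloading check. The same function is given twice,
- once with macros and once expanded by hand, so you can compare the two
- readings.
-
```c
#include <assert.h>

/* Toy model only: the TOY_* macros are hypothetical miniatures, not
 * the real definitions from Perl's pp.h. */
typedef struct {
    double  stack[16];
    double *sp;                  /* points at the top item */
} ToyInterp;

#define TOY_dSP      double *sp = interp->sp  /* local copy of the stack pointer */
#define TOY_POPn     (*sp--)                  /* take the top value off          */
#define TOY_TOPn     (*sp)                    /* look at the top value           */
#define TOY_SETn(n)  (*sp = (n))              /* overwrite the top value         */
#define TOY_PUTBACK  (interp->sp = sp)        /* publish the local pointer       */

/* Reads much like pp_add: pop the right operand, peek at the left one,
 * and overwrite it with the sum. */
void toy_pp_add(ToyInterp *interp)
{
    TOY_dSP;
    double right = TOY_POPn;
    double left  = TOY_TOPn;

    TOY_SETn(left + right);
    TOY_PUTBACK;
}

/* The same function with every macro expanded by hand. */
void toy_pp_add_expanded(ToyInterp *interp)
{
    double *sp   = interp->sp;
    double right = (*sp--);
    double left  = (*sp);

    (*sp = (left + right));
    (interp->sp = sp);
}
```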
-
- Most of these macros are explained in L<perlapi>, and some of the more
- important ones are explained in L<perlxs> as well. Pay special attention
- to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for information on
- the C<[pad]THX_?> macros.
-
- =head2 Poking at Perl
-
- To really poke around with Perl, you'll probably want to build Perl for
- debugging, like this:
-
- ./Configure -d -D optimize=-g
- make
-
- C<-g> is a flag to the C compiler to have it produce debugging
- information which will allow us to step through a running program.
- F<Configure> will also turn on the C<DEBUGGING> compilation symbol which
- enables all the internal debugging code in Perl. There are a whole bunch
- of things you can debug with this: L<perlrun> lists them all, and the
- best way to find out about them is to play about with them. The most
- useful options are probably
-
- l Context (loop) stack processing
- t Trace execution
- o Method and overloading resolution
- c String/numeric conversions
-
- Some of the functionality of the debugging code can be achieved using XS
- modules.
-
- -Dr => use re 'debug'
- -Dx => use O 'Debug'
-
- =head2 Using a source-level debugger
-
- If the debugging output of C<-D> doesn't help you, it's time to step
- through perl's execution with a source-level debugger.
-
- =over 3
-
- =item *
-
- We'll use C<gdb> for our examples here; the principles will apply to any
- debugger, but check the manual of the one you're using.
-
- =back
-
- To fire up the debugger, type
-
- gdb ./perl
-
- You'll want to do that in your Perl source tree so the debugger can read
- the source code. You should see the copyright message, followed by the
- prompt.
-
- (gdb)
-
- C<help> will get you into the documentation, but here are the most
- useful commands:
-
- =over 3
-
- =item run [args]
-
- Run the program with the given arguments.
-
- =item break function_name
-
- =item break source.c:xxx
-
- Tells the debugger that we'll want to pause execution when we reach
- either the named function (but see L<perlguts/Internal Functions>!) or the given
- line in the named source file.
-
- =item step
-
- Steps through the program a line at a time.
-
- =item next
-
- Steps through the program a line at a time, without descending into
- functions.
-
- =item continue
-
- Run until the next breakpoint.
-
- =item finish
-
- Run until the end of the current function, then stop again.
-
- =item 'enter'
-
- Just pressing Enter will do the most recent operation again - it's a
- blessing when stepping through miles of source code.
-
- =item print
-
- Execute the given C code and print its results. B<WARNING>: Perl makes
- heavy use of macros, and F<gdb> is not aware of macros. You'll have to
- substitute them yourself. So, for instance, you can't say
-
- print SvPV_nolen(sv)
-
- but you have to say
-
- print Perl_sv_2pv_nolen(sv)
-
- You may find it helpful to have a "macro dictionary", which you can
- produce by saying C<cpp -dM perl.c | sort>. Even then, F<cpp> won't
- recursively apply the macros for you.
-
- =back
-
- =head2 Dumping Perl Data Structures
-
- One way to get around this macro hell is to use the dumping functions in
- F<dump.c>; these work a little like an internal
- L<Devel::Peek|Devel::Peek>, but they also cover OPs and other structures
- that you can't get at from Perl. Let's take an example. We'll use the
- C<$a = $b + $c> we used before, but give it a bit of context:
- C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and poke around?
-
- What about C<pp_add>, the function we examined earlier to implement the
- C<+> operator:
-
- (gdb) break Perl_pp_add
- Breakpoint 1 at 0x46249f: file pp_hot.c, line 309.
-
- Notice we use C<Perl_pp_add> and not C<pp_add> - see L<perlguts/Internal Functions>.
- With the breakpoint in place, we can run our program:
-
- (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c'
-
- Lots of junk will go past as gdb reads in the relevant source files and
- libraries, and then:
-
- Breakpoint 1, Perl_pp_add () at pp_hot.c:309
- 309 dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
- (gdb) step
- 311 dPOPTOPnnrl_ul;
- (gdb)
-
- We looked at this bit of code before, and we said that C<dPOPTOPnnrl_ul>
- arranges for two C<NV>s to be placed into C<left> and C<right> - let's
- slightly expand it:
-
- #define dPOPTOPnnrl_ul NV right = POPn; \
- SV *leftsv = TOPs; \
- NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0
-
- C<POPn> takes the SV from the top of the stack and obtains its NV either
- directly (if C<SvNOK> is set) or by calling the C<sv_2nv> function.
- C<TOPs> takes the next SV from the top of the stack - yes, C<POPn> uses
- C<TOPs> - but doesn't remove it. We then use C<SvNV> to get the NV from
- C<leftsv> in the same way as before - yes, C<POPn> uses C<SvNV>.
-
- Since we don't have an NV for C<$b>, we'll have to use C<sv_2nv> to
- convert it. If we step again, we'll find ourselves there:
-
- Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669
- 1669 if (!sv)
- (gdb)
-
- We can now use C<Perl_sv_dump> to investigate the SV:
-
- SV = PV(0xa057cc0) at 0xa0675d0
- REFCNT = 1
- FLAGS = (POK,pPOK)
- PV = 0xa06a510 "6XXXX"\0
- CUR = 5
- LEN = 6
- $1 = void
-
- We know we're going to get C<6> from this, so let's finish the
- subroutine:
-
- (gdb) finish
- Run till exit from #0 Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671
- 0x462669 in Perl_pp_add () at pp_hot.c:311
- 311 dPOPTOPnnrl_ul;
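-
- Why are we so sure that C<"6XXXX"> turns into C<6>? Numeric conversion
- reads the leading part of the string that looks like a number and
- ignores the rest. The C library's C<strtod> behaves the same way for
- this input, so we can imitate the effect in plain C. Note that this is
- only an analogy: C<toy_numify> is an invented helper, and C<strtod> is
- not what C<sv_2nv> is described as using here; the two differ on other
- inputs.
-
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: toy_numify is a hypothetical helper built on the C
 * library's strtod, used here as an analogy for numeric conversion. */
double toy_numify(const char *str, int *had_trailing_junk)
{
    char  *end;
    double value = strtod(str, &end);   /* stops at the first non-number */

    if (had_trailing_junk != NULL)
        *had_trailing_junk = (*end != '\0');
    return value;
}
```
-
- Given C<"6XXXX">, the helper returns C<6.0> and reports that something
- was left over, which is the situation in which Perl can warn that an
- argument "isn't numeric".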
-
- We can also dump out this op: the current op is always stored in
- C<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us
- similar output to L<B::Debug|B::Debug>.
-
- {
- 13 TYPE = add ===> 14
- TARG = 1
- FLAGS = (SCALAR,KIDS)
- {
- TYPE = null ===> (12)
- (was rv2sv)
- FLAGS = (SCALAR,KIDS)
- {
- 11 TYPE = gvsv ===> 12
- FLAGS = (SCALAR)
- GV = main::b
- }
- }
-
- # finish this later #
-
- =head2 Patching
-
- All right, we've now had a look at how to navigate the Perl sources and
- some things you'll need to know when fiddling with them. Let's now get
- on and create a simple patch. Here's something Larry suggested: if a
- C<U> is the first active format during a C<pack> (for example,
- C<pack "U3C8", @stuff>), then the resulting string should be treated as
- UTF8 encoded.
-
- How do we prepare to fix this up? First we locate the code in question -
- the C<pack> happens at runtime, so it's going to be in one of the F<pp>
- files. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going to be
- altering this file, let's copy it to F<pp.c~>.
-
- [Well, it was in F<pp.c> when this tutorial was written. It has now been
- split off with C<pp_unpack> to its own file, F<pp_pack.c>]
-
- Now let's look over C<pp_pack>: we take a pattern into C<pat>, and then
- loop over the pattern, taking each format character in turn into
- C<datumtype>. Then for each possible format character, we swallow up
- the other arguments in the pattern (a field width, an asterisk, and so
- on) and convert the next chunk of input into the specified format, adding
- it onto the output SV C<cat>.
-
- How do we know if the C<U> is the first format in the C<pat>? Well, if
- we have a pointer to the start of C<pat> then, if we see a C<U> we can
- test whether we're still at the start of the string. So, here's where
- C<pat> is set up:
-
- STRLEN fromlen;
- register char *pat = SvPVx(*++MARK, fromlen);
- register char *patend = pat + fromlen;
- register I32 len;
- I32 datumtype;
- SV *fromstr;
-
- We'll have another string pointer in there:
-
- STRLEN fromlen;
- register char *pat = SvPVx(*++MARK, fromlen);
- register char *patend = pat + fromlen;
- + char *patcopy;
- register I32 len;
- I32 datumtype;
- SV *fromstr;
-
- And just before we start the loop, we'll set C<patcopy> to be the start
- of C<pat>:
-
- items = SP - MARK;
- MARK++;
- sv_setpvn(cat, "", 0);
- + patcopy = pat;
- while (pat < patend) {
-
- Now if we see a C<U> which was at the start of the string, we turn on
- the UTF8 flag for the output SV, C<cat>:
-
- + if (datumtype == 'U' && pat==patcopy+1)
- + SvUTF8_on(cat);
- if (datumtype == '#') {
- while (pat < patend && *pat != '\n')
- pat++;
-
- Remember that it has to be C<patcopy+1> because the first character of
- the string is the C<U> which has been swallowed into C<datumtype>!
-
- Oops, we forgot one thing: what if there are spaces at the start of the
- pattern? C<pack(" U*", @stuff)> will have C<U> as the first active
- character, even though it's not the first thing in the pattern. In this
- case, we have to advance C<patcopy> along with C<pat> when we see spaces:
-
- if (isSPACE(datumtype))
- continue;
-
- needs to become
-
- if (isSPACE(datumtype)) {
- patcopy++;
- continue;
- }
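-
- The pointer arithmetic in this patch is easy to get wrong by one, so
- it's worth trying the logic on its own. Here is a stand-alone sketch
- in plain C which keeps only the pattern-walking skeleton: the function
- name C<toy_first_active_is_U> is invented, C<isspace> stands in for
- Perl's C<isSPACE>, and no packing is actually done.
-
```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Toy model only: toy_first_active_is_U is a hypothetical function
 * that mimics the patcopy bookkeeping and nothing else. */
int toy_first_active_is_U(const char *pat)
{
    const char *patend  = pat + strlen(pat);
    const char *patcopy = pat;     /* where the first active format sits */
    int         utf8    = 0;

    while (pat < patend) {
        int datumtype = (unsigned char)*pat++;

        if (isspace(datumtype)) {
            patcopy++;             /* skip over the space */
            continue;
        }
        /* pat has already stepped past datumtype, hence the +1 */
        if (datumtype == 'U' && pat == patcopy + 1)
            utf8 = 1;
        /* a real pack would now read any count and convert arguments */
    }
    return utf8;
}
```
-
- The function returns 1 for C<"U3C8"> and C<" U*">, and 0 for C<"C0U*">,
- which are the three situations the patch has to tell apart.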
-
- OK. That's the C part done. Now we must do two additional things before
- this patch is ready to go: we've changed the behaviour of Perl, and so
- we must document that change. We must also provide some more regression
- tests to make sure our patch works and doesn't create a bug somewhere
- else along the line.
-
- The regression tests for each operator live in F<t/op/>, and so we
- make a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our
- tests to the end. First, we'll test that the C<U> does indeed create
- Unicode strings.
-
- t/op/pack.t has a sensible ok() function, but if it didn't we could
- use the one from t/test.pl.
-
- require './test.pl';
- plan( tests => 159 );
-
- so instead of this:
-
- print 'not ' unless "1.20.300.4000" eq sprintf "%vd", pack("U*",1,20,300,4000);
- print "ok $test\n"; $test++;
-
- we can write the more sensible version (see L<Test::More> for a full
- explanation of is() and other testing functions):
-
- is( "1.20.300.4000", sprintf("%vd", pack("U*",1,20,300,4000)),
- "U* produces unicode" );
-
- Now we'll test that we got that space-at-the-beginning business right:
-
- is( "1.20.300.4000", sprintf("%vd", pack(" U*",1,20,300,4000)),
- " with spaces at the beginning" );
-
- And finally we'll test that we don't make Unicode strings if C<U> is B<not>
- the first active format:
-
- isnt( "1.20.300.4000", sprintf("%vd", pack("C0U*",1,20,300,4000)),
- "U* not first isn't unicode" );
-
- Mustn't forget to change the number of tests which appears at the top,
- or else the automated tester will get confused. This will either look
- like this:
-
- print "1..156\n";
-
- or this:
-
- plan( tests => 156 );
-
- We now compile up Perl, and run it through the test suite. Our new
- tests pass, hooray!
-
- Finally, the documentation. The job is never done until the paperwork is
- over, so let's describe the change we've just made. The relevant place
- is F<pod/perlfunc.pod>; again, we make a copy, and then we'll insert
- this text in the description of C<pack>:
-
- =item *
-
- If the pattern begins with a C<U>, the resulting string will be treated
- as Unicode-encoded. You can force UTF8 encoding on in a string with an
- initial C<U0>, and the bytes that follow will be interpreted as Unicode
- characters. If you don't want this to happen, you can begin your pattern
- with C<C0> (or anything else) to force Perl not to UTF8 encode your
- string, and then follow this with a C<U*> somewhere in your pattern.
-
- All done. Now let's create the patch. F<Porting/patching.pod> tells us
- that if we're making major changes, we should copy the entire directory
- to somewhere safe before we begin fiddling, and then do
-
- diff -ruN old new > patch
-
- However, we know which files we've changed, and we can simply do this:
-
- diff -u pp.c~ pp.c > patch
- diff -u t/op/pack.t~ t/op/pack.t >> patch
- diff -u pod/perlfunc.pod~ pod/perlfunc.pod >> patch
-
- We end up with a patch looking a little like this:
-
- --- pp.c~ Fri Jun 02 04:34:10 2000
- +++ pp.c Fri Jun 16 11:37:25 2000
- @@ -4375,6 +4375,7 @@
- register I32 items;
- STRLEN fromlen;
- register char *pat = SvPVx(*++MARK, fromlen);
- + char *patcopy;
- register char *patend = pat + fromlen;
- register I32 len;
- I32 datumtype;
- @@ -4405,6 +4406,7 @@
- ...
-
- And finally, we submit it, with our rationale, to perl5-porters. Job
- done!
-
- =head2 Patching a core module
-
- This works just like patching anything else, with an extra
- consideration. Many core modules also live on CPAN. If this is so,
- patch the CPAN version instead of the core and send the patch off to
- the module maintainer (with a copy to p5p). This will help the module
- maintainer keep the CPAN version in sync with the core version without
- constantly scanning p5p.
-
- =head2 Adding a new function to the core
-
- If, as part of a patch to fix a bug, or just because you have an
- especially good idea, you decide to add a new function to the core,
- discuss your ideas on p5p well before you start work. It may be that
- someone else has already attempted to do what you are considering and
- can give lots of good advice or even provide you with bits of code
- that they already started (but never finished).
-
- You have to follow all of the advice given above for patching. It is
- extremely important to test any addition thoroughly and add new tests
- to explore all boundary conditions that your new function is expected
- to handle. If your new function is used only by one module (e.g. toke),
- then it should probably be named S_your_function (for static); on the
- other hand, if you expect it to be accessible from other functions in
- Perl, you should name it Perl_your_function. See L<perlguts/Internal Functions>
- for more details.
-
- The location of any new code is also an important consideration. Don't
- just create a new top level .c file and put your code there; you would
- have to make changes to Configure (so the Makefile is created properly),
- as well as possibly lots of include files. This is strictly pumpking
- business.
-
- It is better to add your function to one of the existing top level
- source code files, but your choice is complicated by the nature of
- the Perl distribution. Only the files that are marked as compiled
- static are located in the perl executable. Everything else is located
- in the shared library (or DLL if you are running under WIN32). So,
- for example, if a function is only used by functions located in
- toke.c, then your code can go in toke.c. If, however, you want to call
- the function from universal.c, then you should put your code in another
- location, for example util.c.
-
- In addition to writing your C code, you will need to create an
- appropriate entry in embed.pl describing your function, then run
- 'make regen_headers' to create the entries in the numerous header
- files that perl needs to compile correctly. See L<perlguts/Internal Functions>
- for information on the various options that you can set in embed.pl.
- You will forget to do this a few (or many) times and you will get
- warnings during the compilation phase. Make sure that you mention
- the embed.pl change when you post your patch to P5P; the pumpking
- needs to know that the headers must be regenerated.
-
- When you write your new code, please be conscious of existing code
- conventions used in the perl source files. See L<perlstyle> for
- details. Although most of the guidelines discussed seem to focus on
- Perl code, rather than C, they all apply (except when they don't ;).
- See also the F<Porting/patching.pod> file in the Perl source distribution
- for lots of details about both formatting and submitting patches of
- your changes.
-
- Lastly, TEST TEST TEST TEST TEST any code before posting to p5p.
- Test on as many platforms as you can find. Test as many perl
- Configure options as you can (e.g. MULTIPLICITY). If you have
- profiling or memory tools, see L<EXTERNAL TOOLS FOR DEBUGGING PERL>
- below for how to use them to further test your code. Remember that
- most of the people on P5P are doing this on their own time and
- don't have the time to debug your code.
-
- =head2 Writing a test
-
- Every module and built-in function has an associated test file (or
- should...). If you add or change functionality, you have to write a
- test. If you fix a bug, you have to write a test so that bug never
- comes back. If you alter the docs, it would be nice to test what the
- new documentation says.
-
- In short, if you submit a patch you probably also have to patch the
- tests.
-
- For modules, the test file is right next to the module itself.
- F<lib/strict.t> tests F<lib/strict.pm>. This is a recent innovation,
- so there are some snags (and it would be wonderful for you to brush
- them out), but it basically works that way. Everything else lives in
- F<t/>.
-
- =over 3
-
- =item F<t/base/>
-
- Testing of the absolute basic functionality of Perl. Things like
- C<if>, basic file reads and writes, simple regexes, etc. These are
- run first in the test suite and if any of them fail, something is
- I<really> broken.
-
- =item F<t/cmd/>
-
- These test the basic control structures, C<if/else>, C<while>,
- subroutines, etc.
-
- =item F<t/comp/>
-
- Tests basic issues of how Perl parses and compiles itself.
-
- =item F<t/io/>
-
- Tests for built-in IO functions, including command line arguments.
-
- =item F<t/lib/>
-
- The old home for the module tests; you shouldn't put anything new in
- here. There are still some bits and pieces hanging around in here
- that need to be moved. Perhaps you could move them? Thanks!
-
- =item F<t/op/>
-
- Tests for perl's built-in functions that don't fit into any of the
- other directories.
-
- =item F<t/pod/>
-
- Tests for POD directives. There are still some tests for the Pod
- modules hanging around in here that need to be moved out into F<lib/>.
-
- =item F<t/run/>
-
- Testing features of how perl actually runs, including exit codes and
- handling of PERL* environment variables.
-
- =back
-
- The core uses the same testing style as the rest of Perl, a simple
- "ok/not ok" run through Test::Harness, but there are a few special
- considerations.
-
- There are three ways to write a test in the core: Test::More,
- t/test.pl, and ad hoc C<print $test ? "ok 42\n" : "not ok 42\n">. The
- decision of which to use depends on what part of the test suite you're
- working on. This is a measure to prevent a high-level failure (such
- as Config.pm breaking) from causing basic functionality tests to fail.
-
- =over 4
-
- =item t/base t/comp
-
- Since we don't know if require works, or even subroutines, use ad hoc
- tests for these two. Step carefully to avoid using the feature being
- tested.
-
- =item t/cmd t/run t/io t/op
-
- Now that basic require() and subroutines are tested, you can use the
- t/test.pl library which emulates the important features of Test::More
- while using a minimum of core features.
-
- You can also conditionally use certain libraries like Config, but be
- sure to skip the test gracefully if it's not there.
-
- =item t/lib ext lib
-
- Now that the core of Perl is tested, Test::More can be used. You can
- also use the full suite of core modules in the tests.
-
- =back
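-
- The directory rules above can be summarised as a simple lookup. The
- sketch below is illustrative only: C<test_style_for> is a hypothetical
- helper name, and the mapping merely restates the list above.
-
```shell
# Sketch only: test_style_for is a hypothetical helper that restates the
# directory rules above; it is not part of the Perl distribution.
test_style_for() {
    case $1 in
        t/base/*|t/comp/*)
            echo "ad hoc" ;;            # require and subs are not yet trusted
        t/cmd/*|t/run/*|t/io/*|t/op/*)
            echo "t/test.pl" ;;         # minimal Test::More emulation
        t/lib/*|ext/*|lib/*)
            echo "Test::More" ;;        # the core is tested by this point
        *)
            echo "unknown" ;;
    esac
}
```
-
- For instance, C<test_style_for t/op/pack.t> prints C<t/test.pl>.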
-
- When you say "make test" Perl uses the F<t/TEST> program to run the
- test suite. All tests are run from the F<t/> directory, B<not> the
- directory which contains the test. This causes some problems with the
- tests in F<lib/>, so here's some opportunity for some patching.
-
- You must be triply conscious of cross-platform concerns. This usually
- boils down to using File::Spec and avoiding things like C<fork()> and
- C<system()> unless absolutely necessary.
-
- =head2 Special Make Test Targets
-
- There are various special make targets that can be used to test Perl
- slightly differently than the standard "test" target. Not all of them
- are expected to give a 100% success rate. Many of them have several
- aliases.
-
- =over 4
-
- =item coretest
-
- Run F<perl> on all core tests (F<t/*> and F<lib/[a-z]*> pragma tests).
-
- =item test.deparse
-
- Run all the tests through B::Deparse. Not all tests will succeed.
-
- =item minitest
-
- Run F<miniperl> on F<t/base>, F<t/comp>, F<t/cmd>, F<t/run>, F<t/io>,
- F<t/op>, and F<t/uni> tests.
-
- =item test.third check.third utest.third ucheck.third
-
- (Only in Tru64) Run all the tests using the memory leak + naughty
- memory access tool "Third Degree". The log files will be named
- F<perl.3log.testname>.
-
- =item test.torture torturetest
-
- Run all the usual tests and some extra tests. As of Perl 5.8.0 the
- only extra tests are Abigail's JAPHs, t/japh/abigail.t.
-
- You can also run the torture test with F<t/harness> by giving it the
- C<-torture> argument.
-
- =item utest ucheck test.utf8 check.utf8
-
- Run all the tests with -Mutf8. Not all tests will succeed.
-
- =back
-
- =head1 EXTERNAL TOOLS FOR DEBUGGING PERL
-
- Sometimes it helps to use external tools while debugging and
- testing Perl. This section tries to guide you through using
- some common testing and debugging tools with Perl. This is
- meant as a guide to interfacing these tools with Perl, not
- as any kind of guide to the use of the tools themselves.
-
- =head2 Rational Software's Purify
-
- Purify is a commercial tool that is helpful in identifying
- memory overruns, wild pointers, memory leaks and other such
- badness. Perl must be compiled in a specific way for
- optimal testing with Purify. Purify is available under
- Windows NT, Solaris, HP-UX, SGI, and Siemens Unix.
-
- The only currently known leaks happen when there are
- compile-time errors within eval or require. (Fixing these
- is non-trivial, unfortunately, but they must be fixed
- eventually.)
-
- =head2 Purify on Unix
-
- On Unix, Purify creates a new Perl binary. To get the most
- benefit out of Purify, you should create the perl to Purify
- using:
-
- sh Configure -Accflags=-DPURIFY -Doptimize='-g' \
- -Uusemymalloc -Dusemultiplicity
-
- where these arguments mean:
-
- =over 4
-
- =item -Accflags=-DPURIFY
-
- Disables Perl's arena memory allocation functions, as well as
- forcing use of memory allocation functions derived from the
- system malloc.
-
- =item -Doptimize='-g'
-
- Adds debugging information so that you see the exact source
- statements where the problem occurs. Without this flag, all
- you will see is the source filename of where the error occurred.
-
- =item -Uusemymalloc
-
- Disable Perl's malloc so that Purify can more closely monitor
- allocations and leaks. Using Perl's malloc will make Purify
- report most leaks in the "potential" leaks category.
-
- =item -Dusemultiplicity
-
- Enabling the multiplicity option allows perl to clean up
- thoroughly when the interpreter shuts down, which reduces the
- number of bogus leak reports from Purify.
-
- =back
-
- Once you've compiled a perl suitable for Purify'ing, then you
- can just:
-
- make pureperl
-
- which creates a binary named 'pureperl' that has been Purify'ed.
- This binary is used in place of the standard 'perl' binary
- when you want to debug Perl memory problems.
-
- To minimize the number of memory leak false alarms
- (see L</PERL_DESTRUCT_LEVEL>), set the environment variable
- PERL_DESTRUCT_LEVEL to 2. In csh-type shells:
-
- setenv PERL_DESTRUCT_LEVEL 2
-
- In Bourne-type shells:
-
- PERL_DESTRUCT_LEVEL=2
- export PERL_DESTRUCT_LEVEL
-
- As an example, to show any memory leaks produced during the
- standard Perl testset you would create and run the Purify'ed
- perl as:
-
- make pureperl
- cd t
- ../pureperl -I../lib harness
-
- which would run Perl on the test harness and report any memory problems.
-
- Purify outputs messages in "Viewer" windows by default. If
- you don't have a windowing environment or if you simply
- want the Purify output to unobtrusively go to a log file
- instead of to the interactive window, use the following
- options to output to the log file "perl.log":
-
- setenv PURIFYOPTIONS "-chain-length=25 -windows=no \
- -log-file=perl.log -append-logfile=yes"
-
- If you plan to use the "Viewer" windows, then you only need this option:
-
- setenv PURIFYOPTIONS "-chain-length=25"
-
- In Bourne-type shells:
-
- PURIFYOPTIONS="..."
- export PURIFYOPTIONS
-
- or if you have the "env" utility:
-
- env PURIFYOPTIONS="..." ../pureperl ...
-
- =head2 Purify on NT
-
- Purify on Windows NT instruments the Perl binary 'perl.exe'
- on the fly. There are several options in the makefile you
- should change to get the most use out of Purify:
-
- =over 4
-
- =item DEFINES
-
- You should add -DPURIFY to the DEFINES line so the DEFINES
- line looks something like:
-
- DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1
-
- to disable Perl's arena memory allocation functions, as
- well as to force use of memory allocation functions derived
- from the system malloc.
-
- =item USE_MULTI = define
-
- Enabling the multiplicity option allows perl to clean up
- thoroughly when the interpreter shuts down, which reduces the
- number of bogus leak reports from Purify.
-
- =item #PERL_MALLOC = define
-
- Disable Perl's malloc so that Purify can more closely monitor
- allocations and leaks. Using Perl's malloc will make Purify
- report most leaks in the "potential" leaks category.
-
- =item CFG = Debug
-
- Adds debugging information so that you see the exact source
- statements where the problem occurs. Without this flag, all
- you will see is the source filename of where the error occurred.
-
- =back
-
- As an example, to show any memory leaks produced during the
- standard Perl testset you would create and run Purify as:
-
- cd win32
- make
- cd ../t
- purify ../perl -I../lib harness
-
- which would instrument Perl in memory, run Perl on the test harness,
- then finally report any memory problems.
-
- B<NOTE>: as of Perl 5.8.0, F<ext/Encode/t/Unicode.t> takes
- extraordinarily long (hours?) to complete under Purify. It has been
- theorized that it would eventually finish, but nobody has so far been
- patient enough :-) (This same extreme slowdown has also been seen with
- the Third Degree tool, so the said test must be doing something that
- is quite unfriendly to memory debuggers.) It is suggested that you
- simply kill that testing process.
-
- =head2 Compaq's/Digital's/HP's Third Degree
-
- Third Degree is a tool for memory leak detection and memory access checks.
- It is one of the many tools in the ATOM toolkit. The toolkit is only
- available on Tru64 (formerly known as Digital UNIX, formerly known as
- DEC OSF/1).
-
- When building Perl, you must first run Configure with the -Doptimize=-g
- and -Uusemymalloc flags; after that you can use the make targets
- "perl.third" and "test.third". (What is required is that Perl must be
- compiled using the C<-g> flag; you may need to re-Configure.)
-
- The short story is that with "atom" you can instrument the Perl
- executable to create a new executable called F<perl.third>. When the
- instrumented executable is run, it creates a log of dubious memory
- traffic in a file called F<perl.3log>. See the manual pages of atom and
- third for more information. The most extensive Third Degree
- documentation is available in the Compaq "Tru64 UNIX Programmer's
- Guide", chapter "Debugging Programs with Third Degree".
-
- The "test.third" target leaves a lot of files named F<foo_bar.3log> in the t/
- subdirectory. There is a problem with these files: Third Degree is so
- effective that it also finds problems in the system libraries.
- Therefore you should use the Porting/thirdclean script to clean up
- the F<*.3log> files.
-
- There are also leaks that, for a given definition of a leak,
- aren't. See L</PERL_DESTRUCT_LEVEL> for more information.
-
- =head2 PERL_DESTRUCT_LEVEL
-
- If you want to run any of the tests yourself manually using the
- pureperl or perl.third executables, please note that by default
- perl B<does not> explicitly clean up all the memory it has allocated
- (such as global memory arenas) but instead lets the exit() of
- the whole program "take care" of such allocations, also known
- as "global destruction of objects".
-
- There is a way to tell perl to do complete cleanup: set the
- environment variable PERL_DESTRUCT_LEVEL to a non-zero value.
- The t/TEST wrapper sets this to 2, and this is what you
- need to do too if you don't want to see the "global leaks".
- For example, for "third-degreed" Perl:
-
- env PERL_DESTRUCT_LEVEL=2 ./perl.third -Ilib t/foo/bar.t
-
- (Note: the mod_perl apache module also uses this environment variable
- for its own purposes and has extended its semantics. Refer to the mod_perl
- documentation for more information.)
-
- =head2 Profiling
-
- Depending on your platform there are various ways of profiling Perl.
-
- There are two commonly used techniques of profiling executables:
- I<statistical time-sampling> and I<basic-block counting>.
-
- The first method periodically takes samples of the CPU program
- counter, and since the program counter can be correlated with the code
- generated for functions, we get a statistical view of which
- functions the program is spending its time in. The caveats are that very
- small/fast functions have a lower probability of showing up in the
- profile, and that periodically interrupting the program (this is
- usually done rather frequently, on the scale of milliseconds) imposes
- an additional overhead that may skew the results. The first problem
- can be alleviated by running the code for longer (in general this is a
- good idea for profiling); the second problem is usually kept in check
- by the profiling tools themselves.
-
- The second method divides up the generated code into I<basic blocks>.
- Basic blocks are sections of code that are entered only in the
- beginning and exited only at the end. For example, a conditional jump
- starts a basic block. Basic block profiling usually works by
- I<instrumenting> the code by adding I<enter basic block #nnnn>
- book-keeping code to the generated code. During the execution of the
- code the basic block counters are then updated appropriately. The
- caveat is that the added extra code can skew the results: again, the
- profiling tools usually try to factor their own effects out of the
- results.
-
- =head2 Gprof Profiling
-
- gprof is a profiling tool available on many UNIX platforms;
- it uses I<statistical time-sampling>.
-
- You can build a profiled version of perl called "perl.gprof" by
- invoking the make target "perl.gprof" (what is required is that Perl
- must be compiled using the C<-pg> flag; you may need to re-Configure).
- Running the profiled version of Perl will create an output file called
- F<gmon.out>, which contains the profiling data collected
- during the execution.
-
- The gprof tool can then display the collected data in various ways.
- Usually gprof understands the following options:
-
- =over 4
-
- =item -a
-
- Suppress statically defined functions from the profile.
-
- =item -b
-
- Suppress the verbose descriptions in the profile.
-
- =item -e routine
-
- Exclude the given routine and its descendants from the profile.
-
- =item -f routine
-
- Display only the given routine and its descendants in the profile.
-
- =item -s
-
- Generate a summary file called F<gmon.sum> which then may be given
- to subsequent gprof runs to accumulate data over several runs.
-
- =item -z
-
- Display routines that have zero usage.
-
- =back
-
- For a more detailed explanation of the available commands and output
- formats, see your own local documentation of gprof.
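-
- Putting the pieces together, a profiling session might look like the
- sketch below. It assumes that F<perl.gprof> has already been built in
- the current directory and that a gprof accepting the usual
- C<gprof [options] executable gmon.out> invocation is on the PATH;
- C<profile_script>, C<PROFILE_REPORT> and the default report file name
- are hypothetical.
-
```shell
# Sketch only: profile_script, PROFILE_REPORT and the default report name
# are hypothetical. Assumes ./perl.gprof exists and gprof is on the PATH.
profile_script() {
    report=${PROFILE_REPORT:-gprof.report}
    # Running the profiled perl writes gmon.out into the current directory.
    ./perl.gprof -Ilib "$@" || return
    # -b suppresses the verbose descriptions, as described above.
    gprof -b ./perl.gprof gmon.out > "$report"
}
```
-
- Called as C<profile_script t/op/pack.t> from the top of the build
- directory, this would leave the brief profile in F<gprof.report>.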
-
- =head2 GCC gcov Profiling
-
- Starting from GCC 3.0 I<basic block profiling> is officially available
- for the GNU CC.
-
- You can build a profiled version of perl called F<perl.gcov> by
- invoking the make target "perl.gcov" (what is required is that Perl must
- be compiled using gcc with the flags C<-fprofile-arcs
- -ftest-coverage>; you may need to re-Configure).
-
- Running the profiled version of Perl will cause profile output to be
- generated. For each source file an accompanying ".da" file will be
- created.
-
- To display the results you use the "gcov" utility (which should
- be installed if you have gcc 3.0 or newer installed). F<gcov> is
- run on source code files, like this:
-
- gcov sv.c
-
- which will cause F<sv.c.gcov> to be created. The F<.gcov> files
- contain the source code annotated with relative frequencies of
- execution indicated by "#" markers.
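-
- One quick use of the annotated file is to list the lines that were
- never executed. The sketch below assumes that gcov flags such lines
- with a run of at least five C<#> characters in the count column, which
- is an assumption about the output format rather than something stated
- here; C<unexecuted_lines> is a hypothetical helper.
-
```shell
# Sketch only: unexecuted_lines is a hypothetical helper. It assumes gcov
# marks never-executed lines with a run of five or more "#" characters at
# the start of the line (after optional leading blanks).
# grep exits non-zero when no such line is found.
unexecuted_lines() {
    grep -n '^[[:space:]]*#####' "$1"
}
```
-
- After C<gcov sv.c>, running C<unexecuted_lines sv.c.gcov> would print
- each uncovered line prefixed with its position in the F<.gcov> file.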
-
- Useful options of F<gcov> include C<-b> which will summarise the
- basic block, branch, and function call coverage, and C<-c> which
- instead of relative frequencies will use the actual counts. For
- more information on the use of F<gcov> and basic block profiling
- with gcc, see the latest GNU CC manual; as of GCC 3.0, see
-
- http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc.html
-
- and its section titled "8. gcov: a Test Coverage Program"
-
- http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html#SEC132
-
- =head2 Pixie Profiling
-
- Pixie is a profiling tool available on IRIX and Tru64 (aka Digital
- UNIX aka DEC OSF/1) platforms. Pixie does its profiling using
- I<basic-block counting>.
-
- You can build a profiled version of perl called F<perl.pixie> by
- invoking the make target "perl.pixie" (what is required is that Perl
- must be compiled using the C<-g> flag; you may need to re-Configure).
-
- In Tru64 a file called F<perl.Addrs> will also be silently created;
- this file contains the addresses of the basic blocks. Running the
- profiled version of Perl will create a new file called F<perl.Counts>,
- which contains the counts for the basic blocks for that particular
- program execution.
-
- To display the results you use the F<prof> utility. The exact
- incantation depends on your operating system, "prof perl.Counts" in
- IRIX, and "prof -pixie -all -L. perl" in Tru64.
-
- In IRIX the following prof options are available:
-
- =over 4
-
- =item -h
-
- Reports the most heavily used lines in descending order of use.
- Useful for finding the hotspot lines.
-
- =item -l
-
- Groups lines by procedure, with procedures sorted in descending order of use.
- Within a procedure, lines are listed in source order.
- Useful for finding the hotspots of procedures.
-
- =back
-
- In Tru64 the following options are available:
-
- =over 4
-
- =item -p[rocedures]
-
- Procedures sorted in descending order by the number of cycles executed
- in each procedure. Useful for finding the hotspot procedures.
- (This is the default option.)
-
- =item -h[eavy]
-
- Lines sorted in descending order by the number of cycles executed in
- each line. Useful for finding the hotspot lines.
-
- =item -i[nvocations]
-
- The called procedures are sorted in descending order by number of calls
- made to the procedures. Useful for finding the most used procedures.
-
- =item -l[ines]
-
- Grouped by procedure, sorted by cycles executed per procedure.
- Useful for finding the hotspots of procedures.
-
- =item -testcoverage
-
- The compiler emitted code for these lines, but the code was unexecuted.
-
- =item -z[ero]
-
- Unexecuted procedures.
-
- =back
-
- For further information, see your system's manual pages for pixie and prof.
-
- =head2 Miscellaneous tricks
-
- =over 4
-
- =item *
-
- Those debugging perl with the DDD frontend over gdb may find the
- following useful:
-
- You can extend the data conversion shortcuts menu, so for example you
- can display an SV's IV value with one click, without doing any typing.
- To do that, simply edit the F<~/.ddd/init> file and add after:
-
- ! Display shortcuts.
- Ddd*gdbDisplayShortcuts: \
- /t () // Convert to Bin\n\
- /d () // Convert to Dec\n\
- /x () // Convert to Hex\n\
- /o () // Convert to Oct\n\
-
- the following two lines:
-
- ((XPV*) (())->sv_any )->xpv_pv // 2pvx\n\
- ((XPVIV*) (())->sv_any )->xiv_iv // 2ivx
-
- so now you can do ivx and pvx lookups, or you can plug in the
- sv_peek "conversion":
-
- Perl_sv_peek(my_perl, (SV*)()) // sv_peek
-
- (The my_perl is for threaded builds.)
- Just remember that every line, but the last one, should end with \n\
-
- Alternatively edit the init file interactively via:
- 3rd mouse button -> New Display -> Edit Menu
-
- Note: you can define up to 20 conversion shortcuts in the gdb
- section.
-
- =item *
-
- If you see in a debugger a memory area mysteriously full of 0xabababab,
- you may be seeing the effect of the Poison() macro; see L<perlclib>.
-
- =back
-
- =head2 CONCLUSION
-
- We've had a brief look around the Perl source, an overview of the stages
- F<perl> goes through when it's running your code, and how to use a
- debugger to poke at the Perl guts. We took a very simple problem and
- demonstrated how to solve it fully - with documentation, regression
- tests, and finally a patch for submission to p5p. Finally, we talked
- about how to use external tools to debug and test Perl.
-
- I'd now suggest you read over those references again, and then, as soon
- as possible, get your hands dirty. The best way to learn is by doing,
- so:
-
- =over 3
-
- =item *
-
- Subscribe to perl5-porters, follow the patches and try and understand
- them; don't be afraid to ask if there's a portion you're not clear on -
- who knows, you may unearth a bug in the patch...
-
- =item *
-
- Keep up to date with the bleeding edge Perl distributions and get
- familiar with the changes. Try and get an idea of what areas people are
- working on and the changes they're making.
-
- =item *
-
- Do read the README associated with your operating system, e.g. README.aix
- on the IBM AIX OS. Don't hesitate to supply patches to that README if
- you find anything missing or changed over a new OS release.
-
- =item *
-
- Find an area of Perl that seems interesting to you, and see if you can
- work out how it works. Scan through the source, and step over it in the
- debugger. Play, poke, investigate, fiddle! You'll probably get to
- understand not just your chosen area but a much wider range of F<perl>'s
- activity as well, and probably sooner than you'd think.
-
- =back
-
- =over 3
-
- =item I<The Road goes ever on and on, down from the door where it began.>
-
- =back
-
- If you can do these things, you've started on the long road to Perl porting.
- Thanks for wanting to help make Perl better - and happy hacking!
-
- =head1 AUTHOR
-
- This document was written by Nathan Torkington, and is maintained by
- the perl5-porters mailing list.
-
-