- Path: bloom-beacon.mit.edu!senator-bedfellow.mit.edu!faqserv
- From: spp@vx.com
- Newsgroups: comp.lang.perl,comp.answers,news.answers
- Subject: comp.lang.perl FAQ 4/5 - General Programming
- Supersedes: <perl-faq/part4_784894001@rtfm.mit.edu>
- Followup-To: poster
- Date: 30 Nov 1994 09:40:53 GMT
- Organization: none
- Lines: 1065
- Approved: news-answers-request@MIT.EDU
- Distribution: world
- Message-ID: <perl-faq/part4_786188328@rtfm.mit.edu>
- References: <perl-faq/part0_786188328@rtfm.mit.edu>
- NNTP-Posting-Host: bloom-picayune.mit.edu
- X-Last-Updated: 1994/11/14
- Originator: faqserv@bloom-picayune.MIT.EDU
- Xref: bloom-beacon.mit.edu comp.lang.perl:27006 comp.answers:8613 news.answers:30227
-
- Archive-name: perl-faq/part4
- Version: $Id: part4,v 2.3 1994/11/07 18:06:47 spp Exp spp $
- Posting-Frequency: bi-weekly
-
- This posting contains answers to the following questions about General
- Programming, Regular Expressions (Regexp) and Input/Output:
-
-
- 4.1) What are all these $@%*<> signs and how do I know when to use them?
-
- Those are type specifiers:
- $ for scalar values
- @ for indexed arrays
- % for hashed arrays (associative arrays)
- * for all types of that symbol name. These are sometimes used like
- pointers
- <> are used for inputting a record from a filehandle.
-
- See the question on arrays of arrays for more about Perl pointers.
-
- While there are a few places where you don't actually need these type
- specifiers, except for files, you should always use them. Note that
- <FILE> is NOT the type specifier for files; it's the equivalent of awk's
- getline function, that is, it reads a line from the handle FILE. When
- doing open, close, and other operations besides the getline function on
- files, do NOT use the brackets.
-
- Beware of saying:
- $foo = BAR;
- which will be interpreted as
- $foo = 'BAR';
- and not as
- $foo = <BAR>;
- If you always quote your strings, you'll avoid this trap.
-
- Normally, files are manipulated something like this (with appropriate
- error checking added if it were production code):
-
- open (FILE, ">/tmp/foo.$$");
- print FILE "string\n";
- close FILE;
-
- If instead of a filehandle, you use a normal scalar variable with file
- manipulation functions, this is considered an indirect reference to a
- filehandle. For example,
-
- $foo = "TEST01";
- open($foo, "file");
-
- After the open, these two while loops are equivalent:
-
- while (<$foo>) {}
- while (<TEST01>) {}
-
- as are these two statements:
-
- close $foo;
- close TEST01;
-
- but NOT to this:
-
- while (<$TEST01>) {} # error: note the spurious dollar sign
-
- This is another common novice mistake; often it's assumed that
-
- open($foo, "output.$$");
-
- will fill in the value of $foo, which was previously undefined. This
- just isn't so -- you must set $foo to be the name of a filehandle
- before you attempt to open it.
-
-
- 4.2) How come Perl operators have different precedence than C operators?
-
- Actually, they don't; all C operators have the same precedence in Perl as
- they do in C. The problem is with a class of functions called list
- operators, e.g. print, chdir, exec, system, and so on. These are somewhat
- bizarre in that they have different precedence depending on whether you
- look on the left or right of them. Basically, they gobble up all things
- on their right. For example,
-
- unlink $foo, "bar", @names, "others";
-
- will unlink all those file names. A common mistake is to write:
-
- unlink "a_file" || die "snafu";
-
- The problem is that this gets interpreted as
-
- unlink("a_file" || die "snafu");
-
- To avoid this problem, you can always make them look like function calls
- or use an extra level of parentheses:
-
- unlink("a_file") || die "snafu";
- (unlink "a_file") || die "snafu";
-
- In perl5, there are low precedence "and", "or", and "not" operators,
- which bind less tightly than comma. This allows you to write:
-
- unlink $foo, "bar", @names, "others" or die "snafu";
-
- Sometimes you actually do care about the return value:
-
- unless ($io_ok = print("some", "list")) { }
-
- Yes, print() returns I/O success. That means
-
- $io_ok = print(2+4) * 5;
-
- returns 5 times whether printing (2+4) succeeded, and
- print(2+4) * 5;
- returns the same 5*io_success value and tosses it.
-
- See the perlop(1) man page's section on Precedence for more gory details,
- and be sure to use the -w flag to catch things like this.
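The parsing difference described in this answer can be observed with a small stand-in for a list operator. This is only a sketch: try_unlink and @log are hypothetical names invented for the illustration, and try_unlink always reports failure so that you can see which error branches actually run.

```perl
use strict;
use warnings;

my @log;

# Hypothetical stand-in for a list operator such as unlink: it takes a
# list of names and always reports failure (returns 0).
sub try_unlink { my @names = @_; return 0 }

# Parsed as try_unlink("a_file" || push(...)): "a_file" is true, so the
# right-hand side of || is never evaluated and the failure goes unnoticed.
try_unlink "a_file" || push @log, "bare ||";

# Parentheses make || test the return value of the call.
try_unlink("a_file") || push @log, "parenthesized ||";

# The low-precedence "or" (perl5) also tests the return value.
try_unlink "a_file" or push @log, "or";

print "reported failures: @log\n";
```

Only the last two forms record the failure; the first one silently swallows it.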
-
-
- 4.3) What's the difference between dynamic and static (lexical) scoping?
- What are my() and local()?
-
- [NOTE: This question refers to perl5 only. There is no my() in perl4]
- Scoping refers to visibility of variables. A dynamic variable is
- created via local() and is just a local value for a global variable,
- whereas a lexical variable created via my() is more what you're
- expecting from a C auto. (See also "What's the difference between
- deep and shallow binding.") In general, we suggest you use lexical
- variables wherever possible, as they're faster to access and easier to
- understand. The "use strict vars" pragma will enforce that all
- variables are either lexical, or fully qualified by package name. We
- strongly suggest that you develop your code with "use strict;" and the
- -w flag. (When using formats, however, you will still have to use
- dynamic variables.) Here's an example of the difference:
-
- $scount = 1; $lcount = 2;
- sub foo {
- my($i,$j) = @_;
- my $scount = 10;
- local $lcount = 20;
- &bar();
- }
- sub bar {
- print "scount is $scount\n";
- print "lcount is $lcount\n";
- }
-
- This prints:
-
- scount is 1
- lcount is 20
-
- Notice that the variables declared with my() are visible only within
- the scope of the block which names them. They are not visible outside
- of this block, not even in routines or blocks that it calls. local()
- variables, on the other hand, are visible to routines that are called
- from the block where they are declared. Neither is visible after the
- end (the final closing curly brace) of the block at all.
-
- Oh, lexical variables are only available in perl5. Have we
- mentioned yet that you might consider upgrading? :-)
-
-
- 4.4) What's the difference between deep and shallow binding?
-
- This only matters when you're making subroutines yourself, at least
- so far. This will give you shallow binding:
-
- {
- my $x = time;
- $coderef = sub { $x };
- }
-
- When you call &$coderef(), it will get whatever dynamic $x happens
- to be around when invoked. However, you can get the other behaviour
- this way:
-
- {
- my $x = time;
- $coderef = eval "sub { \$x }";
- }
-
- Now you'll access the lexical variable $x which is set to the
- time the subroutine was created. Note that the difference in these
- two behaviours can be considered a bug, not a feature, so you should
- in particular not rely upon shallow binding, as it will likely go
- away in the future. See perlref(1).
-
-
- 4.5) How can I manipulate fixed-record-length files?
-
- The most efficient way is using pack and unpack. This is faster than
- using substr. Here is a sample chunk of code to break up and put back
- together again some fixed-format input lines, in this case, from ps.
-
- # sample input line:
- # 15158 p5 T 0:00 perl /mnt/tchrist/scripts/now-what
- $ps_t = 'A6 A4 A7 A5 A*';
- open(PS, "ps|");
- $_ = <PS>; print;
- while (<PS>) {
- ($pid, $tt, $stat, $time, $command) = unpack($ps_t, $_);
- for $var ('pid', 'tt', 'stat', 'time', 'command' ) {
- print "$var: <", eval "\$$var", ">\n";
- }
- print 'line=', pack($ps_t, $pid, $tt, $stat, $time, $command), "\n";
- }
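If you have no ps output handy, the same pack/unpack round trip can be tried on made-up data. The sketch below assumes a hypothetical record layout (10-byte name, 3-byte age, 8-byte city) and reads the records with read() from an in-memory filehandle, which needs perl 5.8 or later; with a real file you would open the file name instead.

```perl
use strict;
use warnings;

# Hypothetical record layout: 10-byte name, 3-byte age, 8-byte city,
# each space-padded ("A" format).
my $layout = 'A10 A3 A8';
my $reclen = length pack($layout, '', '', '');   # 21 bytes per record

# Build two records back to back, as they would sit in a fixed-length file.
my $data = '';
$data .= pack($layout, @$_) for ['Larry', 42, 'Seattle'], ['Tom', 30, 'Boulder'];

# Read the data one record at a time from an in-memory filehandle.
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";
my @people;
while (read($fh, my $buf, $reclen)) {
    my ($name, $age, $city) = unpack($layout, $buf);   # padding is stripped
    push @people, [$name, $age, $city];
}
close $fh;

print join(',', @$_), "\n" for @people;
```

Computing the record length from the template keeps the read size and the layout from drifting apart.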
-
-
- 4.6) How can I make a file handle local to a subroutine?
-
- You must use the type-globbing *VAR notation. Here is some code to
- cat an include file, calling itself recursively on nested local
- include files (i.e. those with #include "file", not #include <file>):
-
- sub cat_include {
- local($name) = @_;
- local(*FILE);
- local($_);
-
- warn "<INCLUDING $name>\n";
- if (!open (FILE, $name)) {
- warn "can't open $name: $!\n";
- return;
- }
- while (<FILE>) {
- if (/^#\s*include "([^"]*)"/) {
- &cat_include($1);
- } else {
- print;
- }
- }
- close FILE;
- }
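Later perl5 releases (5.6 and up) offer another way to get the same effect: open() can put the filehandle into a lexical variable, which is private to each call. The sketch below is an adaptation of the routine above under that assumption, not a replacement for it. The %source hash of in-memory "files" and the $out array reference are hypothetical, added so the example runs without anything on disk (in-memory filehandles need perl 5.8 or later).

```perl
use strict;
use warnings;

# Hypothetical in-memory "files", so the sketch needs nothing on disk.
my %source = (
    'main.c' => qq{int x;\n#include "defs.h"\nint y;\n},
    'defs.h' => qq{#define N 1\n},
);

sub cat_include {
    my ($name, $out) = @_;
    unless (exists $source{$name}) {
        warn "can't open $name: no such entry\n";
        return;
    }
    # $fh is a fresh lexical on every call, so recursion cannot clobber it.
    open(my $fh, '<', \$source{$name}) or die "can't open $name: $!";
    while (my $line = <$fh>) {
        if ($line =~ /^#\s*include "([^"]*)"/) {
            cat_include($1, $out);
        } else {
            push @$out, $line;
        }
    }
    close $fh;
}

my @lines;
cat_include('main.c', \@lines);
print @lines;
```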
-
-
- 4.7) How can I call alarm() or usleep() from Perl?
-
- If you want finer granularity than 1 second (as usleep() provides) and
- have itimers and syscall() on your system, you can use the following.
- You could also use select().
-
- The subroutine below takes a floating-point number representing how long
- to delay until you get the SIGALRM, and returns a floating-point number
- representing how much time was left in the old timer, if any. Note that
- the C function uses integers, but this one doesn't mind fractional numbers.
-
- # alarm; send me a SIGALRM in this many seconds (fractions ok)
- # tom christiansen <tchrist@convex.com>
- sub alarm {
- require 'syscall.ph';
- require 'sys/time.ph';
-
- local($ticks) = @_;
- local($in_timer,$out_timer);
- local($isecs, $iusecs, $secs, $usecs);
-
- local($itimer_t) = 'L4'; # should be &itimer'typedef()
-
- $secs = int($ticks);
- $usecs = ($ticks - $secs) * 1e6;
-
- $out_timer = pack($itimer_t,0,0,0,0);
- $in_timer = pack($itimer_t,0,0,$secs,$usecs);
-
- syscall(&SYS_setitimer, &ITIMER_REAL, $in_timer, $out_timer)
- && die "alarm: setitimer syscall failed: $!";
-
- ($isecs, $iusecs, $secs, $usecs) = unpack($itimer_t,$out_timer);
- return $secs + ($usecs/1e6);
- }
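The select() alternative mentioned at the top of this answer is much shorter if all you need is a fractional-second pause rather than a SIGALRM. A minimal sketch follows; nap is a hypothetical name, and the four-argument select() with no filehandles to watch simply waits for the timeout.

```perl
use strict;
use warnings;

# Sleep for a possibly fractional number of seconds using the
# four-argument select() with no filehandles to watch.
sub nap {
    my ($seconds) = @_;
    select(undef, undef, undef, $seconds);
    return;
}

nap(0.25);   # pause for roughly a quarter of a second
```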
-
-
- 4.8) How can I do an atexit() or setjmp()/longjmp() in Perl? (Exception handling)
-
- Perl's exception-handling mechanism is its eval operator. You
- can use eval as setjmp and die as longjmp. Here's an example
- of Larry's for timed-out input, which in C is often implemented
- using setjmp and longjmp:
-
- $SIG{ALRM} = TIMEOUT;
- sub TIMEOUT { die "restart input\n" }
-
- do { eval { &realcode } } while $@ =~ /^restart input/;
-
- sub realcode {
- alarm 15;
- $ans = <STDIN>;
- alarm 0;
- }
-
- Here's an example of Tom's for doing atexit() handling:
-
- sub atexit { push(@_exit_subs, @_) }
-
- sub _cleanup { unlink $tmp }
-
- &atexit('_cleanup');
-
- eval <<'End_Of_Eval'; $here = __LINE__;
- # as much code here as you want
- End_Of_Eval
-
- $oops = $@; # save error message
-
- # now call his stuff
- for (@_exit_subs) { &$_() }
-
- $oops && ($oops =~ s/\(eval\) line (\d+)/$0 .
- " line " . ($1+$here)/e, die $oops);
-
- You can register your own routines via the &atexit function now. You
- might also want to use the &realcode method of Larry's rather than
- embedding all your code in the here-is document. Make sure to leave
- via die rather than exit, or write your own &exit routine and call
- that instead. In general, it's better for nested routines to exit
- via die rather than exit for just this reason.
-
- In Perl5, it is easy to set this up because of the automatic processing
- of per-package END functions.
-
- Eval is also quite useful for testing for system dependent features,
- like symlinks, or using a user-input regexp that might otherwise
- blow up on you.
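As an illustration of the last point, here is one way to guard a match against a pattern that may not compile. It is a sketch only: safe_match is a hypothetical helper name, and it uses the block form of eval so that the die raised by a bad pattern is trapped at run time.

```perl
use strict;
use warnings;

# Returns 1 or 0 for a successful or failed match, and undef if the
# pattern itself is invalid. safe_match is a hypothetical helper name.
sub safe_match {
    my ($text, $pattern) = @_;
    my $result = eval { $text =~ /$pattern/ ? 1 : 0 };
    warn "bad pattern '$pattern': $@" unless defined $result;
    return $result;
}

print "matched\n"          if safe_match("hello world", 'wor+ld');
print "pattern rejected\n" unless defined safe_match("hello world", '(unclosed');
```

The caller can tell "no match" (0) from "bad pattern" (undef), and the program keeps running either way.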
-
-
- 4.9) How do I catch signals in perl?
-
- Perl allows you to trap signals using the %SIG associative array.
- Using the signals you want to trap as the key, you can assign a
- subroutine to that signal. The %SIG array will only contain those
- values which the programmer defines. Therefore, you do not have to
- assign all signals. For example, to exit cleanly from a ^C:
-
- $SIG{'INT'} = 'CLEANUP';
- sub CLEANUP {
- print "\n\nCaught Interrupt (^C), Aborting\n";
- exit(1);
- }
-
- There are two special "routines" for signals called DEFAULT and IGNORE.
- DEFAULT erases the current assignment, restoring the default value of
- the signal. IGNORE causes the signal to be ignored. In general, you
- don't need to remember these as you can emulate their functionality
- with standard programming features. DEFAULT can be emulated by
- deleting the signal from the array and IGNORE can be emulated by any
- undeclared subroutine.
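To watch a handler fire without reaching for the keyboard, a program can send a signal to itself. The sketch below assumes a UNIX-like system that provides SIGUSR1, and uses a perl5 anonymous subroutine as the handler; @caught is a hypothetical variable used only to record what happened.

```perl
use strict;
use warnings;

my @caught;

# Install a handler; perl passes the signal name as the first argument.
$SIG{USR1} = sub { my ($signame) = @_; push @caught, $signame };
kill 'USR1', $$;            # send the signal to ourselves

# IGNORE discards the signal; DEFAULT restores the system's default action.
$SIG{USR1} = 'IGNORE';
kill 'USR1', $$;            # silently dropped
$SIG{USR1} = 'DEFAULT';

print "caught: @caught\n";
```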
-
- 4.10) Why doesn't Perl interpret my octal data octally?
-
- Perl only understands octal and hex numbers as such when they occur
- as literals in your program. If they are read in from somewhere and
- assigned, then no automatic conversion takes place. You must
- explicitly use oct() or hex() if you want this kind of thing to happen.
- Actually, oct() knows to interpret both hex and octal numbers, while
- hex only converts hexadecimal ones. For example:
-
- {
- print "What mode would you like? ";
- $mode = <STDIN>;
- $mode = oct($mode);
- unless ($mode) {
- print "You can't really want mode 0!\n";
- redo;
- }
- chmod $mode, $file;
- }
-
- Without the octal conversion, a requested mode of 755 would turn
- into 01363, yielding bizarre file permissions of --wxrw--wt.
-
- If you want something that handles decimal, octal and hex input,
- you could follow the suggestion in the man page and use:
-
- $val = oct($val) if $val =~ /^0/;
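Wrapped up as a subroutine, that suggestion might look like the sketch below. parse_number is a hypothetical helper name; it trims surrounding whitespace (including the newline left by <STDIN>) before deciding whether to call oct().

```perl
use strict;
use warnings;

# Convert user-supplied text to a number, honouring a leading 0 (octal)
# or 0x (hex) the way a Perl literal would. parse_number is a
# hypothetical helper name.
sub parse_number {
    my ($text) = @_;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    return $text =~ /^0/ ? oct($text) : $text + 0;
}

printf "%s -> %d\n", $_, parse_number($_) for '755', '0755', '0x1ed';
```

Note that input without a leading zero is taken as decimal, so a user who wants mode 755 has to type 0755.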
-
-
- 4.11) How can I compare two date strings?
-
- If the dates are in an easily parsed, predetermined format, then you
- can break them up into their component parts and call &timelocal from
- the distributed perl library. If the date strings are in arbitrary
- formats, however, it's probably easier to use the getdate program from
- the Cnews distribution, since it accepts a wide variety of dates. Note
- that in either case the return values you will really be comparing will
- be the total time in seconds as returned by time().
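For the easily parsed case, a comparison might be sketched as follows. The YYYY-MM-DD input format is an assumption made for the example, date_to_seconds and compare_dates are hypothetical names, and the perl5 Time::Local module stands in for the timelocal library routine.

```perl
use strict;
use warnings;
use Time::Local;

# Convert a date in the (assumed) form YYYY-MM-DD to seconds since the
# epoch. Noon is used so that daylight-saving changes at midnight
# cannot make the time invalid.
sub date_to_seconds {
    my ($date) = @_;
    my ($year, $month, $day) = $date =~ /^(\d{4})-(\d\d)-(\d\d)$/
        or die "unrecognised date: $date\n";
    return timelocal(0, 0, 12, $day, $month - 1, $year);
}

# Returns -1, 0 or 1, like the <=> operator.
sub compare_dates {
    my ($first, $second) = @_;
    return date_to_seconds($first) <=> date_to_seconds($second);
}

print "1994-11-07 is earlier\n" if compare_dates('1994-11-07', '1994-11-30') < 0;
```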
-
- Here's a getdate function for perl that's not very efficient; you can
- do better than this by sending it many dates at once or modifying
- getdate to behave better on a pipe. Beware the hardcoded pathname.
-
- sub getdate {
- local($_) = shift;
-
- s/-(\d{4})$/+$1/ || s/\+(\d{4})$/-$1/;
- # getdate has broken timezone sign reversal!
-
- $_ = `/usr/local/lib/news/newsbin/getdate '$_'`;
- chop;
- $_;
- }
-
- Richard Ohnemus <Rick_Ohnemus@Sterling.COM> actually has a getdate.y for
- use with the Perl yacc. You can get this from ftp.sterling.com
- [192.124.9.1] in /local/perl-byacc1.8.1.tar.Z, or send the author mail
- for details.
-
- You might also consider using these:
-
- date.pl - print dates how you want with the sysv +FORMAT method
- date.shar - routines to manipulate and calculate dates
- ftp-chat2.shar - updated version of ftpget. includes library and demo
- programs
- getdate.shar - returns number of seconds since epoch for any given
- date
- ptime.shar - print dates how you want with the sysv +FORMAT method
-
- You probably want 'getdate.shar'... these and other files can be ftp'd
- from the /pub/perl/scripts directory on ftp.cis.ufl.edu. See the README
- file in the /pub/perl directory for time and the European mirror site
- details.
-
-
- 4.12) How can I find the Julian Day?
-
- Here's an example of a Julian Date function provided by Thomas R.
- Kimpton*.
-
- #!/usr/local/bin/perl
-
- @theJulianDate = ( 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 );
-
- #************************************************************************
- #**** Return 1 if we are after the leap day in a leap year. *****
- #************************************************************************
-
- sub leapDay
- {
- # Arguments are as returned by localtime(): $year counts from 1900
- # and $month runs from 0 (January) to 11 (December).
- my($year,$month,$day) = @_;
- $year += 1900;
-
- if ($year % 4) {
- return(0);
- }
-
- if (!($year % 100)) { # years that are multiples of 100
- # are not leap years
- if ($year % 400) { # unless they are multiples of 400
- return(0);
- }
- }
- if ($month < 2) { # January or February: leap day not yet passed
- return(0);
- } else {
- return(1);
- }
- }
-
- #************************************************************************
- #**** Pass in the date, in seconds, of the day you want the *****
- #**** julian date for. If your localtime() returns the year day *****
- #**** return that, otherwise figure out the julian date. *****
- #************************************************************************
-
- sub julianDate
- {
- my($dateInSeconds) = @_;
- my($sec, $min, $hour, $mday, $mon, $year, $wday, $yday);
-
- ($sec, $min, $hour, $mday, $mon, $year, $wday, $yday) =
- localtime($dateInSeconds);
- if (defined($yday)) {
- return($yday+1);
- } else {
- return($theJulianDate[$mon] + $mday + &leapDay($year,$mon,$mday));
- }
-
- }
-
- print "Today's julian date is: ",&julianDate(time),"\n";
-
-
- 4.13) What's the fastest way to code up a given task in perl?
-
- Post it to comp.lang.perl and ask Tom or Randal a question about it.
- ;)
-
- Because Perl so lends itself to a variety of different approaches for
- any given task, a common question is which is the fastest way to code a
- given task. Since some approaches can be dramatically more efficient
- than others, it's sometimes worth knowing which is best.
- Unfortunately, the implementation that first comes to mind, perhaps as
- a direct translation from C or the shell, often yields suboptimal
- performance. Not all approaches have the same results across different
- hardware and software platforms. Furthermore, legibility must
- sometimes be sacrificed for speed.
-
- While an experienced perl programmer can sometimes eye-ball the code
- and make an educated guess regarding which way would be fastest,
- surprises can still occur. So, in the spirit of perl programming
- being an empirical science, the best way to find out which of several
- different methods runs the fastest is simply to code them all up and
- time them. For example:
-
- $COUNT = 10_000; $| = 1;
-
- print "method 1: ";
-
- ($u, $s) = times;
- for ($i = 0; $i < $COUNT; $i++) {
- # code for method 1
- }
- ($nu, $ns) = times;
- printf "%8.4fu %8.4fs\n", ($nu - $u), ($ns - $s);
-
- print "method 2: ";
-
- ($u, $s) = times;
- for ($i = 0; $i < $COUNT; $i++) {
- # code for method 2
- }
- ($nu, $ns) = times;
- printf "%8.4fu %8.4fs\n", ($nu - $u), ($ns - $s);
-
- Perl5 includes a new module called Benchmark.pm. You can now simplify
- the code by using Benchmark, like so:
-
- use Benchmark;
-
- timethese($count, {
- Name1 => '...code for method 1...',
- Name2 => '...code for method 2...',
- ... });
-
- It will output something that looks similar to this:
-
- Benchmark: timing 100 iterations of Name1, Name2...
- Name1: 2 secs (0.50 usr 0.00 sys = 0.50 cpu)
- Name2: 1 secs (0.48 usr 0.00 sys = 0.48 cpu)
-
-
- For example, the following code will show the time difference between
- three different ways of assigning the first character of a string to
- a variable:
-
- use Benchmark;
- timethese(100000, {
- 'regex1' => '$str="ABCD"; $str =~ s/^(.)//; $ch = $1',
- 'regex2' => '$str="ABCD"; $str =~ s/^.//; $ch = $&',
- 'substr' => '$str="ABCD"; $ch=substr($str,0,1); substr($str,0,1)=""',
- });
-
- The results will be returned like this:
-
- Benchmark: timing 100000 iterations of regex1, regex2, substr...
- regex1: 11 secs (10.80 usr 0.00 sys = 10.80 cpu)
- regex2: 10 secs (10.23 usr 0.00 sys = 10.23 cpu)
- substr: 7 secs ( 5.62 usr 0.00 sys = 5.62 cpu)
-
- For more specific tips, see the section on Efficiency in the
- ``Other Oddments'' chapter at the end of the Camel Book.
-
-
- 4.14) Do I always/never have to quote my strings or use semicolons?
-
- You don't have to quote strings that can't mean anything else in the
- language, like identifiers with any upper-case letters in them.
- Therefore, it's fine to do this:
-
- $SIG{INT} = Timeout_Routine;
- or
-
- @Days = (Sun, Mon, Tue, Wed, Thu, Fri, Sat, Sun);
-
- but you can't get away with this:
-
- $foo{while} = until;
-
- in place of
-
- $foo{'while'} = 'until';
-
- The requirements on semicolons have been increasingly relaxed. You no
- longer need one at the end of a block, but stylistically, you're better
- to use them if you don't put the curly brace on the same line:
-
- for (1..10) { print }
-
- is ok, as is
-
- @nlist = sort { $a <=> $b } @olist;
-
- but you probably shouldn't do this:
-
- for ($i = 0; $i < @a; $i++) {
- print "i is $i\n" # <-- oops!
- }
-
- because you might want to add lines later, and anyway, it looks
- funny. :-)
-
-
- 4.15) What is variable suicide and how can I prevent it?
-
- Variable suicide is a nasty side effect of dynamic scoping and the way
- variables are passed by reference. If you say
-
- $x = 17;
- &munge($x);
- sub munge {
- local($x);
- local($myvar) = $_[0];
- ...
- }
-
- Then you have just clobbered $_[0]! Why this is occurring is pretty
- heavy wizardry: the reference to $x stored in $_[0] was temporarily
- occluded by the previous local($x) statement (which, you'll recall,
- occurs at run-time, not compile-time). The work around is simple,
- however: declare your formal parameters first:
-
- sub munge {
- local($myvar) = $_[0];
- local($x);
- ...
- }
-
- That doesn't help you if you're going to be trying to access @_
- directly after the local()s. In this case, careful use of the package
- facility is your only recourse.
-
- Another manifestation of this problem occurs due to the magical nature
- of the index variable in a foreach() loop.
-
- @num = 0 .. 4;
- print "num begin @num\n";
- foreach $m (@num) { &ug }
- print "num finish @num\n";
- sub ug {
- local($m) = 42;
- print "m=$m $num[0],$num[1],$num[2],$num[3]\n";
- }
-
- Which prints out the mysterious:
-
- num begin 0 1 2 3 4
- m=42 42,1,2,3
- m=42 0,42,2,3
- m=42 0,1,42,3
- m=42 0,1,2,42
- m=42 0,1,2,3
- num finish 0 1 2 3 4
-
- What's happening here is that $m is an alias for each element of @num.
- Inside &ug, you temporarily change $m. Well, that means that you've
- also temporarily changed whatever $m is an alias to!! The only
- workaround is to be careful with global variables, using packages,
- and/or just be aware of this potential in foreach() loops.
-
- The perl5 static autos via "my" will not have this problem.
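For comparison, here are both examples rewritten with my(). This is a sketch that only demonstrates the lexical versions leaving the caller's data alone; the 'scratch' value and the @snapshots array are hypothetical additions used to make the effect visible.

```perl
use strict;
use warnings;

our $x = 17;

sub munge {
    my $x = 'scratch';     # a new lexical; the global $x is untouched
    my ($myvar) = @_;      # so $_[0] still holds the caller's value
    return $myvar;
}

my @num = (0 .. 4);
my @snapshots;

sub ug {
    my $m = 42;            # unrelated to the loop variable below
    push @snapshots, "@num";
}

print "munge saw ", munge($x), "\n";
foreach my $m (@num) { ug() }
print "$_\n" for @snapshots;
```

Every snapshot shows the array unchanged, and munge() receives 17 as expected.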
-
-
- 4.16) What does "Malformed command links" mean?
-
- This is a bug in 4.035. While in general it's merely a cosmetic
- problem, it often comanifests with a highly undesirable coredumping
- problem. Programs known to be affected by the fatal coredump include
- plum and pcops. This bug has been fixed since 4.036. It did not
- resurface in 5.000.
-
-
- 4.17) How can I set up a footer format to be used with write()?
-
- While the $^ variable contains the name of the current header format,
- there is no corresponding mechanism to automatically do the same thing
- for a footer. Not knowing how big a format is going to be until you
- evaluate it is one of the major problems.
-
- If you have a fixed-size footer, you can get footers by checking for
- lines left on page ($-) before each write, and printing the footer
- yourself if necessary.
-
- Another strategy is to open a pipe to yourself, using open(KID, "|-")
- and always write()ing to the KID, who then postprocesses its STDIN to
- rearrange headers and footers however you like. Not very convenient,
- but doable.
-
-
- 4.18) Why does my Perl program keep growing in size?
-
- This is caused by a strange occurrence that Larry has dubbed "feeping
- creaturism". Larry is always adding one more feature, always getting
- Perl to handle one more problem. Hence, it keeps growing. Once you've
- worked with perl long enough, you will probably start to do the same
- thing. You will then notice this problem as you see your scripts
- becoming larger and larger.
-
- Oh, wait... you meant a currently running program and its stack size.
- Mea culpa, I misunderstood you. ;) While there may be a real memory
- leak in the Perl source code or even whichever malloc() you're using,
- common causes are incomplete eval()s or local()s in loops.
-
- An eval() which terminates in error due to a failed parsing will leave
- a bit of memory unusable.
-
- A local() inside a loop:
-
- for (1..100) {
- local(@array);
- }
-
- will build up 100 versions of @array before the loop is done. The
- work-around is:
-
- local(@array);
- for (1..100) {
- undef @array;
- }
-
- Larry reports that this behavior is fixed for perl5.
-
-
- 4.19) Can I do RPC in Perl?
-
- Yes, you can, since Perl has access to sockets. An example of the rup
- program written in Perl can be found in the script ruptime.pl at the
- scripts archive on ftp.cis.ufl.edu. I warn you, however, that it's not
- a pretty sight, as it uses nothing from h2ph or c2ph, so everything is
- utterly hard-wired.
-
-
- 4.20) How can I quote a variable to use in a regexp?
-
- From the manual:
-
- $pattern =~ s/(\W)/\\$1/g;
-
- Now you can freely use /$pattern/ without fear of any unexpected
- metacharacters in it throwing off the search. If you don't know whether
- a pattern is valid or not, enclose it in an eval to avoid a fatal
- run-time error.
-
- Perl5 provides a vastly improved way of doing this. Simply use the
- new quotemeta escape (\Q) in front of the variable within your pattern,
- as in /\Q$pattern/.
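A short sketch of the difference \Q makes, using a made-up string that contains two metacharacters (the variable names are hypothetical):

```perl
use strict;
use warnings;

my $literal = 'a.b*c';                 # contains two metacharacters

my $unquoted = 'aXbbc' =~ /^$literal$/     ? 1 : 0;   # "." and "*" are special
my $quoted   = 'aXbbc' =~ /^\Q$literal\E$/ ? 1 : 0;   # taken literally
my $exact    = 'a.b*c' =~ /^\Q$literal\E$/ ? 1 : 0;

print "unquoted=$unquoted quoted=$quoted exact=$exact\n";
print quotemeta($literal), "\n";
```

The quotemeta() function gives you the escaped string itself, which is handy when the pattern is built up elsewhere.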
-
- 4.21) How can I change the first N letters of a string?
-
- Remember that the substr() function produces an lvalue, that is, it may
- be assigned to. Therefore, to change the first character to an S, you
- could do this:
-
- substr($var,0,1) = 'S';
-
- This assumes that $[ is 0; for a library routine where you can't know
- $[, you should use this instead:
-
- substr($var,$[,1) = 'S';
-
- While it would be slower, you could in this case use a substitute:
-
- $var =~ s/^./S/;
-
- But this won't work if the string is empty or its first character is a
- newline, which "." will never match. So you could use this instead:
-
- $var =~ s/^[^\0]?/S/;
-
- To do things like translation of the first part of a string, use
- substr, as in:
-
- substr($var, $[, 10) =~ tr/a-z/A-Z/;
-
- If you don't know the length of what to translate, something like this
- works:
-
- /^(\S+)/ && substr($_,$[,length($1)) =~ tr/a-z/A-Z/;
-
- For some things it's convenient to use the /e switch of the substitute
- operator:
-
- s/^(\S+)/($tmp = $1) =~ tr#a-z#A-Z#, $tmp/e
-
- although in this case, it runs more slowly than does the previous
- example.
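The lvalue behaviour of substr() is easy to try out. The sketch below assumes $[ has been left at 0, and the sample strings are made up for the illustration.

```perl
use strict;
use warnings;

my $var = "hello world";
substr($var, 0, 1) = 'J';              # replace the first character
# $var is now "Jello world"

my $title = "hello world";
substr($title, 0, 5) =~ tr/a-z/A-Z/;   # upper-case the first five characters
# $title is now "HELLO world"

my $grown = "hello world";
substr($grown, 0, 5) = 'goodbye';      # the replacement may differ in length
# $grown is now "goodbye world"

print "$var\n$title\n$grown\n";
```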
-
-
- 4.22) Can I use Perl regular expressions to match balanced text?
-
- No, or at least, not by themselves.
-
- Regexps just aren't powerful enough. Although Perl's patterns aren't
- strictly regular because they do backreferencing (the \1 notation), you
- still can't do it. You need to employ auxiliary logic. A simple
- approach would involve keeping a bit of state around, something
- vaguely like this (although we don't handle patterns on the same line):
-
- while(<>) {
- if (/pat1/) {
- if ($inpat++ > 0) { warn "already saw pat1" }
- next;
- }
- if (/pat2/) {
- if (--$inpat < 0) { warn "never saw pat1" }
- next;
- }
- }
-
- A rather more elaborate subroutine to pull out balanced and possibly
- nested single chars, like ` and ', { and }, or ( and ) can be found
- on convex.com in /pub/perl/scripts/pull_quotes.
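For single-character delimiters, the "bit of state" can be as little as a depth counter. The following is a minimal sketch for parentheses only, not the pull_quotes routine mentioned above; is_balanced is a hypothetical helper name.

```perl
use strict;
use warnings;

# Returns 1 if every "(" in $text has a matching ")" in the right order,
# 0 otherwise. is_balanced is a hypothetical helper name.
sub is_balanced {
    my ($text) = @_;
    my $depth = 0;
    for my $char (split //, $text) {
        if ($char eq '(') {
            $depth++;
        } elsif ($char eq ')') {
            return 0 if --$depth < 0;   # a close with no open before it
        }
    }
    return $depth == 0 ? 1 : 0;         # leftover opens are also an error
}

for my $sample ('f(a, g(b))', 'f(a))(', '((unfinished)') {
    printf "%-15s %s\n", $sample, is_balanced($sample) ? 'balanced' : 'unbalanced';
}
```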
-
-
- 4.23) What does it mean that regexps are greedy? How can I get around it?
-
- The basic idea behind regexps being greedy is that they will match the
- maximum amount of data that they can, sometimes resulting in incorrect
- or strange answers.
-
- For example, I recently came across something like this:
-
- $_="this (is) an (example) of multiple parens";
- while ( m#\((.*)\)#g ) {
- print "$1\n";
- }
-
- This code was supposed to match everything between a set of
- parentheses. The expected output was:
-
- is
- example
-
- However, the backreference ($1) ended up containing "is) an (example",
- clearly not what was intended.
-
- In perl4, the way to stop this from happening is to use a negated
- group. If the above example is rewritten as follows, the results are
- correct:
-
- while ( m#\(([^)]*)\)#g ) {
-
- In perl5 there is a new minimal matching metacharacter, '?'. This
- character is added to the normal metacharacters to modify their
- behaviour, such as "*?", "+?", or even "??". The example would now be
- written in the following style:
-
- while (m#\((.*?)\)#g )
-
- Hint: This new operator leads to a very elegant method of stripping
- comments from C code:
-
- s:/\*.*?\*/::gs
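The three patterns from this answer can be compared side by side by collecting their matches in list context. A small sketch (the array names are hypothetical; the minimal-match form needs perl5):

```perl
use strict;
use warnings;

my $text = "this (is) an (example) of multiple parens";

my @greedy  = $text =~ /\((.*)\)/g;      # one match: "is) an (example"
my @negated = $text =~ /\(([^)]*)\)/g;   # "is", "example"
my @minimal = $text =~ /\((.*?)\)/g;     # "is", "example" (perl5 only)

print "greedy:  @greedy\n";
print "negated: @negated\n";
print "minimal: @minimal\n";
```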
-
-
- 4.24) How do I use a regular expression to strip C style comments from a
- file?
-
- Since we're talking about how to strip comments under perl5, now is a
- good time to talk about doing it in perl4. The easiest way to strip
- comments in perl4 is to transform the comment close (*/) into something
- that can't be in the string, or is at least extremely unlikely to be in
- the string. I find \256 (the registered or reserved sign, an R inside
- a circle) is fairly unlikely to be used and is easy to remember. So,
- our code looks something like this:
-
- s:\*/:\256:g; # Change all */ to circled R
- s:/\*[^\256]*\256::g; # Remove everything from /* to circled R
- print;
-
- To ensure that you correctly handle multi-line comments, don't forget
- to set $* to 1, informing perl that it should do multi-line pattern
- matching.
-
- [Untested changes. If it's wrong or you don't understand it, check
- with Jeff. If it's wrong, let me know so I can change it. ]
-
- Jeff Friedl* suggests that the above solution is incorrect. He says it
- will fail on embedded comments and function prototyping as well as on
- comments that are part of strings. The following regexp should handle
- everything:
-
- $/ = undef;
- $_ = <>;
-
- s#/\*[^*]*\*+([^/*][^*]*\*+)*/|([^/"']*("[^"\\]*(\\[\d\D][^"\\]*)*"[^/"']*|'[^'\\]*(\\[\d\D][^'\\]*)*'[^/"']*|/+[^*/][^/"']*)*)#$2#g;
- print;
-
-
- 4.25) Why doesn't "local($foo) = <FILE>;" work right?
-
- Well, it does. The thing to remember is that local() provides an array
- context, and that the <FILE> syntax in an array context will read all the
- lines in a file. To work around this, use:
-
- local($foo);
- $foo = <FILE>;
-
- You can use the scalar() operator to cast the expression into a scalar
- context:
-
- local($foo) = scalar(<FILE>);
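The effect of context is easy to see with a three-line input. The sketch below uses my() and an in-memory filehandle (perl 5.8 or later) purely so that it runs without a file on disk; a parenthesized my() list provides the same array context as local() does. All variable names are hypothetical.

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\n";

# List context: <$fh> reads every line; only the first is kept.
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";
my ($first) = <$fh>;
my $leftover = <$fh>;          # undef, the handle is already at end of file
close $fh;

# Scalar context: each read consumes exactly one line.
open($fh, '<', \$data) or die "can't open in-memory file: $!";
my ($line1) = scalar(<$fh>);
my $line2   = <$fh>;
close $fh;

print "first=$first";
print "leftover is ", defined $leftover ? "defined" : "undef", "\n";
print "line1=$line1", "line2=$line2";
```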
-
-
- 4.26) How can I detect keyboard input without reading it?
-
- You should check out the Frequently Asked Questions list in
- comp.unix.* for things like this: the answer is essentially the same.
- It's very system dependent. Here's one solution that works on BSD
- systems:
-
- sub key_ready {
- local($rin, $nfd);
- vec($rin, fileno(STDIN), 1) = 1;
- return $nfd = select($rin,undef,undef,0);
- }
-
-
- 4.27) How can I read a single character from the keyboard under UNIX and DOS?
-
- A closely related question to the no-echo question below is how to
- input a single character from the keyboard. Again, this is a system
- dependent operation. The following code may or may not help you. It
- should work on both SysV and BSD flavors of UNIX:
-
- $BSD = -f '/vmunix';
- if ($BSD) {
- system "stty cbreak </dev/tty >/dev/tty 2>&1";
- }
- else {
- system "stty", '-icanon';
- system "stty", 'eol', "\001";
- }
-
- $key = getc(STDIN);
-
- if ($BSD) {
- system "stty -cbreak </dev/tty >/dev/tty 2>&1";
- }
- else {
- system "stty", 'icanon';
- system "stty", 'eol', '^@'; # ascii null
- }
- print "\n";
-
- You could also handle the stty operations yourself for speed if you're
- going to be doing a lot of them. This code works to toggle cbreak
- and echo modes on a BSD system:
-
- sub set_cbreak { # &set_cbreak(1) or &set_cbreak(0)
- local($on) = $_[0];
- local($sgttyb,@ary);
- require 'sys/ioctl.ph';
- $sgttyb_t = 'C4 S' unless $sgttyb_t; # c2ph: &sgttyb'typedef()
-
- ioctl(STDIN,&TIOCGETP,$sgttyb) || die "Can't ioctl TIOCGETP: $!";
-
- @ary = unpack($sgttyb_t,$sgttyb);
- if ($on) {
- $ary[4] |= &CBREAK;
- $ary[4] &= ~&ECHO;
- } else {
- $ary[4] &= ~&CBREAK;
- $ary[4] |= &ECHO;
- }
- $sgttyb = pack($sgttyb_t,@ary);
-
- ioctl(STDIN,&TIOCSETP,$sgttyb) || die "Can't ioctl TIOCSETP: $!";
- }
-
- Note that this is one of the few times you actually want to use the
- getc() function; it's in general way too expensive to call for normal
- I/O. Normally, you just use the <FILE> syntax, or perhaps the read()
- or sysread() functions.
-
- For perspectives on more portable solutions, use anon ftp to retrieve
- the file /pub/perl/info/keypress from convex.com.
-
- For DOS systems, Dan Carson <dbc@tc.fluke.COM> reports:
-
- To put the PC in "raw" mode, use ioctl with some magic numbers gleaned
- from msdos.c (Perl source file) and Ralf Brown's interrupt list (comes
- across the net every so often):
-
- $old_ioctl = ioctl(STDIN,0,0); # Gets device info
- $old_ioctl &= 0xff;
- ioctl(STDIN,1,$old_ioctl | 32); # Writes it back, setting bit 5
-
- Then to read a single character:
-
- sysread(STDIN,$c,1); # Read a single character
-
- And to put the PC back to "cooked" mode:
-
- ioctl(STDIN,1,$old_ioctl); # Sets it back to cooked mode.
-
-
- So now you have $c. If ord($c) == 0, you have a two byte code, which
- means you hit a special key. Read another byte (sysread(STDIN,$c,1)),
- and that value tells you what combination it was according to this
- table:
-
- # PC 2-byte keycodes = ^@ + the following:
-
- # HEX KEYS
- # --- ----
- # 0F SHF TAB
- # 10-19 ALT QWERTYUIOP
- # 1E-26 ALT ASDFGHJKL
- # 2C-32 ALT ZXCVBNM
- # 3B-44 F1-F10
- # 47-49 HOME,UP,PgUp
- # 4B LEFT
- # 4D RIGHT
- # 4F-53 END,DOWN,PgDn,Ins,Del
- # 54-5D SHF F1-F10
- # 5E-67 CTR F1-F10
- # 68-71 ALT F1-F10
- # 73-77 CTR LEFT,RIGHT,END,PgDn,HOME
- # 78-83 ALT 1234567890-=
- # 84 CTR PgUp
-
- This is all trial and error I did a long time ago, I hope I'm reading the
- file that worked.
-
-
- 4.28) How can I get input from the keyboard without it echoing to the
- screen?
-
- Terminal echoing is generally handled by the terminal driver, not by
- the program reading the input. Therefore, there is no direct way in perl
- to turn echoing on and off.
- However, you can call the command "stty [-]echo". The following will
- allow you to accept input without it being echoed to the screen, for
- example as a way to accept passwords (error checking deleted for
- brevity):
-
- print "Please enter your password: ";
- system("stty -echo");
- chop($password=<STDIN>);
- print "\n";
- system("stty echo");
-
- 4.29) Is there any easy way to strip blank space from the beginning/end of
- a string?
-
- Yes, there is. Using the substitution command, you can match the
- blanks and replace them with nothing. For example, if you have the
- string " String " you can use this:
-
- $_ = " String ";
- print ":$_:\n"; # OUTPUT: ": String :"
- s/^\s*//;
- print ":$_:\n"; # OUTPUT: ":String :"
- s/\s*$//;
- print ":$_:\n"; # OUTPUT: ":String:"
-
- Unfortunately, there is no simple single statement that will strip
- whitespace from both the front and the back in perl4. However, in
- perl5 you should be able to say:
-
- s/\s*(.*?)\s*$/$1/;
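If you strip whitespace in more than one place, the two-substitution form can be wrapped in a subroutine. A minimal sketch follows; trim is a hypothetical helper name, not a built-in.

```perl
use strict;
use warnings;

# Return a copy of the string with leading and trailing whitespace removed.
# trim is a hypothetical helper name, not a built-in.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

print ":", trim("  String  "), ":\n";
```

Because it works on a copy, the caller's variable is left untouched.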
- Stephen P Potter spp@vx.com Varimetrix Corporation
- 2350 Commerce Park Drive, Suite 4 Palm Bay, FL 32905
- (407) 676-3222 CAD/CAM/CAE/Software
-
-