- =head1 NAME
-
- perlfaq9 - Networking ($Revision: 1.26 $, $Date: 1999/05/23 16:08:30 $)
-
- =head1 DESCRIPTION
-
- This section deals with questions related to networking, the internet,
- and a few on the web.
-
- =head2 My CGI script runs from the command line but not the browser. (500 Server Error)
-
- If you can demonstrate that you've read the following FAQs and that
- your problem isn't something simple that can be easily answered, you'll
- probably receive a courteous and useful reply to your question if you
- post it on comp.infosystems.www.authoring.cgi (if it's something to do
- with HTTP, HTML, or the CGI protocols). Questions that appear to be Perl
- questions but are really CGI ones that are posted to comp.lang.perl.misc
- may not be so well received.
-
- The useful FAQs and related documents are:
-
- CGI FAQ
- http://www.webthing.com/tutorials/cgifaq.html
-
- Web FAQ
- http://www.boutell.com/faq/
-
- WWW Security FAQ
- http://www.w3.org/Security/Faq/
-
- HTTP Spec
- http://www.w3.org/pub/WWW/Protocols/HTTP/
-
- HTML Spec
- http://www.w3.org/TR/REC-html40/
- http://www.w3.org/pub/WWW/MarkUp/
-
- CGI Spec
- http://www.w3.org/CGI/
-
- CGI Security FAQ
- http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt
-
- =head2 How can I get better error messages from a CGI program?
-
- Use the CGI::Carp module. It replaces C<warn> and C<die>, plus the
- normal Carp module's C<carp>, C<croak>, and C<confess> functions, with
- more verbose and safer versions. It still sends them to the normal
- server error log.
-
- use CGI::Carp;
- warn "This is a complaint";
- die "But this one is serious";
-
- The following use of CGI::Carp also redirects errors to a file of your choice,
- placed in a BEGIN block to catch compile-time warnings as well:
-
- BEGIN {
-     use CGI::Carp qw(carpout);
-     open(LOG, ">>/var/local/cgi-logs/mycgi-log")
-         or die "Unable to append to mycgi-log: $!\n";
-     carpout(*LOG);
- }
-
- You can even arrange for fatal errors to go back to the client browser,
- which is nice for your own debugging, but might confuse the end user.
-
- use CGI::Carp qw(fatalsToBrowser);
- die "Bad error here";
-
- Even if the error happens before you get the HTTP header out, the module
- will try to take care of this to avoid the dreaded server 500 errors.
- Normal warnings still go out to the server error log (or wherever
- you've sent them with C<carpout>) with the application name and date
- stamp prepended.
-
- =head2 How do I remove HTML from a string?
-
- The most correct way (albeit not the fastest) is to use HTML::Parser
- from CPAN. Another mostly correct
- way is to use HTML::FormatText which not only removes HTML but also
- attempts to do a little simple formatting of the resulting plain text.
-
- Many folks attempt a simple-minded regular expression approach, like
- C<< s/<.*?>//g >>, but that fails in many cases because the tags
- may continue over line breaks, they may contain quoted angle-brackets,
- or HTML comments may be present. Plus, folks forget to convert
- entities, like C<&lt;> for example.
-
- Here's one "simple-minded" approach, that works for most files:
-
- #!/usr/bin/perl -p0777
- s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
-
- If you want a more complete solution, see the 3-stage striphtml
- program in
- http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz .
-
- Here are some tricky cases that you should think about when picking
- a solution:
-
- <IMG SRC = "foo.gif" ALT = "A > B">
-
- <IMG SRC = "foo.gif"
- ALT = "A > B">
-
- <!-- <A comment> -->
-
- <script>if (a<b && a>c)</script>
-
- <# Just data #>
-
- <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
-
- If HTML comments include other tags, those solutions would also break
- on text like this:
-
- <!-- This section commented out.
- <B>You can't see me!</B>
- -->
-
- =head2 How do I extract URLs?
-
- A quick but imperfect approach is
-
- #!/usr/bin/perl -n00
- # qxurl - tchrist@perl.com
- print "$2\n" while m{
-     < \s*
-       A \s+ HREF \s* = \s* (["']) (.*?) \1
-     \s* >
- }gsix;
-
- This version does not adjust relative URLs, understand alternate
- bases, deal with HTML comments, deal with HREF and NAME attributes
- in the same tag, understand extra qualifiers like TARGET, or accept
- URLs themselves as arguments. It also runs about 100x faster than a
- more "complete" solution using the LWP suite of modules, such as the
- http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.
-
- =head2 How do I download a file from the user's machine? How do I open a file on another machine?
-
- In the context of an HTML form, you can use what's known as
- B<multipart/form-data> encoding. The CGI.pm module (available from
- CPAN) supports this in the start_multipart_form() method, which isn't
- the same as the startform() method.
-
- =head2 How do I make a pop-up menu in HTML?
-
- Use the B<< <SELECT> >> and B<< <OPTION> >> tags. The CGI.pm
- module (available from CPAN) supports this widget, as well as many
- others, including some that it cleverly synthesizes on its own.
-
- =head2 How do I fetch an HTML file?
-
- One approach, if you have the lynx text-based HTML browser installed
- on your system, is this:
-
- $html_code = `lynx -source $url`;
- $text_data = `lynx -dump $url`;
-
- The libwww-perl (LWP) modules from CPAN provide a more powerful way
- to do this. They don't require lynx, but like lynx, can still work
- through proxies:
-
- # simplest version
- use LWP::Simple;
- $content = get($URL);
-
- # or print HTML from a URL
- use LWP::Simple;
- getprint "http://www.linpro.no/lwp/";
-
- # or print ASCII from HTML from a URL
- # also need HTML-Tree package from CPAN
- use LWP::Simple;
- use HTML::TreeBuilder;
- use HTML::FormatText;
- my ($html, $ascii);
- $html = get("http://www.perl.com/");
- defined $html
-     or die "Can't fetch HTML from http://www.perl.com/";
- # build a parse tree from the fetched HTML, then render it as text
- $ascii = HTML::FormatText->new->format(
-     HTML::TreeBuilder->new_from_content($html));
- print $ascii;
-
- =head2 How do I automate an HTML form submission?
-
- If you're submitting values using the GET method, create a URL and encode
- the form using the C<query_form> method:
-
- use LWP::Simple;
- use URI::URL;
-
- my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
- $url->query_form(module => 'DB_File', readme => 1);
- $content = get($url);
-
- If you're using the POST method, create your own user agent and encode
- the content appropriately.
-
- use HTTP::Request::Common qw(POST);
- use LWP::UserAgent;
-
- $ua = LWP::UserAgent->new();
- my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',
-               [ module => 'DB_File', readme => 1 ];
- $content = $ua->request($req)->as_string;
-
- =head2 How do I decode or create those %-encodings on the web?
-
- Here's an example of decoding:
-
- $string = "http://altavista.digital.com/cgi-bin/query?pg=q&what=news&fmt=.&q=%2Bcgi-bin+%2Bperl.exe";
- $string =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge;
-
- Encoding is a bit harder, because you can't just blindly change
- all the non-alphanumunder characters (C<\W>) into their hex escapes.
- It's important that characters with special meaning like C</> and C<?>
- I<not> be translated. Probably the easiest way to get this right is
- to avoid reinventing the wheel and just use the URI::Escape module,
- available from CPAN.
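-
- As a rough sketch of what such an encoder does for a single
- query-string value (not a whole URL), the following core-only code
- escapes every byte outside the unreserved set of letters, digits,
- C<->, C<.>, C<_>, and C<~>. The helper names C<escape_component> and
- C<unescape_component> are made up for this illustration, and the input
- is assumed to be bytes already. In real code, URI::Escape is still the
- better choice.
-
```perl
use strict;
use warnings;

# Hypothetical helper: escape one component (for example a single
# query value), never a complete URL. Assumes the input is bytes.
sub escape_component {
    my ($bytes) = @_;
    # Leave only the unreserved characters A-Z a-z 0-9 - . _ ~ alone.
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $bytes;
}

# Hypothetical inverse: turn %XX escapes back into bytes.
sub unescape_component {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

my $value   = '+cgi-bin +perl.exe';
my $escaped = escape_component($value);
print "$escaped\n";
print unescape_component($escaped), "\n";
```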
-
- =head2 How do I redirect to another page?
-
- Instead of sending back a C<Content-Type> as the headers of your
- reply, send back a C<Location:> header. Officially this should be a
- C<URI:> header, so the CGI.pm module (available from CPAN) sends back
- both:
-
- Location: http://www.domain.com/newpage
- URI: http://www.domain.com/newpage
-
- Note that relative URLs in these headers can cause strange effects
- because of "optimizations" that servers do.
-
- $url = "http://www.perl.com/CPAN/";
- print "Location: $url\n\n";
- exit;
-
- To target a particular frame in a frameset, include a "Window-target:"
- line in the header.
-
- print <<EOF;
- Location: http://www.domain.com/newpage
- Window-target: <FrameName>
-
- EOF
-
- To be correct to the spec, each of those virtual newlines should really be
- physical C<"\015\012"> sequences by the time you hit the client browser.
- Except for NPH scripts, though, that local newline should get translated
- by your server into standard form, so you shouldn't have a problem
- here, even if you are stuck on MacOS. Everybody else probably won't
- even notice.
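-
- If you do need the wire format yourself, say in an NPH script, a
- minimal sketch might look like this. The helper name
- C<redirect_headers> is invented for the example, and the status line
- that an NPH script would also have to send is left out.
-
```perl
use strict;
use warnings;

# Hypothetical helper: build redirect headers with explicit CRLF line
# endings instead of relying on the server to translate "\n".
sub redirect_headers {
    my ($url) = @_;
    my $crlf = "\015\012";
    return "Location: $url$crlf$crlf";
}

my $headers = redirect_headers('http://www.perl.com/CPAN/');

# binmode keeps platforms that translate newlines from altering the bytes.
binmode STDOUT;
print $headers;
```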
-
- =head2 How do I put a password on my web pages?
-
- That depends. You'll need to read the documentation for your web
- server, or perhaps check some of the other FAQs referenced above.
-
- =head2 How do I edit my .htpasswd and .htgroup files with Perl?
-
- The HTTPD::UserAdmin and HTTPD::GroupAdmin modules provide a
- consistent OO interface to these files, regardless of how they're
- stored. Databases may be text, dbm, Berkeley DB or any database with a
- DBI compatible driver. HTTPD::UserAdmin supports files used by the
- `Basic' and `Digest' authentication schemes. Here's an example:
-
- use HTTPD::UserAdmin ();
- HTTPD::UserAdmin
- ->new(DB => "/foo/.htpasswd")
- ->add($username => $password);
-
- =head2 How do I make sure users can't enter values into a form that cause my CGI script to do bad things?
-
- Read the CGI security FAQ, at
- http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html, and the
- Perl/CGI FAQ at
- http://www.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html.
-
- In brief: use tainting (see L<perlsec>), which makes sure that data
- from outside your script (e.g., CGI parameters) are never used in
- C<eval> or C<system> calls. In addition to tainting, never use the
- single-argument form of system() or exec(). Instead, supply the
- command and arguments as a list, which keeps the shell from
- interpreting metacharacters in the arguments.
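-
- Here is one way those two pieces of advice might fit together. This
- is only a sketch: the validator name C<clean_filename> and its
- deliberately narrow pattern are assumptions, the script does not turn
- on taint mode itself, and C<$^X> (the running perl) stands in for
- whatever command you really need to run.
-
```perl
use strict;
use warnings;

# Hypothetical validator: accept only plain file names made of word
# characters, dots, and hyphens, and return the captured copy (which
# is what untaints the value under -T); anything else yields undef.
sub clean_filename {
    my ($raw) = @_;
    return undef unless defined $raw;
    return $raw =~ /\A(\w[\w.-]*)\z/ ? $1 : undef;
}

my $good = clean_filename('report.txt');
my $bad  = clean_filename('report.txt; rm -rf /');

print "rejected suspicious input\n" unless defined $bad;

# List form: the arguments go straight to the program, so no shell
# ever gets a chance to interpret them.
my $status = system($^X, '-e', 'print "got: $ARGV[0]\n"', $good);
```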
-
- =head2 How do I parse a mail header?
-
- For a quick-and-dirty solution, try this one, derived
- from page 222 of the 2nd edition of "Programming Perl":
-
- $/ = '';
- $header = <MSG>;
- $header =~ s/\n\s+/ /g; # merge continuation lines
- %head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header );
-
- That solution doesn't do well if, for example, you're trying to
- maintain all the Received lines. A more complete approach is to use
- the Mail::Header module from CPAN (part of the MailTools package).
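-
- If you'd rather stay module-free, one possible variation keeps
- repeated fields such as Received by collecting each field's values
- in an array. The helper name C<parse_header> and the sample message
- are invented for this sketch, and Mail::Header remains the more
- complete tool.
-
```perl
use strict;
use warnings;

# Hypothetical helper: parse one header block into a hash of array
# references, so repeated fields keep every value in order.
sub parse_header {
    my ($header) = @_;
    $header =~ s/\n[ \t]+/ /g;            # merge continuation lines
    my %head;
    for my $line (split /\n/, $header) {
        next unless $line =~ /^([-\w]+):\s*(.*)$/;
        push @{ $head{ lc $1 } }, $2;     # field names are case-insensitive
    }
    return \%head;
}

my $sample = <<'EOF';
Received: from mail.example.org
    by mx.example.com; Wed, 6 May 1998 09:38:41 -0600
Received: from localhost by mail.example.org
Subject: Hello
EOF

my $head = parse_header($sample);
print scalar @{ $head->{received} }, " Received lines\n";
print "Subject: $head->{subject}[0]\n";
```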
-
- =head2 How do I decode a CGI form?
-
- You use a standard module, probably CGI.pm. Under no circumstances
- should you attempt to do so by hand!
-
- You'll see a lot of CGI programs that blindly read from STDIN the number
- of bytes equal to CONTENT_LENGTH for POSTs, or grab QUERY_STRING for
- decoding GETs. These programs are very poorly written. They only work
- sometimes. They typically forget to check the return value of the read()
- system call, which is a cardinal sin. They don't handle HEAD requests.
- They don't handle multipart forms used for file uploads. They don't deal
- with GET/POST combinations where query fields are in more than one place.
- They don't deal with keywords in the query string.
-
- In short, they're bad hacks. Resist them at all costs. Please do not be
- tempted to reinvent the wheel. Instead, use the CGI.pm or CGI_Lite.pm
- (available from CPAN), or if you're trapped in the module-free land
- of perl1 .. perl4, you might look into cgi-lib.pl (available from
- http://cgi-lib.stanford.edu/cgi-lib/ ).
-
- Make sure you know whether to use a GET or a POST in your form.
- GETs should only be used for something that doesn't update the server.
- Otherwise you can get mangled databases and repeated feedback mail
- messages. The fancy word for this is ``idempotency''. This simply
- means that there should be no difference between making a GET request
- for a particular URL once or multiple times. This is because the
- HTTP protocol definition says that a GET request may be cached by the
- browser, or server, or an intervening proxy. POST requests cannot be
- cached, because each request is independent and matters. Typically,
- POST requests change or depend on state on the server (query or update
- a database, send mail, or purchase a computer).
-
- =head2 How do I check a valid mail address?
-
- You can't, at least, not in real time. Bummer, eh?
-
- Without sending mail to the address and seeing whether there's a human
- on the other end to answer you, you cannot determine whether a mail
- address is valid. Even if you apply the mail header standard, you
- can have problems, because there are deliverable addresses that aren't
- RFC-822 (the mail header standard) compliant, and addresses that aren't
- deliverable which are compliant.
-
- Many are tempted to try to eliminate many frequently-invalid
- mail addresses with a simple regex, such as
- C</^[\w.-]+\@([\w-]+\.)+\w+$/>. It's a very bad idea: this
- also throws out many valid ones, and says nothing about
- potential deliverability, so it is not suggested. Instead, see
- http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz ,
- which actually checks against the full RFC spec (except for nested
- comments), looks for addresses you may not wish to accept mail to
- (say, Bill Clinton or your postmaster), and then makes sure that the
- hostname given can be looked up in the DNS MX records. It's not fast,
- but it works for what it tries to do.
-
- Our best advice for verifying a person's mail address is to have them
- enter their address twice, just as you normally do to change a password.
- This usually weeds out typos. If both versions match, send
- mail to that address with a personal message that looks somewhat like:
-
- Dear someuser@host.com,
-
- Please confirm the mail address you gave us Wed May 6 09:38:41
- MDT 1998 by replying to this message. Include the string
- "Rumpelstiltskin" in that reply, but spelled in reverse; that is,
- start with "Nik...". Once this is done, your confirmed address will
- be entered into our records.
-
- If you get the message back and they've followed your directions,
- you can be reasonably assured that it's real.
-
- A related strategy that's less open to forgery is to give them a PIN
- (personal ID number). Record the address and PIN (best that it be a
- random one) for later processing. In the mail you send, ask them to
- include the PIN in their reply. But if it bounces, or the message is
- included via a ``vacation'' script, it'll be there anyway. So it's
- best to ask them to mail back a slight alteration of the PIN, such as
- with the characters reversed, one added or subtracted to each digit, etc.
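-
- The reversed-PIN idea could be sketched roughly as follows. The
- helper names C<make_pin> and C<reply_confirms> are hypothetical, and
- C<rand> is not a cryptographically strong source, so treat this as
- an outline rather than something to deploy as is.
-
```perl
use strict;
use warnings;

# Hypothetical helper: make a random PIN that is not a palindrome, so
# that its reverse always differs from the PIN itself.
sub make_pin {
    my ($digits) = @_;
    die "need at least two digits\n" if $digits < 2;
    my $pin;
    do {
        $pin = join '', map { int rand 10 } 1 .. $digits;
    } while ($pin eq scalar reverse $pin);
    return $pin;
}

# Hypothetical check: the reply counts only if it contains the PIN
# reversed. A bounce or vacation message quotes the PIN unchanged,
# so it does not pass.
sub reply_confirms {
    my ($pin, $reply_body) = @_;
    my $expected = scalar reverse $pin;
    return index($reply_body, $expected) >= 0 ? 1 : 0;
}

my $pin = make_pin(6);
print "Ask the user to send back the reverse of $pin\n";
```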
-
- =head2 How do I decode a MIME/BASE64 string?
-
- The MIME-Base64 package (available from CPAN) handles this as well as
- the MIME/QP encoding. Decoding BASE64 becomes as simple as:
-
- use MIME::Base64;
- $decoded = decode_base64($encoded);
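-
- A small round trip shows both directions. The sample string is
- arbitrary, and passing an empty second argument to C<encode_base64>
- is just one way to keep it from appending a newline.
-
```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

my $original = "Hello, world!\n";

# The empty second argument suppresses the line ending that
# encode_base64 would otherwise append to its output.
my $encoded = encode_base64($original, '');
my $decoded = decode_base64($encoded);

print "$encoded\n";
print $decoded;
```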
-
- A more direct approach is to use the unpack() function's "u"
- format after minor transliterations:
-
- tr#A-Za-z0-9+/##cd; # remove non-base64 chars
- tr#A-Za-z0-9+/# -_#; # convert to uuencoded format
- $len = pack("c", 32 + 0.75*length); # compute length byte
- print unpack("u", $len . $_); # uudecode and print
-
- =head2 How do I return the user's mail address?
-
- On systems that support getpwuid, the $< variable and the
- Sys::Hostname module (which is part of the standard perl distribution),
- you can probably try using something like this:
-
- use Sys::Hostname;
- $address = sprintf('%s@%s', scalar getpwuid($<), hostname);
-
- Company policies on mail addresses can mean that this generates addresses
- that the company's mail system will not accept, so you should ask for
- users' mail addresses when this matters. Furthermore, not all systems
- on which Perl runs are so forthcoming with this information as is Unix.
-
- The Mail::Util module from CPAN (part of the MailTools package) provides a
- mailaddress() function that tries to guess the mail address of the user.
- It makes a more intelligent guess than the code above, using information
- given when the module was installed, but it could still be incorrect.
- Again, the best way is often just to ask the user.
-
- =head2 How do I send mail?
-
- Use the C<sendmail> program directly:
-
- open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq")
-     or die "Can't fork for sendmail: $!\n";
- print SENDMAIL <<"EOF";
- From: User Originating Mail <me\@host>
- To: Final Destination <you\@otherhost>
- Subject: A relevant subject line
-
- Body of the message goes here after the blank line
- in as many lines as you like.
- EOF
- close(SENDMAIL) or warn "sendmail didn't close nicely";
-
- The B<-oi> option prevents sendmail from interpreting a line consisting
- of a single dot as "end of message". The B<-t> option says to use the
- headers to decide who to send the message to, and B<-odq> says to put
- the message into the queue. This last option means your message won't
- be immediately delivered, so leave it out if you want immediate
- delivery.
-
- Alternate, less convenient approaches include calling mail (sometimes
- called mailx) directly or simply opening up port 25 and having an
- intimate conversation between just you and the remote SMTP daemon,
- probably sendmail.
-
- Or you might be able to use the CPAN module Mail::Mailer:
-
- use Mail::Mailer;
-
- $mailer = Mail::Mailer->new();
- $mailer->open({ From    => $from_address,
-                 To      => $to_address,
-                 Subject => $subject,
-               })
-     or die "Can't open: $!\n";
- print $mailer $body;
- $mailer->close();
-
- The Mail::Internet module uses Net::SMTP which is less Unix-centric than
- Mail::Mailer, but less reliable. Avoid raw SMTP commands. There
- are many reasons to use a mail transport agent like sendmail. These
- include queueing, MX records, and security.
-
- =head2 How do I read mail?
-
- You could use the Mail::Folder module from CPAN (part of the
- MailFolder package) or the Mail::Internet module from CPAN (part
- of the MailTools package), but often a module is overkill. Here's a
- mail sorter.
-
- #!/usr/bin/perl
- # bysub1 - simple sort by subject
- my(@msgs, @sub);
- my $msgno = -1;
- $/ = ''; # paragraph reads
- while (<>) {
-     if (/^From/m) {
-         /^Subject:\s*(?:Re:\s*)*(.*)/mi;
-         $sub[++$msgno] = lc($1) || '';
-     }
-     $msgs[$msgno] .= $_;
- }
- for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) {
-     print $msgs[$i];
- }
-
- Or more succinctly,
-
- #!/usr/bin/perl -n00
- # bysub2 - awkish sort-by-subject
- BEGIN { $msgno = -1 }
- $sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;
- $msg[$msgno] .= $_;
- END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }
-
- =head2 How do I find out my hostname/domainname/IP address?
-
- The normal way to find your own hostname is to call the C<`hostname`>
- program. While sometimes expedient, this has some problems, such as
- not knowing whether you've got the canonical name or not. It's one of
- those tradeoffs of convenience versus portability.
-
- The Sys::Hostname module (part of the standard perl distribution) will
- give you the hostname after which you can find out the IP address
- (assuming you have working DNS) with a gethostbyname() call.
-
- use Socket;
- use Sys::Hostname;
- my $host = hostname();
- my $addr = inet_ntoa(scalar gethostbyname($host || 'localhost'));
-
- Probably the simplest way to learn your DNS domain name is to grok
- it out of /etc/resolv.conf, at least under Unix. Of course, this
- assumes several things about your resolv.conf configuration, including
- that it exists.
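-
- A rough sketch of that idea follows. It assumes the usual C<domain>
- and C<search> directives, prefers the former, and otherwise takes
- the first C<search> entry. The helper name C<domain_from_resolv>
- and the sample text are made up for the example.
-
```perl
use strict;
use warnings;

# Hypothetical helper: pull a domain out of resolv.conf-style text.
# Prefers a "domain" line and falls back to the first "search" entry.
sub domain_from_resolv {
    my ($text) = @_;
    my ($domain, $search);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*[#;]/;                 # skip comment lines
        if ($line =~ /^\s*domain\s+(\S+)/) {
            $domain = $1;
        }
        elsif (!defined $search && $line =~ /^\s*search\s+(\S+)/) {
            $search = $1;
        }
    }
    return defined $domain ? $domain : $search;
}

my $sample = "# sample\nsearch example.com corp.example.com\nnameserver 192.0.2.1\n";
print domain_from_resolv($sample), "\n";
```
-
- To use it on a real system, you would slurp /etc/resolv.conf into a
- string first and be ready for an undefined result.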
-
- (We still need a good DNS domain name-learning method for non-Unix
- systems.)
-
- =head2 How do I fetch a news article or the active newsgroups?
-
- Use the Net::NNTP or News::NNTPClient modules, both available from CPAN.
- This can make tasks like fetching the newsgroup list as simple as:
-
- perl -MNews::NNTPClient \
-     -e 'print News::NNTPClient->new->list("newsgroups")'
-
- =head2 How do I fetch/put an FTP file?
-
- LWP::Simple (available from CPAN) can fetch but not put. Net::FTP (also
- available from CPAN) is more complex but can put as well as fetch.
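-
- For the put side, a Net::FTP session might be sketched like this.
- The helper name C<ftp_put_and_get> and its named arguments are
- assumptions made for the example, and nothing here contacts a
- server until you call it with a real host and credentials.
-
```perl
use strict;
use warnings;
use Net::FTP;

# Hypothetical helper: upload one local file, then download another.
# Host, credentials, and file names are supplied by the caller.
sub ftp_put_and_get {
    my (%arg) = @_;
    my $ftp = Net::FTP->new($arg{host})
        or die "Can't connect to $arg{host}: $@\n";
    $ftp->login($arg{user}, $arg{password})
        or die "Can't log in: ", $ftp->message;
    $ftp->binary;                          # avoid newline translation
    $ftp->put($arg{upload})
        or die "put failed: ", $ftp->message;
    $ftp->get($arg{download})
        or die "get failed: ", $ftp->message;
    $ftp->quit;
    return 1;
}
```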
-
- =head2 How can I do RPC in Perl?
-
- A DCE::RPC module is being developed (but is not yet available), and
- will be released as part of the DCE-Perl package (available from
- CPAN). The rpcgen suite, available from CPAN/authors/id/JAKE/, is
- an RPC stub generator and includes an RPC::ONC module.
-
- =head1 AUTHOR AND COPYRIGHT
-
- Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
- All rights reserved.
-
- When included as part of the Standard Version of Perl, or as part of
- its complete documentation whether printed or otherwise, this work
- may be distributed only under the terms of Perl's Artistic License.
- Any distribution of this file or derivatives thereof I<outside>
- of that package require that special arrangements be made with
- copyright holder.
-
- Irrespective of its distribution, all code examples in this file
- are hereby placed into the public domain. You are permitted and
- encouraged to use this code in your own programs for fun
- or for profit as you see fit. A simple comment in the code giving
- credit would be courteous but is not required.
-