home *** CD-ROM | disk | FTP | other *** search
- #
- # $Id: LWP.pm,v 1.1 1997/11/18 00:32:38 neeri Exp $
-
- package LWP;
-
- $VERSION = "5.13";
- sub Version { $VERSION; }
-
- require 5.004;
- require LWP::UserAgent; # this should load everything you need
-
- 1;
-
- __END__
-
- =head1 NAME
-
- LWP - Library for WWW access in Perl
-
- =head1 SYNOPSIS
-
- use LWP;
- print "This is libwww-perl-$LWP::VERSION\n";
-
-
- =head1 DESCRIPTION
-
- Libwww-perl is a collection of Perl modules which provides a simple
- and consistent programming interface (API) to the World-Wide Web. The
- main focus of the library is to provide classes and functions that
- allow you to write WWW clients, thus libwww-perl said to be a WWW
- client library. The library also contain modules that are of more
- general use.
-
- The main architecture of the library is object oriented. The user
- agent, requests sent and responses received from the WWW server are
- all represented by objects. This makes a simple and powerful
- interface to these services. The interface should be easy to extend
- and customize for your needs.
-
- The main features of the library are:
-
- =over 3
-
- =item *
-
- Contains various reusable components (modules) that can be
- used separately or together.
-
- =item *
-
- Provides an object oriented model of HTTP-style communication. Within
- this framework we currently support access to http, gopher, ftp, news,
- file, and mailto resources.
-
- =item *
-
- The library be used through the full object oriented interface or
- through a very simple procedural interface.
-
- =item *
-
- Support the basic and digest authorization schemes.
-
- =item *
-
- Transparent redirect handling.
-
- =item *
-
- Supports access through proxy servers.
-
- =item *
-
- URL handling (both absolute and relative URLs are supported).
-
- =item *
-
- A parser for F<robots.txt> files and a framework for constructing robots.
-
- =item *
-
- An experimental HTML parser and formatters (for PostScript and plain
- text).
-
- =item *
-
- The library can cooperate with Tk. A simple Tk-based GUI browser
- called 'tkweb' is distributed with the Tk extension for perl.
-
- =item *
-
- An implementation of the HTTP content negotiation algorithm that can
- be used both in protocol modules and in server scripts (like CGI
- scripts).
-
- =item *
-
- A simple command line client application called C<lwp-request>.
-
- =back
-
-
- =head1 HTTP STYLE COMMUNICATION
-
-
- The libwww-perl library is based on HTTP style communication. This
- section try to describe what that means.
-
- Let us start with this quote from the HTTP specification document
- <URL:http://www.w3.org/pub/WWW/Protocols/>:
-
- =over 3
-
- =item
-
- The HTTP protocol is based on a request/response paradigm. A client
- establishes a connection with a server and sends a request to the
- server in the form of a request method, URI, and protocol version,
- followed by a MIME-like message containing request modifiers, client
- information, and possible body content. The server responds with a
- status line, including the message's protocol version and a success or
- error code, followed by a MIME-like message containing server
- information, entity meta-information, and possible body content.
-
- =back
-
- What this means to libwww-perl is that communication always take place
- through these steps: First a I<request> object is created and
- configured. This object is then passed to a server and we get a
- I<response> object in return that we can examine. A request is always
- independent of any previous requests, i.e. the service is stateless.
- The same simple model is used for any kind of service we want to
- access.
-
- For example, if we want to fetch a document from a remote file server,
- then we send it a request that contains a name for that document and
- the response will contain the document itself. If we access a search
- engine, then the content of the request will contain the query
- parameters and the response will contain the query result. If we want
- to send a mail message to somebody then we send a request object which
- contains our message to the mail server and the response object will
- contain an acknowledgment that tells us that the message has been
- accepted and will be forwarded to the recipient(s).
-
- It is as simple as that!
-
-
- =head2 The Request Object
-
- The request object has the class name C<HTTP::Request> in
- libwww-perl. The fact that the class name use C<HTTP::> as a name
- prefix only implies that we use the HTTP model of communication. It
- does not limit the kind of services we can try to pass this I<request>
- to. For instance, we will send C<HTTP::Request>s both to ftp and
- gopher servers, as well as to the local file system.
-
- The main attributes of the request objects are:
-
- =over 3
-
- =item *
-
- The B<method> is a short string that tells what kind of
- request this is. The most used methods are B<GET>, B<PUT>,
- B<POST> and B<HEAD>.
-
- =item *
-
- The B<url> is a string denoting the protocol, server and
- the name of the "document" we want to access. The B<url> might
- also encode various other parameters.
-
- =item *
-
- The B<headers> contain additional information about the
- request and can also used to describe the content. The headers
- is a set of keyword/value pairs.
-
- =item *
-
- The B<content> is an arbitrary amount of data.
-
- =back
-
- =head2 The Response Object
-
- The request object has the class name C<HTTP::Response> in
- libwww-perl. The main attributes of objects of this class are:
-
- =over 3
-
- =item *
-
- The B<code> is a numerical value that encode the overall
- outcome of the request.
-
- =item *
-
- The B<message> is a short (human readable) string that
- corresponds to the I<code>.
-
- =item *
-
- The B<headers> contain additional information about the
- response and they also describe the content.
-
- =item *
-
- The B<content> is an arbitrary amount of data.
-
- =back
-
- Since we don't want to handle all possible I<code> values directly in
- our programs, the libwww-perl response object have methods that can be
- used to query what kind of response this is. The most commonly used
- response classification methods are:
-
- =over 3
-
- =item is_success()
-
- The request was was successfully received, understood or accepted.
-
- =item is_error()
-
- The request failed. The server or the resource might not be
- available, access to the resource might be denied or other things might
- have failed for some reason.
-
- =back
-
- =head2 The User Agent
-
- Let us assume that we have created a I<request> object. What do we
- actually do with it in order to receive a I<response>?
-
- The answer is that you pass it on to a I<user agent> object and this
- object will take care of all the things that need to be done
- (low-level communication and error handling). The user agent will give
- you back a I<response> object. The user agent represents your
- application on the network and it provides you with an interface that
- can accept I<requests> and will return I<responses>.
-
- You should think about the user agent as an interface layer between
- your application code and the network. Through this interface you are
- able to access the various servers on the network.
-
- The libwww-perl class name for the user agent is
- C<LWP::UserAgent>. Every libwww-perl application that wants to
- communicate should create at least one object of this kind. The main
- method provided by this object is request(). This method takes an
- C<HTTP::Request> object as argument and will (eventually) return a
- C<HTTP::Response> object.
-
- The user agent has many other attributes that lets you
- configure how it will interact with the network and with your
- application code.
-
- =over 3
-
- =item *
-
- The B<timeout> specify how much time we give remote servers in
- creating responses before the library disconnect and creates an
- internal I<timeout> response.
-
- =item *
-
- The B<agent> specify the name that your application should use when it
- presents itself on the network.
-
- =item *
-
- The B<from> attribute can be set to the e-mail address of the person
- responsible for running the application. If this is set, then the
- address will be sent to the servers with every request.
-
- =item *
-
- The B<use_alarm> specify if it is OK for the user agent to use the
- alarm(2) system to implement timeouts.
-
- =item *
-
- The B<use_eval> specify if the agent should raise an
- exception (C<die> in perl) if an error condition occur.
-
- =item *
-
- The B<parse_head> specify whether we should initialize response
- headers from the E<lt>head> section of HTML documents.
-
- =item *
-
- The B<proxy> and B<no_proxy> specify if and when communication should
- go through a proxy server. <URL:http://www.w3.org/pub/WWW/Proxies/>
-
- =item *
-
- The B<credentials> provide a way to set up user names and
- passwords that is needed to access certain services.
-
- =back
-
- Many applications would want even more control over how they interact
- with the network and they get this by specializing the
- C<LWP::UserAgent> by sub-classing. The library provide a
- specialization called C<LWP::RobotUA> that is used by robot
- applications.
-
- =head2 An Example
-
- This example shows how the user agent, a request and a response are
- represented in actual perl code:
-
- # Create a user agent object
- use LWP::UserAgent;
- $ua = new LWP::UserAgent;
- $ua->agent("AgentName/0.1 " . $ua->agent);
-
- # Create a request
- my $req = new HTTP::Request POST => 'http://www.perl.com/cgi-bin/BugGlimpse';
- $req->content_type('application/x-www-form-urlencoded');
- $req->content('match=www&errors=0');
-
- # Pass request to the user agent and get a response back
- my $res = $ua->request($req);
-
- # Check the outcome of the response
- if ($res->is_success) {
- print $res->content;
- } else {
- print "Bad luck this time\n";
- }
-
- The $ua is created once when the application starts up. New request
- objects are normally created for each request sent.
-
-
- =head1 NETWORK SUPPORT
-
- This section goes through the various protocol schemes and describe
- the HTTP style methods that are supported and the headers that might
- have any effect.
-
- For all requests, a "User-Agent" header is added and initialized from
- the $ua->agent value before the request is handed to the network
- layer. In the same way, a "From" header is initialized from the
- $ua->from value.
-
- For all responses, the library will add a header called "Client-Date".
- This header will encode the time when the response was received by
- your application. This format and semantics of the header is just
- like the server created "Date" header.
-
- =head2 HTTP Requests
-
- HTTP request are really just handed off to an HTTP server and it will
- decide what happens. Few servers implement methods beside the usual
- "GET", "HEAD", "POST" and "PUT" but CGI-scripts can really implement
- any method they like.
-
- If the server is not available then the library will generate an
- internal error response.
-
- The library automatically adds a "Host" and a "Content-Length" header
- to the HTTP request before it is sent over the network.
-
- For GET request you might want to add the "If-Modified-Since" header
- to make the request conditional.
-
- For POST request you should add the "Content-Type" header. When you
- try to emulate HTML E<lt>FORM> handling you should usually let the value
- of the "Content-Type" header be "application/x-www-form-urlencoded".
- See L<lwpcook> for examples of this.
-
- The libwww-perl HTTP implementation currently support the HTTP/1.0
- protocol. HTTP/0.9 servers are also handled correctly.
-
- The library allows you to access proxy server through HTTP. This
- means that you can set up the library to forward all types of request
- through the HTTP protocol module. See L<LWP::UserAgent> for
- documentation of this.
-
-
-
-
- =head2 FTP Requests
-
- The library currently support GET, HEAD and PUT requests. GET will
- retrieve a file or a directory listing from an FTP server. PUT will
- store a file on a ftp server.
-
- You can specify a ftp account for servers that want this in addition
- user name and password. This is specified by passing an "Account"
- header in the request.
-
- User name/password can be specified using basic authorization or be
- encoded in the URL. Bad logins return an UNAUTHORIZED response with
- "WWW-Authenticate: Basic" and can be treated as basic authorization
- for HTTP.
-
- The library support ftp ASCII transfer mode by specifying the "type=a"
- parameter in the URL.
-
- Directory listings are by default returned unprocessed (as returned
- from the ftp server) with the content media type reported to be
- "text/ftp-dir-listing". The C<File::Listing> module provide functionality
- for parsing of these directory listing.
-
- The ftp module is also able to convert directory listings to HTML and
- this can be requested via the standard HTTP content negotiation
- mechanisms (add an "Accept: text/html" header in the request if you
- want this).
-
- The normal file retrievals, the "Content-Type" is guessed based on the
- file name suffix. See L<LWP::MediaTypes>.
-
- The "If-Modified-Since" header is not honored yet.
-
- Example:
-
- $req = HTTP::Request->new(GET => 'ftp://me:passwd@ftp.some.where.com/');
- $req->header(Accept => "text/html, */*;q=0.1");
-
- =head2 News Requests
-
- Access to the USENET News system is implemented through the NNTP
- protocol. The name of the news server is obtained from the
- NNTP_SERVER environment variable and defaults to "news". It is not
- possible to specify the hostname of the NNTP server in the news:-URLs.
-
- The library support GET and HEAD to retrieve news articles through the
- NNTP protocol. You can also post articles to newsgroups by using
- (surprise!) the POST method.
-
- GET on newsgroups is not implemented yet.
-
- Examples:
-
- $req = HTTP::Request->new(GET => 'news:abc1234@a.sn.no');
-
- $req = HTTP::Request->new(POST => 'news:comp.lang.perl.test');
- $req->header(Subject => 'This is a test',
- From => 'me@some.where.org');
- $req->content(<<EOT);
- This is the content of the message that we are sending to
- the world.
- EOT
-
-
- =head2 Gopher Request
-
- The library supports the GET and HEAD method for gopher request. All
- request header values are ignored. HEAD cheats and will return a
- response without even talking to server.
-
- Gopher menus are always converted to HTML.
-
- The response "Content-Type" is generated from the document type
- encoded (as the first letter) in the request URL path itself.
-
- Example:
-
- $req = HTTP::Request->new('GET', 'gopher://gopher.sn.no/');
-
-
-
- =head2 File Request
-
- The library supports GET and HEAD methods for file requests. The
- "If-Modified-Since" header is supported. All other headers are
- ignored. The I<host> component of the file URL must be empty or set
- to "localhost". Any other I<host> value will be treated as an error.
-
- Directories are always converted to an HTML document. For normal
- files, the "Content-Type" and "Content-Encoding" in the response are
- guessed based on the file suffix.
-
- Example:
-
- $req = HTTP::Request->new(GET => 'file:/etc/passwd');
-
-
- =head2 Mailto Request
-
- You can send (aka "POST") mail messages using the library. All
- headers specified for the request are passed on to the mail system.
- The "To" header is initialized from the mail address in the URL.
-
- Example:
-
- $req = HTTP::Request->new(POST => 'mailto:libwww-perl-request@ics.uci.edu');
- $req->header("Subject", "subscribe");
- $req->content("Please subscribe me to the libwww-perl mailing list!\n");
-
-
- =head1 OVERVIEW OF CLASSES AND PACKAGES
-
- This table should give you a quick overview of the classes provided by the
- library. Indentation shows class inheritance.
-
- LWP::MemberMixin -- Access to member variables of Perl5 classes
- LWP::UserAgent -- WWW user agent class
- LWP::RobotUA -- When developing a robot applications
- LWP::Protocol -- Interface to various protocol schemes
- LWP::Protocol::http -- http:// access
- LWP::Protocol::file -- file:// access
- LWP::Protocol::ftp -- ftp:// access
- ...
-
- LWP::Socket -- Socket creation and IO
-
- HTTP::Headers -- MIME/RFC822 style header (used by HTTP::Message)
- HTTP::Message -- HTTP style message
- HTTP::Request -- HTTP request
- HTTP::Response -- HTTP response
- HTTP::Daemon -- A HTTP server class
-
- URI::URL -- Uniform Resource Locators
-
- WWW::RobotRules -- Parse robots.txt files
- WWW::RobotRules::AnyDBM_File -- Persistent RobotRules
-
- HTML::Parser -- Parse HTML documents
- HTML::TreeBuilder-- Build a HTML syntax tree
- HTML::HeadParser -- Parse the <HEAD> section of a HTML document
- HTML::LinkExtor -- Extract links from a HTML document
- HTML::Element -- Building block for the HTML::TreeBuilder
- HTML::Formatter -- Convert HTML syntax trees to readable formats
- HTML::FormatText -- Output is plain text
- HTML::FormatPS -- Output is PostScript
-
- The following modules provide various functions and definitions.
-
- LWP -- This file. Library version number and documentation.
- LWP::MediaTypes -- MIME types configuration (text/html etc.)
- LWP::Debug -- Debug logging module
- LWP::Simple -- Simplified procedural interface for common functions
- HTTP::Status -- HTTP status code (200 OK etc)
- HTTP::Date -- Date parsing module for HTTP date formats
- HTTP::Negotiate -- HTTP content negotiation calculation
- HTML::Entities -- Expand or unexpand entities in HTML text
- File::Listing -- Parse directory listings
-
- HTTP use the Base64 encoding at some places. The QuotedPrint module
- is just included to make the MIME:: collection more complete.
-
- MIME::Base64 -- Base64 encoding/decoding routines
- MIME::QuotedPrint -- Quoted Printable encoding/decoding routines
-
- The following modules does not have much to do with the World Wide
- Web, but are included just because I am lazy and did not bother to
- make separate distributions for them. Regard them as bonus, provided
- free for your pleasure.
-
- Font::AFM -- Parse Adobe Font Metric files
- File::CounterFile -- Persistent counter class
-
-
- =head1 MORE DOCUMENTATION
-
- All modules contain detailed information on the interfaces they
- provide. The L<lwpcook> is the libwww-perl cookbook that contain
- examples of typical usage of the library. You might want to take a
- look at how the scripts C<lwp-request>, C<lwp-rget> and C<lwp-mirror>
- are implemented.
-
- =head1 BUGS
-
- The library can not handle multiple simultaneous requests yet. The
- HTML:: modules are still experimental. Also, check out what's left in
- the TODO file.
-
- =head1 ACKNOWLEDGEMENTS
-
- This package owes a lot in motivation, design, and code, to the
- libwww-perl library for Perl 4, maintained by Roy Fielding
- E<lt>fielding@ics.uci.edu>.
-
- That package used work from Alberto Accomazzi, James Casey, Brooks
- Cutter, Martijn Koster, Oscar Nierstrasz, Mel Melchner, Gertjan van
- Oosten, Jared Rhine, Jack Shirazi, Gene Spafford, Marc VanHeyningen,
- Steven E. Brenner, Marion Hakanson, Waldemar Kebsch, Tony Sanders, and
- Larry Wall; see the libwww-perl-0.40 library for details.
-
- The primary architect for this Perl 5 library is Martijn Koster and
- Gisle Aas, with lots of help from Graham Barr, Tim Bunce, Andreas
- Koenig, Jared Rhine, and Jack Shirazi.
-
-
- =head1 COPYRIGHT
-
- Copyright 1995-1997, Gisle Aas
- Copyright 1995, Martijn Koster
-
- This library is free software; you can redistribute it and/or
- modify it under the same terms as Perl itself.
-
- =head1 AVAILABILITY
-
- The latest version of this library is likely to be available from:
-
- http://www.sn.no/libwww-perl/
-
- The best place to discuss this code is on the
- <libwww-perl@ics.uci.edu> mailing list.
-
- =cut
-