Sambar Server Documentation
CGI Tutorial
HTML FORMs
Before diving into CGIs, you must understand HTML FORMs. If you've ever
filled out a series of fields in a browser and clicked on the "submit" button,
you've seen an HTML FORM. The data in the HTML FORMs typically provides
the input to server-side programs (i.e. CGIs); the CGIs take the HTML FORM
data and perform some action like placing an order with the vendor's
purchasing system, or sending mail to a company employee.
The following is a simple HTML FORM which executes a Perl script that displays the FORM contents.
<form method="post" action="/cgi-bin/dumpenv.pl">
Email: <input type="text" name="email" size="25">
Message: <textarea name="message" rows="3" cols="60">
</textarea>
<input type="submit" value="Send message">
</form>
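The HTML FORM "action" identifies the CGI program that will process the data from the form. In the above example, the CGI dumpenv.pl is the script that will receive the form data. The "method" tells the browser how to package the content when sending it to the WWW server. There are two basic methods: GET and POST. There is very little functional difference between them; the significant difference is how the data travels. With GET, the encoded form data is appended to the URL after a question mark (?) and reaches the CGI in the QUERY_STRING environment variable. With POST, the encoded form data is sent in the body of the request and reaches the CGI on stdin.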
Inside a FORM, the INPUT, SELECT, and TEXTAREA tags specify interface elements. Each INPUT field in a FORM must have attributes indicating the "type" (e.g. text for textual input fields) and the "name" of the field. INPUT accepts numerous other attributes, including the "size" and "value" attributes used in the example above.
When the user clicks on the "submit" button, the browser sends all the data from the input fields to the program designated in the "action" line. Important: Every FORM must end with </form> so that the browser knows where the form ends.
Passing FORM data
When the user clicks on the "submit" button on a form, the browser
program links the name/value pairs of field data together into one long
buffer:
http://localhost/cgi-bin/dumpenv.pl?email=foobar&message=This+is+a+test
Note: The above URL would be displayed in the browser if the GET "method" were used (POST transports the data slightly differently, but the idea is the same). The first portion of the URL indicates which server to send the request to: http://localhost. The name localhost refers to the local machine. The next portion of the URL indicates the CGI script to execute: /cgi-bin/dumpenv.pl. Finally, the remainder of the URL following the question mark (?) is the concatenated name/value data in an encoded format.
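The encoding step the browser performs can be reproduced in a few lines. The sketch below uses Python purely for illustration (the tutorial's own script is Perl). The helper name build_get_url is hypothetical; urllib.parse.urlencode is the standard-library routine that applies the form encoding.

```python
from urllib.parse import urlencode


def build_get_url(action_url, fields):
    """Return the URL a browser would request for a GET form submission.

    `fields` is a list of (name, value) pairs in form order.
    build_get_url is a hypothetical helper, not part of any server API.
    """
    # urlencode joins the pairs with "&", turns spaces into "+" and
    # turns other special characters into %XX escapes.
    return action_url + "?" + urlencode(fields)


url = build_get_url(
    "http://localhost/cgi-bin/dumpenv.pl",
    [("email", "foobar"), ("message", "This is a test")],
)
print(url)
# prints http://localhost/cgi-bin/dumpenv.pl?email=foobar&message=This+is+a+test
```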
The server receives the request and first attempts to find the /cgi-bin directory configured for the server. Next, it determines if and how to execute the script dumpenv.pl. Important: By default, many web servers do not permit CGI execution. WWW servers can be configured to recognize CGI programs in different ways. For some, any URL that calls for a file in a certain directory (often, "cgi-bin") indicates that the WWW server should try to run whatever it finds there as a CGI program. Others can be configured to use the file extension (the ".pl" or ".cgi") to indicate that certain files are programs rather than HTML pages, graphics, or other file types. You must understand how the server has been configured to execute CGI programs before you can proceed. For the remainder of this example, we assume that the web server is set up to recognize anything ending in .pl as a Perl CGI program and that there is a "cgi-bin" directory for script execution.
The browser appends a "?" to the end of the URI to indicate that what follows is data for the program to use: http://localhost/cgi-bin/dumpenv.pl?. The WWW server then parses the URL and breaks the request into the URI, http://localhost/cgi-bin/dumpenv.pl, and the name/value pair arguments, email=foobar&message=This+is+a+test. The question mark (?) marks the separation. The "name" attribute of each FORM field becomes the name, and whatever the user submits for that field becomes the value. The name/value pairs are separated from one another in the URL by the ampersand (&).
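The split described above can be sketched as follows. This is an illustration in Python using the standard urllib.parse module, not a description of how any particular server implements it. The function name split_request is hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl


def split_request(url):
    """Split a request URL into script path, raw query string and decoded pairs.

    split_request is a hypothetical helper used only for illustration.
    """
    parts = urlsplit(url)
    # Everything after the first "?" is the query string; parse_qsl splits
    # it on "&" and "=" and undoes the "+" and %XX encoding.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return parts.path, parts.query, pairs


path, query, pairs = split_request(
    "http://localhost/cgi-bin/dumpenv.pl?email=foobar&message=This+is+a+test"
)
print(path)   # /cgi-bin/dumpenv.pl
print(query)  # email=foobar&message=This+is+a+test
print(pairs)  # [('email', 'foobar'), ('message', 'This is a test')]
```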
Parsing FORM data
The CGI program receives the name/value pair arguments in one long line
either via the QUERY_STRING environment variable or stdin.
The program is then required to split the name/value pairs up and decode
the strings for use.
For POST or PUT FORM data, the information will be sent to the CGI script via stdin. The server will send CONTENT_LENGTH bytes on this file descriptor. For example, the FORM sample above might send 35 bytes encoded as: email=foobar&message=This+is+a+test. In this case, the server will set the CONTENT_LENGTH environment variable to 35 and set the CONTENT_TYPE environment variable to application/x-www-form-urlencoded. The first byte on the CGI program's standard input will be "e", followed by the rest of the encoded string.
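To make the "split and decode" step concrete, here is a minimal sketch that reads exactly CONTENT_LENGTH bytes and decodes the pairs by hand. It is written in Python for illustration only. The names unescape and read_post_pairs are hypothetical, and an in-memory io.BytesIO stands in for the real standard input (a real Python CGI would pass sys.stdin.buffer instead).

```python
import io
import re


def unescape(text):
    """Undo form encoding: "+" becomes a space, %XX becomes one character."""
    # Replace "+" first so that an encoded plus sign (%2B) survives.
    text = text.replace("+", " ")
    # Each %XX escape is mapped to the character with that byte value;
    # multi-byte (e.g. UTF-8) sequences are not reassembled in this sketch.
    return re.sub(r"%([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), text)


def read_post_pairs(environ, stdin):
    """Read CONTENT_LENGTH bytes from `stdin` and return decoded (name, value) pairs."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length).decode("ascii")
    pairs = []
    for pair in body.split("&"):
        if not pair:
            continue
        # Split on the first "=" only; a field without "=" gets an empty value.
        name, _, value = pair.partition("=")
        pairs.append((unescape(name), unescape(value)))
    return pairs


body = b"email=foobar&message=This+is+a+test"
env = {
    "CONTENT_LENGTH": str(len(body)),  # "35"
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
}
pairs = read_post_pairs(env, io.BytesIO(body))
print(pairs)  # [('email', 'foobar'), ('message', 'This is a test')]
```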
Fortunately, there are many packages available to decode CGI arguments into usable form. The CGI program sends its output to stdout. This output can be either a document generated by the program or instructions to the server for retrieving the desired output. The following is a simple Perl script that takes HTML POST form input and displays the name/value pairs to the client:
#!/usr/local/perl/perl
print "CGI Variables\n";

# Get the FORM content-type and length
$content_type = $ENV{'CONTENT_TYPE'};
$content_len = $ENV{'CONTENT_LENGTH'} || 0;

# Buffer the POST content
binmode STDIN;
read(STDIN, $buffer, $content_len);

# Parse and display the FORM data.
if ((!$content_type) ||
    ($content_type eq 'application/x-www-form-urlencoded'))
{
    # Process the name=value argument pairs
    @args = split(/&/, $buffer);
    foreach $pair (@args)
    {
        # Split on the first "=" only
        ($name, $value) = split(/=/, $pair, 2);

        # Unescape the argument name and value
        foreach ($name, $value)
        {
            next unless defined $_;
            tr/+/ /;
            s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        }

        # Print the name=value pair
        print "$name: $value\n";
    }
}
else
{
    print "Invalid content type (expecting POST data)!\n";
    exit(1);
}

# DONE
exit(0);
Next, see if you can enhance the above script to accept and process FORM data passed via GET.
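One possible shape for that enhancement is sketched below, in Python rather than Perl and for illustration only. It assumes the server sets REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH as described in this tutorial. The function name read_form_data is hypothetical, and io.BytesIO stands in for the real standard input.

```python
import io
from urllib.parse import parse_qsl


def read_form_data(environ, stdin):
    """Return decoded (name, value) pairs for a GET or POST form submission."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "GET":
        # GET data arrives in the QUERY_STRING environment variable.
        encoded = environ.get("QUERY_STRING", "")
    elif method == "POST":
        # POST data arrives on stdin; read exactly CONTENT_LENGTH bytes.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        encoded = stdin.read(length).decode("ascii")
    else:
        raise ValueError("unsupported request method: " + method)
    return parse_qsl(encoded, keep_blank_values=True)


get_env = {
    "REQUEST_METHOD": "GET",
    "QUERY_STRING": "email=foobar&message=This+is+a+test",
}
post_body = b"email=foobar&message=This+is+a+test"
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(post_body))}

get_pairs = read_form_data(get_env, io.BytesIO(b""))
post_pairs = read_form_data(post_env, io.BytesIO(post_body))
print(get_pairs == post_pairs)  # True: both methods carry the same data
```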
Environment Variables
As you can see in the above script, environment variables are used
to pass information about the FORM data to the CGI program.
The following is a list of some of the standard environment variables
available.
Environment Variable | Description |
---|---|
SERVER_SOFTWARE | is the name and version of the server answering the request. |
SERVER_NAME | is the server's hostname, DNS alias, or IP address as it would appear in self-referencing URLs. |
GATEWAY_INTERFACE | is the revision of the CGI specification to which the server complies. |
SERVER_PROTOCOL | is the name and revision of the protocol this request came in with. |
SERVER_PORT | is the port to which the request was sent. |
REQUEST_METHOD | is the method with which the request was made: "GET", "POST" etc. |
QUERY_STRING | is defined as anything following the first '?' in the URL. Typically this data is the encoded results from your GET form. The string is encoded in the standard URL format changing spaces to +, and encoding special characters with %xx hexadecimal encoding. |
PATH_INFO | is the extra path information, as given by the client. |
PATH_TRANSLATED | is the translated version of PATH_INFO, produced by applying a virtual-to-physical mapping to the path. |
SCRIPT_NAME | is a virtual path to the script being executed. |
REMOTE_HOST | is the name of the host making the request. If DNS lookup is turned off, REMOTE_ADDR is set and this variable is unset. |
REMOTE_ADDR | is the IP address of the remote host making the request. |
CONTENT_LENGTH | is the length of any attached information from an HTTP POST. |
CONTENT_TYPE | is the media type of the posted data (usually application/x-www-form-urlencoded). |
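A CGI program reads these variables from its process environment. The sketch below, in Python for illustration, collects whichever of the variables in the table are set. The list CGI_VARIABLES and the function cgi_environment are hypothetical names; the variable names themselves come from the table above.

```python
import os

# Variable names taken from the table above.
CGI_VARIABLES = [
    "SERVER_SOFTWARE", "SERVER_NAME", "GATEWAY_INTERFACE", "SERVER_PROTOCOL",
    "SERVER_PORT", "REQUEST_METHOD", "QUERY_STRING", "PATH_INFO",
    "PATH_TRANSLATED", "SCRIPT_NAME", "REMOTE_HOST", "REMOTE_ADDR",
    "CONTENT_LENGTH", "CONTENT_TYPE",
]


def cgi_environment(environ=os.environ):
    """Return a dict of the CGI variables that are present in `environ`."""
    # Variables the server did not set (for example REMOTE_HOST when DNS
    # lookup is turned off) are simply left out of the result.
    return {name: environ[name] for name in CGI_VARIABLES if name in environ}


sample = {"REQUEST_METHOD": "GET", "QUERY_STRING": "email=foobar", "HOME": "/tmp"}
found = cgi_environment(sample)
for name, value in found.items():
    print(name + ": " + value)
```

Outside a web server most of these variables are unset, so calling cgi_environment() with no argument from a shell will usually return an empty or nearly empty dict.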
Returning Data
CGI programs can return content in many different document types
(e.g. text, images, audio). They can also return references to other
documents. To tell the server what kind of document you are sending
back, CGI requires you to place a short header on your output. This header
is ASCII text, consisting of lines separated by either linefeeds or
carriage returns (or both) followed by a single blank line. The output body
then follows in its native format.
If you begin your script output with "HTTP/", the server will send all output to the client exactly as the script has written it. Otherwise, the server will send a default header back (text/html file type) with any data returned from the script. Important: If you do not write the entire HTTP header yourself, you should not provide any special headers, as they will appear as part of the body after server processing.
If you begin your script output with a CGI header such as Content-type: or Location: (both shown in the examples below), the server will prepend the appropriate HTTP response status (200 or 302) and then send the headers and content of your script exactly as received.
For example, to send back HTML to the client, your output should read:
Content-type: text/html

<HTML><HEAD>
<TITLE>output of HTML from CGI script</TITLE>
</HEAD><BODY>
<H1>Sample output</H1>
Blah, blah, blah.
</BODY></HTML>
In the above example, the response prepended is: HTTP/1.0 200 OK
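A script might assemble that output along the following lines. This is a minimal sketch in Python, used for illustration only; html_response is a hypothetical helper name. The key detail is the blank line that separates the header from the body.

```python
import sys


def html_response(body):
    """Return CGI output: a Content-type header, one blank line, then the body."""
    # "\n\n" ends the header line and adds the blank line that marks
    # the end of the header section.
    return "Content-type: text/html\n\n" + body


page = html_response(
    "<HTML><HEAD>\n"
    "<TITLE>output of HTML from CGI script</TITLE>\n"
    "</HEAD><BODY>\n"
    "<H1>Sample output</H1>\n"
    "Blah, blah, blah.\n"
    "</BODY></HTML>\n"
)
# A CGI program sends its response on stdout.
sys.stdout.write(page)
```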
To reference a file on another HTTP server, you would output something
like this:
Location: http://www.sambar.com/
Content-type: text/html

<HTML><HEAD>
<TITLE>Whoops...it moved</TITLE>
</HEAD><BODY>
<H1>Content Moved!</H1>
</BODY></HTML>
In the above example, the response prepended is: HTTP/1.0 302 MOVED
Note: The Location: directive should come prior to the
Content-type: directive.
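The header ordering can be sketched as follows, again in Python for illustration only. The function name redirect_response is hypothetical; the output places Location: first, then Content-type:, then the blank line and a short fallback body.

```python
import sys


def redirect_response(location, body=""):
    """Return CGI output that refers the client to `location`."""
    # Location: comes before Content-type:, and a blank line ends the headers.
    return (
        "Location: " + location + "\n"
        "Content-type: text/html\n"
        "\n" + body
    )


output = redirect_response(
    "http://www.sambar.com/",
    "<HTML><BODY><H1>Content Moved!</H1></BODY></HTML>\n",
)
sys.stdout.write(output)
```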
© 2000 Sambar Technologies. All rights reserved.