Section 1: Basic Questions
This section aims to deal with basic questions, addressing the role and
nature of CGI, and its place in Web programming. Questions/answers which
just don't appear to 'fit' under any other section may also be included
here.
[ from the CGI reference http://hoohoo.ncsa.uiuc.edu/cgi/overview.htm ]
The Common Gateway Interface, or CGI, is a standard for external
gateway programs to interface with information servers such as HTTP servers.
A plain HTML document that the Web daemon retrieves is static,
which means it exists in a constant state: a text file that doesn't change.
A CGI program, on the other hand, is executed in real-time, so that it
can output dynamic information.
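As a rough illustration of the difference, here is a minimal sketch of a CGI-style program. Python is an arbitrary choice for this example, and the function name render_response is hypothetical; neither comes from the CGI reference. The program writes a header line, a blank line, and then a body that is generated afresh each time it runs:

```python
#!/usr/bin/env python3
"""Minimal CGI-style program: the page is generated afresh on every request."""
import sys
import time


def render_response(now=None):
    """Return a complete CGI response: header, blank line, then the body."""
    if now is None:
        now = time.time()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    body = "<html><body><p>Server time (UTC): %s</p></body></html>\n" % stamp
    # The header block names the content type and is ended by an empty line.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(render_response())
```

A static HTML file would show the same text on every visit; this program shows a different timestamp each time it is requested.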
[Table of
Contents] [Index]
The distinction is semantic. Traditionally, compiled executables
(binaries) are called programs, and interpreted programs are usually
called scripts. In the context of CGI, the distinction has become
even more blurred than before. The words are often used interchangeably
(including in this document). Current usage favours the word "scripts"
for CGI programs.
There are innumerable caveats to this answer, but basically any
Webpage containing a form will require a CGI script or program
to process the form inputs.
[This answer to a non-question is an attempt to reduce the noise level of
the recurrent "CGI vs Java" threads.]
CGI and Java are fundamentally different, and for most applications
are NOT interchangeable.
CGI is a protocol for running programs on a WWW server.
Typical applications include accessing a database, submitting
an order, or posting messages to a bulletin board.
Java is a programming language, and is an alternative to C, Perl, etc.
rather than to CGI.
Java was designed for network applications, and its close association
with the Web stems from its ability to run programs safely on the
Client machine, and its adoption in browsers (especially Netscape).
In certain instances the two may be combined in a single application:
for example a Java applet to define a region of interest from a
geographical map, together with a CGI script to process a query
for the area defined.
CGI and SSI (Server-Side Includes) are often interchangeable, and it may
be no more than a matter of personal preference. Here are a few
guidelines:
1) CGI is a common standard agreed and supported by all major HTTPDs.
SSI is NOT a common standard, but an innovation of NCSA's HTTPD
which has been widely adopted in later servers. CGI has the
greatest portability, if this is an issue.
2) If your requirement is sufficiently simple that it can be done
by SSI without invoking an exec, then SSI will probably be
more efficient. A typical application would be to include
sitewide 'house styles', such as toolbars, netscapeised <body>
tags or embedded CSS stylesheets.
3) For more complex applications - like processing a form -
where you need to exec (run) a program in any case, CGI
is usually the best choice.
Many more recent variants on the theme of SSI are now available.
Probably the best-known are PHP, which embeds server-side scripting
in a pre-html page, and ASP, which is a Microsoft proprietary API
with the backing of MS marketing.
APIs are proprietary programming interfaces supported by particular
platforms. By using an API, you lose all portability. If you know
your application will only ever run on one platform (OS and HTTPD),
and it has a suitable API, go ahead and use it. Otherwise stick to CGI.
Too many to enumerate - but I'll try and summarise. Briefly, there
are several decisions you have to make, including:
* Power. Is it up to a complex task?
* Complexity. How much programming manpower is it worth?
* Portability. Might you want to run your program on another system?
So here's an overview of the main options. It's inevitably subjective,
but may be helpful to someone:
                        Power     Complexity   Portability
Basic SSI:              Low       Low          Medium
Enhanced SSI[2]:        Medium    Medium       Low
CGI:                    High      Medium       High[4]
Enhanced CGI-like[5]:   High      Medium       Medium[6]
Server API:             v.High    High         Low
Servlets[7]:            High      High         Medium
[2] For example, PHP, ASP.
[4] Subject to choice of programming language.
[5] For example: mod_perl or fastcgi, both of which can improve
efficiency with respect to standard CGI.
[6] You port them by converting to standard CGI.
[7] Servlets are really a server API, and make sense if you're already
running a Java VM, but will probably hit your server hard if not.
If you're already a programmer, CGI is extremely straightforward, and just
three resources should get you up to speed in the time it takes to read them:
1) Installation notes for your HTTPD. Is it configured to run CGI
scripts, and if so how does it identify that a URL should be executed?
(Check your manuals, READMEs, and ISP webpages/FAQs, and if you still
can't find it, ask your server administrator.)
2) The CGI specification at NCSA tells you all you need to know
to get your programs running as CGI applications.
http://hoohoo.ncsa.uiuc.edu/cgi/interface.htm
3) WWW Security FAQ. This is not required to 'get it working', but
is essential reading if you want to KEEP it working!
http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.htm
If you're NOT already a programmer, you'll have to learn. If you would
find it hard to write, say, a 'grep' or 'cat' utility to run from the
commandline, then you will probably have a hard time with CGI. Make
sure your programs work from the commandline BEFORE trying them with CGI,
so that at least one possible source of errors has been dealt with.
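One way to follow this advice is to keep the request handling in a function that takes the environment as an argument, so that the same file runs both under the server and from a shell. The sketch below assumes Python, a hypothetical script name hello.py, and a hypothetical form field called "name". QUERY_STRING is the standard CGI variable carrying the data of a GET request.

```python
#!/usr/bin/env python3
"""A CGI-style script that can be exercised from the command line first."""
import html
import os
import sys
from urllib.parse import parse_qs


def handle(environ):
    """Build the response from a CGI-style environment mapping."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # 'name' is a made-up form field; fall back to a default when it is absent.
    name = query.get("name", ["stranger"])[0]
    # Escape user input before echoing it into HTML.
    body = "<p>Hello, %s!</p>\n" % html.escape(name)
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(handle(os.environ))
```

From a Unix shell, something like QUERY_STRING='name=World' python3 hello.py should print the header, a blank line, and the greeting, before the HTTPD is involved at all.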
Yes. Period.
There is a lot you can do to minimise these. The most important thing
to do is read and understand Lincoln Stein's excellent WWW security
FAQ, at http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.htm .
No, but it helps. The Web, along with the Internet itself, C, Perl,
and almost every other Good Thing in the last 20 years of computing,
originated in Unix. At the time of writing, this is still the
most mature and best-supported platform for Web applications.
No - you can use any programming language you please. Perl is simply
today's most popular choice for CGI applications. Some other widely-
used languages are C, C++, TCL, BASIC and - for simple tasks -
even shell scripts.
Reasons for choosing Perl include its powerful text manipulation
capabilities (in particular regular expressions) and the fantastic
WWW support modules available.
It isn't really that important. Use what you're comfortable with,
or what you're constrained (e.g. by your manager) to use.
If you're just dabbling with programming, Perl is a good choice, simply
because of the wealth of ready-to-run Perl/CGI resources available.
If you're serious about programming, you should be at home in a
range of languages. C, the industry standard, is a must (at least to
the level of comfortably reading other people's code). You'll
certainly want at least one scripting language such as Perl, Python
or Tcl. C++ is also a very good idea.
In response to a Usenet newbie question:
> I am seriously wanting to learn some CGI programming languages
J.M. Ivler wrote some eloquent words of wisdom:
> If you want to learn a programming language, learn a programming language.
> If you want to learn how to do CGI programming, learn a programming
> language first.
>
> My book is one of the few that tackles two languages at the same time.
> Why? because it's not about languages (which are just syntax for logic).
> CGI programming is about programming, and how to leverage the experience
> for the person coming to the site, or maintaining the site, or in some way
> meeting some requirements. Language is just a tool to do so.
See the next question.
Maybe. It depends on your server installation.
These types of filenames are commonly used conventions - no more.
It is up to the server administrator whether or not CGI scripts are
enabled, and (if so) what conventions tell the server to run or
to print them.
If you are running your own server, read the manual.
If you're on an ISP's or other rented webspace, check their webpages for
information or FAQs. As a last resort, ask the server administrator.
The CGI Overhead is a consequence of HTTP being a stateless protocol.
This means that a CGI process must be initialised for every "hit"
from a browser.
In the first instance, this usually means the server forking a
new process. This in itself is a very small overhead, but it
can become important on a heavily-used server if the number of
processes grows to problem levels. If the CGI programs are
themselves long-running, this is heavily exacerbated.
In the second place, the CGI program must initialise. In the
case of a compiled language such as C or C++ this is negligible,
but there is a penalty to pay for scripting languages such as Perl.
Thirdly, CGI is often used as 'glue' to a backend program, such as
a database, which may take some considerable time to initialise.
This represents a major overhead, which must be avoided in any
serious application. The most usual solution is for the backend
program to run as a separate server doing most of the work, while
the actual CGI simply carries messages.
Fourthly, some CGI scripts are just plain inefficient, and may
take hundreds of times the resources they need. Programs using
system() or `backtick` notation often fall into this category.
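To illustrate the fourth point, the sketch below contrasts two ways of counting the lines in a file. It assumes Python and a Unix "wc" command, and both function names are made up for the example. The first version starts an extra process on every call, much as system() or backticks would; the second does the same job inside the running program.

```python
import subprocess


def count_lines_external(path):
    """Spawn 'wc -l' for every call: one extra process per request."""
    result = subprocess.run(
        ["wc", "-l", path], capture_output=True, text=True, check=True
    )
    return int(result.stdout.split()[0])


def count_lines_in_process(path):
    """Count lines without leaving the current process."""
    # For files that end with a newline this agrees with 'wc -l'.
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)
```

On a busy server the cost of the extra process is paid on every hit, on top of the cost of starting the CGI program itself.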
Note that there are ways to reduce or eliminate all these overheads,
but these tend to be system- or server-specific. The best-supported
server is probably Apache, as commercial server-vendors like to push
their proprietary solutions in preference to CGI.
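The "separate backend server" arrangement described under the third point can be sketched as follows. This is only an outline under stated assumptions: Python, a one-line-per-message text protocol invented for the example, a dictionary standing in for the real database, and hypothetical names (BackendHandler, start_backend, ask_backend). In practice the backend would run as its own long-lived process; it is started in a thread here only to keep the example self-contained.

```python
import socket
import socketserver
import threading


class BackendHandler(socketserver.StreamRequestHandler):
    """Runs inside the long-lived backend, so costly setup happens only once."""

    def handle(self):
        query = self.rfile.readline().decode("utf-8").strip()
        # Stand-in for a lookup against an already-open database.
        answer = self.server.table.get(query, "not found")
        self.wfile.write((answer + "\n").encode("utf-8"))


def start_backend(table, host="127.0.0.1", port=0):
    """Start the backend; port 0 asks the OS for any free port."""
    server = socketserver.ThreadingTCPServer((host, port), BackendHandler)
    server.table = table
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask_backend(address, query):
    """What the short-lived CGI program does: send one line, read one back."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall((query + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reply:
            return reply.readline().strip()
```

The CGI program then only pays for a socket connection on each hit, while the expensive initialisation stays with the backend.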
[ quoted from http://www.umr.edu/~cgiwrap/intro.htm ]
> CGIWrap is a gateway program that allows general users to use CGI scripts
> and HTML forms without compromising the security of the http server.
> Scripts are run with the permissions of the user who owns the script. In
> addition, several security checks are performed on the script, which will not
> be executed if any checks fail.
>
> CGIWrap is used via a URL in an HTML document. As distributed, cgiwrap
> is configured to run user scripts which are located in the
> ~/public_html/cgi-bin/ directory.
See http://www.umr.edu/~cgiwrap/
The normal format for data in HTTP requests is URLencoded. All Form data
is encoded in a string, of the form
param1=value1&param2=value2&...paramn=valuen
Many non-alphanumeric characters are "escaped" in the encoding:
the character whose hexadecimal number is "XY" will be represented by
the character string "%XY".
Decoding this string is a fundamental function of every CGI library.
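As a sketch of what such a library function does, here is a small decoder written in Python (an assumption; any language would do). The function name decode_form and the sample data are made up for the example. It also handles one detail not mentioned above: in form data a space is normally sent as "+".

```python
from urllib.parse import unquote_plus


def decode_form(encoded):
    """Split 'a=1&b=2' into (name, value) pairs and undo the %XY escapes."""
    pairs = []
    for field in encoded.split("&"):
        if not field:
            continue
        name, _, value = field.partition("=")
        # unquote_plus turns '+' into a space and '%XY' into the character
        # whose hexadecimal code is XY.
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


print(decode_form("name=Jane+Doe&note=50%25+off%21"))
# -> [('name', 'Jane Doe'), ('note', '50% off!')]
```

Python's own urllib.parse.parse_qsl does the same job and is the better choice in real code; the loop is spelled out here only to show the steps.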
Another format is "multipart/form-data", also known as "file upload".
You will get this from the HTML markup
<form method="POST" enctype="multipart/form-data">
(but note you must accept URLencoded input in any case, since not all
browsers support multipart forms).
Most(?) CGI libraries will handle this transparently.
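For the curious, a multipart body can be taken apart with a general MIME parser. The sketch below assumes Python and uses its standard email package; the function name parse_multipart is hypothetical, and a real CGI program would take the content type and the body from the CONTENT_TYPE variable and standard input. It is meant as an outline, not a hardened upload handler.

```python
from email.parser import BytesParser
from email.policy import HTTP


def parse_multipart(content_type, body):
    """Return {field name: (filename or None, payload bytes)}."""
    # The parser needs the Content-Type header, which carries the boundary
    # string, in front of the raw request body.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser(policy=HTTP).parsebytes(prefix + body)
    fields = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        fields[name] = (part.get_filename(), part.get_payload(decode=True))
    return fields
```

Ordinary fields come back with a filename of None, while uploaded files carry the name supplied by the browser, which should never be trusted as a path on the server.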