<p>We begin with an overview of the basic concepts behind the API, and how
they are manifested in the code.</p>
<h3><a name="HMR" id="HMR">Handlers, Modules, and Requests</a></h3>
<p>Apache breaks down request handling into a series of steps, more or
less the same way the Netscape server API does (although this API has a
few more stages than NetSite does, as hooks for stuff I thought might be
useful in the future). These are:</p>
<ul>
<li>URI -> Filename translation</li>
<li>Auth ID checking [is the user who they say they are?]</li>
<li>Auth access checking [is the user authorized <em>here</em>?]</li>
<li>Access checking other than auth</li>
<li>Determining MIME type of the object requested</li>
<li>`Fixups' -- there aren't any of these yet, but the phase is intended
as a hook for possible extensions like <code class="directive"><a href="../mod/mod_env.html#setenv">SetEnv</a></code>, which don't really fit well elsewhere.</li>
<li>Actually sending a response back to the client.</li>
<li>Logging the request</li>
</ul>
<p>These phases are handled by looking at each of a succession of
<em>modules</em>, checking whether it has a handler for the
phase, and attempting to invoke it if so. The handler can typically do one
of three things:</p>
<ul>
<li><em>Handle</em> the request, and indicate that it has done so by
returning the magic constant <code>OK</code>.</li>
<li><em>Decline</em> to handle the request, by returning the magic integer
constant <code>DECLINED</code>. In this case, the server behaves in all
respects as if the handler simply hadn't been there.</li>
<li>Signal an error, by returning one of the HTTP error codes. This
terminates normal handling of the request, although an ErrorDocument may
be invoked to try to mop up, and it will be logged in any case.</li>
</ul>
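<p>The three outcomes can be illustrated with a minimal sketch of a phase
that stops at the first handler which does not decline. Everything named
<code>sketch_*</code> is hypothetical and exists only for this example; the
constants are stand-ins for the ones the server's own headers define, and
the real dispatch code differs in detail.</p>

```c
#include <stddef.h>

/* Stand-in constants and types, for illustration only. */
#define OK        0
#define DECLINED (-1)

typedef struct { const char *uri; } sketch_request;   /* hypothetical */
typedef int (*sketch_handler)(sketch_request *);

/* Run one phase: ask each module's handler in turn and stop at the
 * first one that does not decline.  A NULL entry means the module
 * has no handler for this phase. */
int sketch_run_phase(sketch_handler *handlers, size_t n, sketch_request *r)
{
    size_t i;
    for (i = 0; i < n; i++) {
        int rc;
        if (handlers[i] == NULL)
            continue;
        rc = handlers[i](r);
        if (rc != DECLINED)
            return rc;          /* OK, or an HTTP error code */
    }
    return DECLINED;            /* no module wanted the request */
}

/* Three example handlers, one per possible outcome. */
int sketch_decline(sketch_request *r) { (void)r; return DECLINED; }
int sketch_accept(sketch_request *r)  { (void)r; return OK; }
int sketch_forbid(sketch_request *r)  { (void)r; return 403; }
```

<p>Given the handler list <code>{ NULL, sketch_decline, sketch_accept,
sketch_forbid }</code>, the sketch returns <code>OK</code> and never reaches
<code>sketch_forbid</code>.</p>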
<p>Most phases are terminated by the first module that handles them;
however, for logging, `fixups', and non-access authentication checking,
all handlers always run (barring an error). Also, the response phase is
unique in that modules may declare multiple handlers for it, via a
dispatch table keyed on the MIME type of the requested object. Modules may
declare a response-phase handler which can handle <em>any</em> request,
by giving it the key <code>*/*</code> (<em>i.e.</em>, a wildcard MIME type
specification). However, wildcard handlers are only invoked if the server
has already tried and failed to find a more specific response handler for
the MIME type of the requested object (either none existed, or they all
declined).</p>
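<p>The wildcard rule for response handlers can be sketched as a two-pass
lookup: exact MIME type matches first, <code>*/*</code> entries only
afterwards. This is a simplified illustration under assumed names
(<code>sketch_handler_rec</code>, <code>sketch_invoke_handler</code> and the
<code>handled_by</code> field are all hypothetical), not the server's actual
dispatch code.</p>

```c
#include <stddef.h>
#include <string.h>

#define OK        0
#define DECLINED (-1)

typedef struct {
    const char *content_type;   /* as set by the type-checking phase */
    const char *handled_by;     /* records which handler ran (demo only) */
} sketch_request;

typedef struct {
    const char *content_type;   /* dispatch key, possibly the wildcard */
    int (*handler)(sketch_request *);
} sketch_handler_rec;

/* Pass 0 tries entries whose key equals the request's MIME type.
 * Pass 1 tries wildcard entries, and is only reached if pass 0 found
 * no handler or every handler it found declined. */
int sketch_invoke_handler(const sketch_handler_rec *tab, sketch_request *r)
{
    int pass;
    for (pass = 0; pass < 2; pass++) {
        const sketch_handler_rec *h;
        for (h = tab; h->content_type != NULL; h++) {
            int is_wild = strcmp(h->content_type, "*/*") == 0;
            int wanted = (pass == 0)
                ? (!is_wild && strcmp(h->content_type, r->content_type) == 0)
                : is_wild;
            if (wanted) {
                int rc = h->handler(r);
                if (rc != DECLINED)
                    return rc;
            }
        }
    }
    return DECLINED;
}

int sketch_cgi(sketch_request *r)     { r->handled_by = "cgi";     return OK; }
int sketch_picky(sketch_request *r)   { (void)r;                   return DECLINED; }
int sketch_default(sketch_request *r) { r->handled_by = "default"; return OK; }
```

<p>Note that the position of the wildcard entry in the table does not
matter in this sketch; a request typed <code>application/x-httpd-cgi</code>
reaches <code>sketch_cgi</code> even if the <code>*/*</code> entry is listed
first.</p>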
<p>The handlers themselves are functions of one argument (a
<code>request_rec</code> structure; <em>vide infra</em>), which return an
integer, as above.</p>
<h3><a name="moduletour" id="moduletour">A brief tour of a module</a></h3>
<p>At this point, we need to explain the structure of a module. Our
candidate will be one of the messier ones, the CGI module -- this handles
both CGI scripts and the <code class="directive"><a href="../mod/mod_alias.html#scriptalias">ScriptAlias</a></code> config file command. It's actually a great deal
more complicated than most modules, but if we're going to have only one
example, it might as well be the one with its fingers in every place.</p>
<p>Let's begin with handlers. In order to handle the CGI scripts, the
module declares a response handler for them. Because of <code class="directive"><a href="../mod/mod_alias.html#scriptalias">ScriptAlias</a></code>, it also has handlers for the
name translation phase (to recognize <code class="directive"><a href="../mod/mod_alias.html#scriptalias">ScriptAlias</a></code>ed URIs) and the type-checking phase (any
<code class="directive"><a href="../mod/mod_alias.html#scriptalias">ScriptAlias</a></code>ed request is typed
as a CGI script).</p>
<p>The module needs to maintain some per (virtual) server information,
namely, the <code class="directive"><a href="../mod/mod_alias.html#scriptalias">ScriptAlias</a></code>es in
effect; the module structure therefore contains pointers to a function
which builds these structures, and to another which combines two of them
(in case the main server and a virtual server both have <code class="directive"><a href="../mod/mod_alias.html#scriptalias">ScriptAlias</a></code>es declared).</p>
<p>Finally, this module contains code to handle the <code class="directive"><a href="../mod/mod_alias.html#scriptalias">ScriptAlias</a></code> command itself. This particular
module only declares one command, but there could be more, so modules have
<em>command tables</em> which declare their commands, and describe where
they are permitted, and how they are to be invoked.</p>
<p>A final note on the declared types of the arguments of some of these
commands: a <code>pool</code> is a pointer to a <em>resource pool</em>
structure; these are used by the server to keep track of the memory which
has been allocated, files opened, <em>etc.</em>, either to service a
particular request, or to handle the process of configuring itself. That
way, when the request is over (or, for the configuration pool, when the
server is restarting), the memory can be freed, and the files closed,
<em>en masse</em>, without anyone having to write explicit code to track
them all down and dispose of them. Also, a <code>cmd_parms</code>
structure contains various information about the config file being read,
and other status information, which is sometimes of use to the function
which processes a config-file command (such as <code class="directive"><a href="../mod/mod_alias.html#scriptalias">ScriptAlias</a></code>). With no further ado, the
module itself:</p>
<div class="example"><p><code>
/* Declarations of handlers. */<br />
<br />
int translate_scriptalias (request_rec *);<br />
int type_scriptalias (request_rec *);<br />
int cgi_handler (request_rec *);<br />
<br />
/* Subsidiary dispatch table for response-phase <br />
* handlers, by MIME type */<br />
<br />
handler_rec cgi_handlers[] = {<br />
<span class="indent">
{ "application/x-httpd-cgi", cgi_handler },<br />
{ NULL }<br />
</span>
};<br />
<br />
/* Declarations of routines to manipulate the <br />
* module's configuration info. Note that these are<br />
* returned, and passed in, as void *'s; the server<br />
* core keeps track of them, but it doesn't, and can't,<br />
<p>The sole argument to handlers is a <code>request_rec</code> structure.
This structure describes a particular request which has been made to the
server, on behalf of a client. In most cases, each connection to the
client generates only one <code>request_rec</code> structure.</p>
<h3><a name="req_tour" id="req_tour">A brief tour of the request_rec</a></h3>
<p>The <code>request_rec</code> contains pointers to a resource pool
which will be cleared when the server is finished handling the request,
to structures containing per-server and per-connection information, and,
most importantly, information on the request itself.</p>
<p>The most important such information is a small set of character strings
describing attributes of the object being requested, including its URI,
filename, content-type and content-encoding (these being filled in by the
translation and type-check handlers which handle the request,
respectively).</p>
<p>Other commonly used data items are tables giving the MIME headers on
the client's original request, MIME headers to be sent back with the
response (which modules can add to at will), and environment variables for
any subprocesses which are spawned off in the course of servicing the
request. These tables are manipulated using the <code>ap_table_get</code>
and <code>ap_table_set</code> routines.</p>
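<p>To give a feel for the get/set behaviour, here is a deliberately tiny
stand-in for a header table. It is <em>not</em> the server's
<code>table</code> type: the <code>sketch_table</code> names, the fixed
capacity, and the choice to store caller-owned pointers rather than copies
are all assumptions made for the example. It does treat keys
case-insensitively, as is appropriate for header names.</p>

```c
#include <ctype.h>
#include <stddef.h>

#define SKETCH_MAX 16           /* arbitrary capacity for the sketch */

typedef struct {
    const char *key[SKETCH_MAX];
    const char *val[SKETCH_MAX];
    int n;                      /* number of entries in use */
} sketch_table;

/* Case-insensitive comparison of two header names. */
static int sketch_same_key(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* equal only if both strings ended */
}

/* Return the value stored under key, or NULL if there is none. */
const char *sketch_table_get(const sketch_table *t, const char *key)
{
    int i;
    for (i = 0; i < t->n; i++)
        if (sketch_same_key(t->key[i], key))
            return t->val[i];
    return NULL;
}

/* Replace the value under key, or add a new entry.
 * Returns 0 on success, -1 if the table is full. */
int sketch_table_set(sketch_table *t, const char *key, const char *val)
{
    int i;
    for (i = 0; i < t->n; i++) {
        if (sketch_same_key(t->key[i], key)) {
            t->val[i] = val;
            return 0;
        }
    }
    if (t->n == SKETCH_MAX)
        return -1;
    t->key[t->n] = key;
    t->val[t->n] = val;
    t->n++;
    return 0;
}
```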
<div class="note">
<p>Note that the <code>Content-type</code> header value <em>cannot</em>
be set by module content-handlers using the <code>ap_table_*()</code>
routines. Rather, it is set by pointing the <code>content_type</code>
field in the <code>request_rec</code> structure to an appropriate
string. <em>e.g.</em>,</p>
<div class="example"><p><code>
r->content_type = "text/html";
</code></p></div>
</div>
<p>Finally, there are pointers to two data structures which, in turn,
point to per-module configuration structures. Specifically, these hold
pointers to the data structures which the module has built to describe
the way it has been configured to operate in a given directory (via
<code>.htaccess</code> files or <code class="directive"><a href="../mod/core.html#directory">&lt;Directory&gt;</a></code> sections), and for private data it has built in the
course of servicing the request (so modules' handlers for one phase can
pass `notes' to their handlers for other phases). There is another such
configuration vector in the <code>server_rec</code> data structure pointed
to by the <code>request_rec</code>, which contains per (virtual) server
configuration data.</p>
<p>Here is an abridged declaration, giving the fields most commonly
used:</p>
<div class="example"><p><code>
struct request_rec {<br />
<br />
pool *pool;<br />
conn_rec *connection;<br />
server_rec *server;<br />
<br />
/* What object is being requested */<br />
<br />
char *uri;<br />
char *filename;<br />
char *path_info;
</code></p><pre>char *args; /* QUERY_ARGS, if any */
struct stat finfo; /* Set by server core;
* st_mode set to zero if no such file */</pre><p><code>
char *content_type;<br />
char *content_encoding;<br />
<br />
/* MIME header environments, in and out. Also, <br />
* an array containing environment variables to<br />
* be passed to subprocesses, so people can write<br />
* modules to add to that environment.<br />
*<br />
* The difference between headers_out and <br />
* err_headers_out is that the latter are printed <br />
* even on error, and persist across internal<br />
* redirects (so the headers printed for <br />
* <code class="directive"><a href="../mod/core.html#errordocument">ErrorDocument</a></code> handlers will have
them).<br />
*/<br />
<br />
table *headers_in;<br />
table *headers_out;<br />
table *err_headers_out;<br />
table *subprocess_env;<br />
<br />
/* Info about the request itself... */<br />
<br />
</code></p><pre>int header_only; /* HEAD request, as opposed to GET */
char *protocol; /* Protocol, as given to us, or HTTP/0.9 */
char *method; /* GET, HEAD, POST, <em>etc.</em> */
int method_number; /* M_GET, M_POST, <em>etc.</em> */
</pre><p><code>
/* Info for logging */<br />
<br />
char *the_request;<br />
int bytes_sent;<br />
<br />
/* A flag which modules can set, to indicate that<br />
* the data being returned is volatile, and clients<br />
* should be told not to cache it.<br />
*/<br />
<br />
int no_cache;<br />
<br />
/* Various other config info which may change<br />
* with .htaccess files<br />
* These are config vectors, with one void*<br />
* pointer for each module (the thing pointed<br />
* to being the module's business).<br />
*/<br />
<br />
</code></p><pre>void *per_dir_config; /* Options set in config files, <em>etc.</em> */
void *request_config; /* Notes on *this* request */
</pre><p><code>
};
</code></p></div>
<h3><a name="req_orig" id="req_orig">Where request_rec structures come from</a></h3>
<p>Most <code>request_rec</code> structures are built by reading an HTTP
request from a client, and filling in the fields. However, there are a
few exceptions:</p>
<ul>
<li>If the request is to an imagemap, a type map (<em>i.e.</em>, a
<code>*.var</code> file), or a CGI script which returned a local
`Location:', then the resource which the user requested is going to be
ultimately located by some URI other than what the client originally
supplied. In this case, the server does an <em>internal redirect</em>,
constructing a new <code>request_rec</code> for the new URI, and
processing it almost exactly as if the client had requested the new URI
directly.</li>
<li>If some handler signaled an error, and an <code>ErrorDocument</code>
is in scope, the same internal redirect machinery comes into play.</li>
<li><p>Finally, a handler occasionally needs to investigate `what would
happen if' some other request were run. For instance, the directory
indexing module needs to know what MIME type would be assigned to a
request for each directory entry, in order to figure out what icon to
use.</p>
<p>Such handlers can construct a <em>sub-request</em>, using the
functions <code>ap_sub_req_lookup_file</code>,
<code>ap_sub_req_lookup_uri</code>, and <code>ap_sub_req_method_uri</code>;
these construct a new <code>request_rec</code> structure and process it
as you would expect, up to but not including the point of actually sending
a response. (These functions skip over the access checks if the
sub-request is for a file in the same directory as the original
request).</p>
<p>(Server-side includes work by building sub-requests and then actually
invoking the response handler for them, via the function
<code>ap_run_sub_req</code>).</p>
</li>
</ul>
<h3><a name="req_return" id="req_return">Handling requests, declining, and returning
error codes</a></h3>
<p>As discussed above, each handler, when invoked to handle a particular
<code>request_rec</code>, has to return an <code>int</code> to indicate
what happened. That can either be</p>
<ul>
<li><code>OK</code> -- the request was handled successfully. This may or
may not terminate the phase.</li>
<li><code>DECLINED</code> -- no erroneous condition exists, but the module
declines to handle the phase; the server tries to find another.</li>
<li>an HTTP error code, which aborts handling of the request.</li>
</ul>
<p>Note that if the error code returned is <code>REDIRECT</code>, then
the module should put a <code>Location</code> in the request's
<code>headers_out</code>, to indicate where the client should be
redirected <em>to</em>.</p>
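<p>As a rough sketch of the redirect convention, the hypothetical handler
below records a target and returns <code>REDIRECT</code>. The
<code>location_out</code> field stands in for a <code>Location</code> entry
in the request's <code>headers_out</code> table, and the URIs are invented
for the example.</p>

```c
#include <stddef.h>
#include <string.h>

#define DECLINED (-1)
#define REDIRECT  302           /* stand-in for the server's constant */

typedef struct {
    const char *uri;
    const char *location_out;   /* stand-in for Location in headers_out */
} sketch_request;

/* Redirect one (made-up) legacy URI; decline everything else. */
int sketch_moved_handler(sketch_request *r)
{
    if (strcmp(r->uri, "/old-docs/") != 0)
        return DECLINED;
    r->location_out = "/docs/"; /* tell the client where to go */
    return REDIRECT;
}
```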
<h3><a name="resp_handlers" id="resp_handlers">Special considerations for response
handlers</a></h3>
<p>Handlers for most phases do their work by simply setting a few fields
in the <code>request_rec</code> structure (or, in the case of access
checkers, simply by returning the correct error code). However, response
handlers have to actually send a response back to the client.</p>
<p>They should begin by sending an HTTP response header, using the
function <code>ap_send_http_header</code>. (You don't have to do anything
special to skip sending the header for HTTP/0.9 requests; the function
figures out on its own that it shouldn't do anything). If the request is
marked <code>header_only</code>, that's all they should do; they should
return after that, without attempting any further output.</p>
<p>Otherwise, they should produce a response body which responds to the
client as appropriate. The primitives for this are <code>ap_rputc</code>
and <code>ap_rprintf</code>, for internally generated output, and
<code>ap_send_fd</code>, to copy the contents of some <code>FILE *</code>
straight to the client.</p>
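<p>The order of operations described above (send the header, stop if the
request is <code>header_only</code>, otherwise send the body) can be
sketched with mock output routines. Here <code>sketch_send_http_header</code>
and <code>sketch_emit</code> are hypothetical stand-ins for
<code>ap_send_http_header</code> and the output primitives; they write into a
buffer rather than to a client, and the mock header is far simpler than a
real one.</p>

```c
#include <stdio.h>
#include <string.h>

#define OK 0

typedef struct {
    const char *content_type;
    int header_only;            /* nonzero for HEAD requests */
    char out[256];              /* stand-in for the client connection */
} sketch_request;

/* Append a string to the mock connection buffer (truncating if full). */
static void sketch_emit(sketch_request *r, const char *s)
{
    size_t used = strlen(r->out);
    snprintf(r->out + used, sizeof r->out - used, "%s", s);
}

/* Mock header routine: emits only a Content-Type line. */
static void sketch_send_http_header(sketch_request *r)
{
    sketch_emit(r, "Content-Type: ");
    sketch_emit(r, r->content_type);
    sketch_emit(r, "\r\n\r\n");
}

/* A response handler following the recommended sequence. */
int sketch_hello_handler(sketch_request *r)
{
    r->content_type = "text/plain";
    sketch_send_http_header(r);
    if (r->header_only)
        return OK;              /* HEAD: header sent, no body */
    sketch_emit(r, "hello\n");
    return OK;
}
```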
<p>At this point, you should more or less understand the following piece
of code, which is the handler which handles <code>GET</code> requests
which have no more specific handler; it also shows how conditional
<code>GET</code>s can be handled, if it's desirable to do so in a
<h2><a name="config" id="config">Configuration, commands and the like</a></h2>
<p>One of the design goals for this server was to maintain external
compatibility with the NCSA 1.3 server --- that is, to read the same
configuration files, to process all the directives therein correctly, and
in general to be a drop-in replacement for NCSA. On the other hand, another
design goal was to move as much of the server's functionality into modules
which have as little as possible to do with the monolithic server core. The
only way to reconcile these goals is to move the handling of most commands
from the central server into the modules.</p>
<p>However, just giving the modules command tables is not enough to divorce
them completely from the server core. The server has to remember the
commands in order to act on them later. That involves maintaining data which
is private to the modules, and which can be either per-server, or
per-directory. Most things are per-directory, including in particular access
control and authorization information, but also information on how to
determine file types from suffixes, which can be modified by
<code class="directive"><a href="../mod/mod_mime.html#addtype">AddType</a></code> and <code class="directive"><a href="../mod/core.html#defaulttype">DefaultType</a></code> directives, and so forth. In general,
the governing philosophy is that anything which <em>can</em> be made
configurable by directory should be; per-server information is generally
used in the standard set of modules for information like
<code class="directive"><a href="../mod/mod_alias.html#alias">Alias</a></code>es and <code class="directive"><a href="../mod/mod_alias.html#redirect">Redirect</a></code>s which come into play before the
request is tied to a particular place in the underlying file system.</p>
<p>Another requirement for emulating the NCSA server is being able to handle
the per-directory configuration files, generally called
<code>.htaccess</code> files, though even in the NCSA server they can
contain directives which have nothing at all to do with access control.
Accordingly, after URI -> filename translation, but before performing any
other phase, the server walks down the directory hierarchy of the underlying
filesystem, following the translated pathname, to read any
<code>.htaccess</code> files which might be present. The information which
is read in then has to be <em>merged</em> with the applicable information
from the server's own config files (either from the <code class="directive"><a href="../mod/core.html#directory">&lt;Directory&gt;</a></code> sections in
<code>access.conf</code>, or from defaults in <code>srm.conf</code>, which
actually behaves for most purposes almost exactly like <code>&lt;Directory
/&gt;</code>).</p>
<p>Finally, after having served a request which involved reading
<code>.htaccess</code> files, we need to discard the storage allocated for
handling them. That is solved the same way it is solved wherever else
similar problems come up, by tying those structures to the per-transaction
<p>Let's look at how all of this plays out in <code>mod_mime.c</code>,
which defines the file typing handler which emulates the NCSA server's
behavior of determining file types from suffixes. What we'll be looking
at, here, is the code which implements the <code class="directive"><a href="../mod/mod_mime.html#addtype">AddType</a></code> and <code class="directive"><a href="../mod/mod_mime.html#addencoding">AddEncoding</a></code> commands. These commands can appear in
<code>.htaccess</code> files, so they must be handled in the module's
private per-directory data, which, in fact, consists of two separate
tables for MIME types and encoding information, and is declared as
table *encoding_types; /* Added with AddEncoding... */
} mime_dir_config;</pre></div>
<p>When the server is reading a configuration file, or <code class="directive"><a href="../mod/core.html#directory">&lt;Directory&gt;</a></code> section, which includes
one of the MIME module's commands, it needs to create a
<code>mime_dir_config</code> structure, so those commands have something
to act on. It does this by invoking the function it finds in the module's
`create per-dir config slot', with two arguments: the name of the
directory to which this configuration information applies (or
<code>NULL</code> for <code>srm.conf</code>), and a pointer to a
resource pool in which the allocation should happen.</p>
<p>(If we are reading a <code>.htaccess</code> file, that resource pool
is the per-request resource pool for the request; otherwise it is a
resource pool which is used for configuration data, and cleared on
restarts. Either way, it is important for the structure being created to
vanish when the pool is cleared, by registering a cleanup on the pool if
necessary).</p>
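<p>The idea that everything tied to a pool vanishes when the pool is
cleared can be sketched in a few lines. This is a toy model built on
<code>malloc</code>, with hypothetical <code>sketch_*</code> names; it is not
how the server's pools are implemented, but it shows why a structure
allocated from a pool, or a cleanup registered on one, needs no explicit
disposal code of its own.</p>

```c
#include <stdlib.h>

/* Header placed in front of every allocation.  The extra union
 * members only force an alignment suitable for any ordinary object. */
typedef union sketch_block {
    union sketch_block *next;
    long double align_ld;
    long long align_ll;
} sketch_block;

typedef struct sketch_cleanup {
    struct sketch_cleanup *next;
    void (*fn)(void *);
    void *data;
} sketch_cleanup;

typedef struct {
    sketch_block *blocks;       /* everything allocated from the pool */
    sketch_cleanup *cleanups;   /* callbacks to run when it is cleared */
} sketch_pool;

/* Allocate memory that belongs to the pool. */
void *sketch_palloc(sketch_pool *p, size_t size)
{
    sketch_block *b = malloc(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->next = p->blocks;
    p->blocks = b;
    return b + 1;               /* usable memory follows the header */
}

/* Arrange for fn(data) to be called when the pool is cleared. */
int sketch_register_cleanup(sketch_pool *p, void (*fn)(void *), void *data)
{
    sketch_cleanup *c = sketch_palloc(p, sizeof *c);
    if (c == NULL)
        return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
    return 0;
}

/* Run the cleanups, then free every allocation en masse. */
void sketch_clear_pool(sketch_pool *p)
{
    sketch_cleanup *c;
    sketch_block *b, *next;
    for (c = p->cleanups; c != NULL; c = c->next)
        c->fn(c->data);
    p->cleanups = NULL;
    for (b = p->blocks; b != NULL; b = next) {
        next = b->next;
        free(b);
    }
    p->blocks = NULL;
}

/* Example cleanup: counts how many times it has been run. */
void sketch_count_cleanup(void *data) { ++*(int *)data; }
```

<p>The cleanup records themselves live in the pool, so they are released
by the same sweep that frees the caller's allocations.</p>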
<p>For the MIME module, the per-dir config creation function just
<code>ap_palloc</code>s the structure above, and creates a couple of
<p>Now that we have these structures, we need to be able to figure out how
to fill them. That involves processing the actual <code class="directive"><a href="../mod/mod_mime.html#addtype">AddType</a></code> and <code class="directive"><a href="../mod/mod_mime.html#addencoding">AddEncoding</a></code> commands. To find commands, the server looks in
the module's command table. That table contains information on how many
arguments the commands take, and in what formats, where they are permitted,
and so forth. That information is sufficient to allow the server to invoke
most command-handling functions with pre-parsed arguments. Without further
ado, let's look at the <code class="directive"><a href="../mod/mod_mime.html#addtype">AddType</a></code>
command handler, which looks like this (the <code class="directive"><a href="../mod/mod_mime.html#addencoding">AddEncoding</a></code> command looks basically the same, and won't be