Linux WWW HOWTO
by Wayne Leister, n3mtr@qis.net
v0.82, 19 November 1997
This document contains information about setting up WWW services under
Linux (both server and client). It tries not to be a detailed manual
but an overview and a good pointer to further information.
1. Introduction
Many people are trying Linux because they are looking for a really
good Internet capable operating system. Also, there are institutes,
universities, non-profits, and small businesses which want to set up
Internet sites on a small budget. This is where the WWW-HOWTO comes
in. This document explains how to set up clients and servers for the
largest part of the Internet - The World Wide Web.
All prices in this document are stated in US dollars. This document
assumes you are running Linux on an Intel platform. Instructions and
product availability may vary from platform to platform. There are
many links for downloading software in this document. Whenever
possible use a mirror site for faster downloading and to keep the load
down on the main server.
The US government forbids US companies from exporting encryption
stronger than 40 bit in strength. Therefore US companies will usually
have two versions of software. The import version will usually
support 128 bit, and the export only 40 bit. This applies to web
browsers and servers supporting secure transactions. Another name for
secure transactions is Secure Sockets Layer (SSL). We will refer to
it as SSL for the rest of this document.
1.1. Copyright
This document is Copyright (c) 1997 by Wayne Leister. The original
author of this document was Peter Dreuw (all versions prior to 0.8).
This HOWTO is free documentation; you can redistribute it
and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.
This document is distributed in the hope that it will be
useful, but without any warranty; without even the implied
warranty of merchantability or fitness for a particular purpose.
See the GNU General Public License for more details.
You can obtain a copy of the GNU General Public License by
writing to the Free Software Foundation, Inc., 675 Mass Ave,
Cambridge, MA 02139, USA.
Trademarks are owned by their respective owners.
1.2. Feedback
Any feedback is welcome. I do not claim to be an expert. Some of
this information was taken from badly written web sites; there are
bound to be errors and omissions. But make sure you have the latest
version before you send corrections; it may already be fixed in the
next version (see the next section for where to get the latest version).
Send feedback to n3mtr@qis.net.
1.3. New versions of this Document
New versions of this document can be retrieved in text format from
Sunsite at <http://sunsite.unc.edu/pub/Linux/docs/HOWTO/WWW-HOWTO> and
almost any Linux mirror site. You can view the latest HTML version on
the web at <http://sunsite.unc.edu/LDP/HOWTO/WWW-HOWTO.html>. There
are also HTML versions available on Sunsite in a tar archive.
2. Setting up WWW client software
The following chapter is dedicated to setting up web browsers.
Please feel free to contact me, if your favorite web browser is not
mentioned here. In this version of the document only a few of the
browsers have their own section, but I tried to include all of them
(all I could find) in the overview section. In the future those
browsers that deserve their own section will have it.
The overview section is designed to help you decide which browser to
use, and give you basic information on each browser. The detail
section is designed to help you install, configure, and maintain the
browser.
Personally, I prefer Netscape; it is the only browser that keeps
up with the latest things in HTML, for example frames, Java,
JavaScript, style sheets, secure transactions, and layers. Nothing is
worse than trying to visit a web site and finding out that you can't
view it because your browser doesn't support some new feature.
However I use Lynx when I don't feel like firing up the X-
windows/Netscape monster.
2.1. Overview
``Navigator/Communicator''
Netscape Navigator is the only browser mentioned here, which is
capable of advanced HTML features. Some of these features are
frames, Java, Javascript, automatic update, and layers. It also
has news and mail capability. But it is a resource hog; it
takes up lots of CPU time and memory. It also sets up a
separate cache for each user, wasting disk space. Netscape is a
commercial product. Companies have a 30 day trial period, but
there is no limit for individuals. I would encourage you to
register anyway to support Netscape in their efforts against
Microsoft (and what is a measly $40 US?). My guess is if
Microsoft wins, we will be forced to use MS Internet Explorer on
a Windows platform :(
``Lynx''
Lynx is one of the smallest web browsers. It is the king of
text based browsers. It's free and the source code is available
under the GNU public license. It's text based, but it has many
special features.
Kfm
Kfm is part of the K Desktop Environment (KDE). KDE is a system
that runs on top of X-windows. It gives you many features like
drag and drop, sounds, a trashcan and a unified look and feel.
Kfm is the K File Manager, but it is also a web browser. Don't
be fooled by the name; for a young product it is very usable as
a web browser. It already supports frames, tables, ftp
downloads, looking into tar files, and more. The current
version of Kfm is 1.39, and it's free. Kfm can be used without
KDE, but you still need the libraries that come with KDE. For
more information about KDE and Kfm visit the KDE website at
<http://www.kde.org>.
``Emacs''
Emacs is the one program that does everything. It is a word
processor, news reader, mail reader, and web browser. It has a
steep learning curve at first, because you have to learn what
all the keys do. The X-windows version is easier to use,
because most of the functions are on menus. Another drawback is
that it's mostly text based. (It can display graphics if you are
running it under X-windows). It is also free, and the source
code is available under the GNU public license.
NCSA Mosaic
Mosaic is an X-windows browser developed by the National Center
for Supercomputing Applications (NCSA) at the University of
Illinois. NCSA spent four years on the project and has now
moved on to other things. The latest version is 2.6 which was
released on July 7, 1995. Source code is available for non-
commercial use. Spyglass Inc. <http://www.spyglass.com> has the
commercial rights to Mosaic. It's a solid X-windows browser, but
it lacks the new HTML features. For more info visit the NCSA
Mosaic home page at
<http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/>. The software
can be downloaded from
<ftp://ftp.ncsa.uiuc.edu/Mosaic/Unix/binaries/2.6/Mosaic-
linux-2.6.Z>.
Arena
Arena was an X-windows concept browser for the W3C (World Wide
Web Consortium) when they were testing HTML 3.0. Hence it
supports all the HTML 3.0 standards such as style sheets and
tables. Development was taken over by Yggdrasil Computing, with
the idea of turning it into a full-fledged free X-windows browser.
However, development stopped in February 1997 with version 0.3.11.
Only part of the HTML 3.2 standard has been implemented. The
source code is released under the GNU public license. For more
information see the web site at
<http://www.yggdrasil.com/Products/Arena/>. It can be
downloaded from <ftp://ftp.yggdrasil.com/pub/dist/web/arena/>.
Amaya
Amaya is the X-windows concept browser for the W3C for HTML 3.2.
Therefore it supports all the HTML 3.2 standards. It also
supports some of the features of HTML 4.0. It supports tables,
forms, client side image maps, PUT publishing, GIF, JPEG, and
PNG graphics. It is both a browser and authoring tool. The
latest public release is 1.0 beta. Version 1.1 beta is in
internal testing and is due out soon. For more information
visit the Amaya web site at <http://www.w3.org/Amaya/>. It can
be downloaded from <ftp://ftp.w3.org/pub/Amaya-LINUX-
ELF-1.0b.tar.gz>.
Red Baron
Red Baron is an X-windows browser made by Red Hat Software. It
is bundled with The Official Red Hat Linux distribution. I
could not find much information on it, but I know it supports
frames, forms and SSL. If you use Red Baron, please help me
fill in this section. For more information visit the Red Hat
website at <http://www.redhat.com>.
Chimera
Chimera is a basic X-windows browser. It supports some of the
features of HTML 3.2. The latest release is 2.0 alpha 6
released August 27, 1997. For more information visit the
Chimera website at <http://www.unlv.edu/chimera/>. Chimera can
be downloaded from <ftp://ftp.cs.unlv.edu/pub/chimera-
alpha/chimera-2.0a6.tar.gz>.
Qweb
Qweb is yet another basic X-windows browser. It supports
tables, forms, and server side image maps. The latest version
is 1.3. For more information visit the Qweb website at
<http://sunsite.auc.dk/qweb/>. The source is available from
<http://sunsite.auc.dk/qweb/qweb-1.3.tar.gz>. The binaries are
available in a Red Hat RPM from
<http://sunsite.auc.dk/qweb/qweb-1.3-1.i386.rpm>.
Grail
Grail is an X-windows browser developed by the Corporation for
National Research Initiatives (CNRI). Grail is written entirely
in Python, an interpreted object-oriented language. The latest
version is 0.3 released on May 7, 1997. It supports forms,
bookmarks, history, frames, tables, and many HTML 3.2 things.
Internet Explorer
There are rumors, that Microsoft is going to port the Internet
Explorer to various Unix platforms - maybe Linux. If it's true,
they are taking their time doing it. If you know something more
reliable, please drop me an e-mail.
In my humble opinion most of the above software is unusable for
serious web browsing. I'm not trying to discredit the authors, I know
they worked very hard on these projects. Just think, if all of these
people had worked together on one project, maybe we would have a free
browser that would rival Netscape and Internet Explorer.
In my opinion, out of all of the browsers, Netscape and Lynx are the
best. The runners-up would be Kfm, Emacs-W3 and Mosaic.
3. Lynx
Lynx is one of the smaller (around 600 K executable) and faster web
browsers available. It does not eat up much bandwidth nor system
resources as it only deals with text displays. It can display on any
console, terminal or xterm. You will not need an X Windows system or
additional system memory to run this little browser.
3.1. Where to get
Both the Red Hat and Slackware distributions have Lynx in them.
Therefore I will not bore you with the details of compiling and
installing Lynx.
The latest version is 2.7.1 and can be retrieved from
<http://www.slcc.edu/lynx/fote/> or from almost any friendly Linux FTP
server like ftp://sunsite.unc.edu under /pub/Linux/apps/www/browsers/
or a mirror site.
For more information on Lynx try these locations:
Lynx Links
<http://www.crl.com/~subir/lynx.html>
Lynx Pages
<http://lynx.browser.org>
Lynx Help Pages
<http://www.crl.com/~subir/lynx/lynx_help/lynx_help_main.html>
(the same pages you get from lynx --help and typing ? in lynx)
Note: The Lynx help pages have recently moved. If you have an older
version of Lynx, you will need to change your lynx.cfg (in /usr/lib)
to point to the new address (above).
I think the feature that most sets Lynx apart from all other web
browsers is its capability for batch mode retrieval. One can write a
shell script which retrieves a document, file or anything like that
via HTTP, FTP, Gopher, WAIS, NNTP or file:// URLs and saves it to
disk. Furthermore, one can fill in data into HTML forms in batch mode
by simply redirecting the standard input and using the -post_data
option.
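For example, a shell script could use Lynx like this (the URLs and
form data here are only illustrative):

     # save the formatted text of a page to disk
     lynx -dump http://sunsite.unc.edu/LDP/ > ldp.txt

     # save the raw HTML instead
     lynx -source http://sunsite.unc.edu/LDP/ > ldp.html

     # fill in a form in batch mode by redirecting standard input
     echo "name=bob&color=blue" | lynx -post_data http://www.mysite.com/cgi-bin/form.cgi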
For more special features of Lynx just look at the help files and the
man pages. If you use a special feature of Lynx that you would like
to see added to this document, let me know.
4. Emacs-W3
There are several different flavors of Emacs. The two most popular
are GNU Emacs and XEmacs. GNU Emacs is put out by the Free Software
Foundation, and is the original Emacs. It is mainly geared toward
text based terminals, but it does run in X-Windows. XEmacs (formerly
Lucid Emacs) is a version that only runs on X-Windows. It has many
special features that are X-Windows related (better menus etc).
4.1. Where to get
Both the Red Hat and Slackware distributions include GNU Emacs.
The most recent GNU Emacs is 19.34. It doesn't seem to have a web
site. The FTP site is at <ftp://ftp.gnu.ai.mit.edu/pub/gnu/>.
The latest version of XEmacs is 20.2. The XEmacs FTP site is at
<ftp://ftp.xemacs.org/pub/xemacs>. For more information about XEmacs
see its web page at <http://www.xemacs.org>.
Both are available from the Linux archives at ftp://sunsite.unc.edu
under /pub/Linux/apps/editors/emacs/
If you have GNU Emacs or XEmacs installed, you probably have the W3
browser available too.
The Emacs W3 mode is a nearly fully featured web browser system
written in the Emacs Lisp system. It mostly deals with text, but it
can display graphics too, at least if you run Emacs under the X
Window system.
To get XEmacs into W3 mode, go to the Apps menu and select Browse the
Web.
I don't use Emacs, so if someone explains how to get GNU Emacs into
the W3 mode I'll add it to this document. Most of this information was
from the original author. If any information is incorrect, please let
me know. Also let me know if you think anything else should be added
about Emacs.
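Since I don't use Emacs myself, take the following with a grain of
salt, but the W3 browser is normally started from within Emacs with
M-x, roughly like this:

     M-x w3          start the W3 browser
     M-x w3-fetch    prompt for a URL to open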
5. Netscape Navigator/Communicator
5.1. Different versions and options.
Netscape Navigator is the King of WWW browsers. Netscape Navigator
can do almost everything. But on the other hand, it is one of the most
memory-hungry and resource-eating programs I've ever seen.
There are 3 different versions of the program:
Netscape Navigator includes the web browser, netcaster (push client)
and a basic mail program.
Netscape Communicator includes the web browser, a web editor, an
advanced mail program, a news reader, netcaster (push client), and a
group conference utility.
Netscape Communicator Pro includes everything Communicator has plus a
group calendar, IBM terminal emulation, and remote administration
features (administrators can update thousands of copies of Netscape
from their desk).
In addition to the three versions there are two other options you must
pick.
The first is full install or base install. The full install includes
everything. The base install includes enough to get you started. You
can download the additional components as you need them (such as
multimedia support and netcaster). These components can be installed
by the Netscape smart update utility (after installing, go to
Help->Software Updates). At this time the full install is not
available for Linux.
The second option is import or export. If you are from the US or
Canada you have the option of selecting the import version. This
gives you the stronger 128 bit encryption for secure transactions
(SSL). The export version only has 40 bit encryption, and is the only
version allowed outside the US and Canada.
The latest version of the Netscape Navigator/Communicator/Communicator
Pro is 4.03. There are two different versions for Linux. One is for
the old 1.2 series kernels and one for the new 2.0 kernels. If you
don't have a 2.0 kernel I suggest you upgrade; there are many
improvements in the new kernel.
Beta versions are also available. If you try a beta version, be aware
that it usually expires in a month or so!
5.2. Where to get
The best way to get Netscape software is to go through their web site
at <http://www.netscape.com/download/>. They have menus to guide you
through the selection. When it asks for the Linux version, it is
referring to the kernel (most people should be using 2.0 by now). If
you're not sure which kernel version you have, run 'cat /proc/version'.
Going through the web site is the only way to get the import versions.
If you want an export version you can download them directly from the
Netscape FTP servers. The FTP servers are also more up to date. For
example when I first wrote this the web interface did not have the
non-beta 4.03 for Linux yet, but it was on the FTP site. Here are the
links to the export Linux 2.0 versions:
Netscape Navigator 4.03 is at
<ftp://ftp.netscape.com/pub/communicator/4.03/shipping/english/unix/linux20/navigator_standalone/navigator-
v403-export.x86-unknown-linux2.0.tar.gz>
Netscape Communicator 4.03 for Linux 2.0 (kernel) is at
<ftp://ftp.netscape.com/pub/communicator/4.03/shipping/english/unix/linux20/base_install/communicator-
v403-export.x86-unknown-linux2.0.tar.gz>
Communicator Pro 4.03 for Linux was not available at the time I wrote
this.
These URLs will change as new versions come out. If these links
break you can find them by fishing around at the FTP site
<ftp://ftp.netscape.com/pub/communicator/>.
These servers are heavily loaded at times. It's best to wait for
off-peak hours or select a mirror site. Be prepared to wait; these
archives are large. Navigator is almost 8 megs, and the Communicator
base install is 10 megs.
5.3. Installing
This section explains how to install version 4 of Netscape Navigator,
Communicator, and Communicator Pro.
First unpack the archive to a temporary directory. Then run the ns-
install script (type ./ns-install). Then make a symbolic link from
the /usr/local/netscape/netscape binary to /usr/local/bin/netscape
(type ln -s /usr/local/netscape/netscape /usr/local/bin/netscape).
Finally set the system wide environment variable $MOZILLA_HOME to
/usr/local/netscape so Netscape can find its files. If you are using
bash for your shell, edit your /etc/profile and add the lines:
MOZILLA_HOME="/usr/local/netscape"
export MOZILLA_HOME
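Put together, the whole procedure might look like this (the archive
name is the Communicator base install mentioned above, assumed to be
sitting in your home directory; adjust to whatever you downloaded):

     mkdir /tmp/netscape
     cd /tmp/netscape
     tar xzf ~/communicator-v403-export.x86-unknown-linux2.0.tar.gz
     ./ns-install
     ln -s /usr/local/netscape/netscape /usr/local/bin/netscape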
After you have it installed the software can automatically update
itself with smart update. Just run Netscape as root and go to
Help->Software Updates. If you only got the base install, you can
also install the Netscape components from there.
Note: This will not remove any old versions of Netscape; you must
manually remove them by deleting the Netscape binary and Java class
file (for version 3).
6. Setting up WWW server systems
This section contains information on different http server software
packages and additional server side tools like script languages for
CGI programs etc. There are several dozen web servers; I only cover
those that are fully functional. As some of these are commercial
products, I have no way of trying them. Most of the information in
the overview section was pieced together from various web sites. If
there is any incorrect or missing information please let me know.
For a technical description on the http mechanism, take a look at the
RFC documents mentioned in the chapter "For further reading" of this
HOWTO.
I prefer to use the Apache server. It has almost all the features you
would ever need and it's free! I will admit that this section is
heavily biased toward Apache. I decided to concentrate my efforts on
the Apache section rather than spread it out over all the web servers.
I may cover other web servers in the future.
6.1. Overview
Cern httpd
This was the first web server. It was developed by the European
Laboratory for Particle Physics (CERN). CERN httpd is no longer
supported. The CERN httpd server is reported to have some ugly
bugs, to be quite slow and resource hungry. The latest version
is 3.0. For more information visit the CERN httpd home page at
<http://www.w3.org/Daemon/Status.html>. It is available for
download at
<ftp://sunsite.unc.edu/pub/Linux/apps/www/servers/httpd-3.0.term.tpz>
(no it is not a typo, the extension is actually .tpz on the
site; probably should be .tgz)
NCSA HTTPd
The NCSA HTTPd server is the father to Apache (The development
split into two different servers). Therefore the setup files
are very similar. NCSA HTTPd is free and the source code is
available. This server is not covered in this document, although
reading the Apache section may give you some help. The NCSA
server was once popular, but most people are replacing it with
Apache. Apache is a drop-in replacement for the NCSA
server (same configuration files), and it fixes several
shortcomings of the NCSA server. NCSA HTTPd accounts for 4.9%
(and falling) of all web servers. (source September 1997
Netcraft survey <http://www.netcraft.com/survey/>). The latest
version is 1.5.2a. For more information see the NCSA website at
<http://hoohoo.ncsa.uiuc.edu>.
``Apache''
Apache is the king of all web servers. Apache and its source
code are free. Apache is modular, therefore it is easy to add
features. Apache is very flexible and has many, many features.
Apache makes up 44% of all web domains (50% if you count all its
derivatives). There are over 695,000
Apache servers in operation (source November 1997 Netcraft
survey <http://www.netcraft.com/survey/>).
The official Apache is missing SSL, but there are two
derivatives that fill the gap. Stronghold is a commercial
product that is based on Apache. It retails for $995; an
economy version is available for $495 (based on an old version
of Apache). Stronghold is the number two secure server behind
Netscape (source C2 net <http://www.c2.net/products/stronghold>
and Netcraft survey <http://www.netcraft.com/survey/>). For
more information visit the Stronghold website at
<http://www.c2.net/products/stronghold/>. It was developed
outside the US, so it is available with 128 bit SSL everywhere.
Apache-SSL is a free implementation of SSL, but it is not for
commercial use in the US (RSA has US patents on SSL technology).
It can be used for non-commercial use in the US if you link with
the free RSAREF library. For more information see the website
at <http://www.algroup.co.uk/Apache-SSL/>.
Netscape Fast Track Server
Fast Track was developed by Netscape, but the Linux version is
put out by Caldera. The Caldera site lists it as Fast Track for
OpenLinux. I'm not sure if it only runs on Caldera OpenLinux or
if any Linux distribution will do (E-mail me if you have the
answer). Netscape servers account for 11.5% (and falling) of
all web servers (source September 1997
<http://www.netcraft.com/survey/>). The server sells for $295.
It is also included with the Caldera OpenLinux Standard
distribution which sells for $399 ($199.50 educational). The
web pages tell of a nice administration interface and a quick 10
minute setup. The server has support for 40-bit SSL. To get
the full 128-bit SSL you need Netscape Enterprise Server.
Unfortunately that is not available for Linux :( The latest
version available for Linux is 2.0 (version 3 is in beta, but
it's not available for Linux yet). To buy a copy go to the
Caldera web site at
<http://www.caldera.com/products/netscape/netscape.html>. For
more information go to the Fast Track page at
<http://www.netscape.com/comprod/server_central/product/fast_track/>.
WN
WN has many features that make it attractive. First, it is
smaller than the CERN, NCSA HTTPd, and Apache servers. It also
has many built-in features that would require CGIs, for
example site searches and enhanced server side includes. It can
also decompress/compress files on the fly with its filter
feature. It also has the ability to retrieve only part of a
file with its ranges feature. It is released under the GNU
public license. The current version is 1.18.3. For more
information see the WN website at <http://hopf.math.nwu.edu/>.
AOLserver
AOLserver is made by America Online. I'll admit that I was
surprised by the features of a web server coming from AOL. In
addition to the standard features it supports database
connectivity. Pages can query a database by Structured Query
Language (SQL) commands. The database is accessed through Open
Database Connectivity (ODBC). It also has a built-in search
engine and Tcl scripting. If that is not enough you can add
your own modules through the C Application Programming Interface
(API). I almost forgot to mention support for 40 bit SSL. And
you get all this for free! For more information visit the
AOLserver site at <http://www.aolserver.com/server/>.
Zeus Server
Zeus Server was developed by Zeus Technology. They claim that
they are the fastest web server (using the SPECweb96 benchmark).
The server can be configured and controlled from a web browser!
It can limit processor and memory resources for CGI's, and it
executes them in a secure environment (whatever that means). It
also supports unlimited virtual servers. It sells for $999 for
the standard version. If you want the secure server (SSL) the
price jumps to $1699. They are based outside the US so 128 bit
SSL is available everywhere. For more information visit the
Zeus Technology website at <http://www.zeus.co.uk>. The US
website is at <http://www.zeus.com>. I'll warn you they are
cocky about the fastest web server thing. But they don't even
show up under top web servers in the Netcraft Surveys.
CL-HTTP
CL-HTTP stands for Common Lisp Hypermedia Server. If you are a
Lisp programmer this server is for you. You can write your CGI
scripts in Lisp. It has a web based setup function. It also
supports all the standard server features. CL-HTTP is free and
the source code is available. For more information visit the
CL-HTTP website at <http://www.ai.mit.edu/projects/iiip/doc/cl-
http/home-page.html> (could they make that url any longer?).
If you have a commercial purpose (company web site, or ISP), I would
strongly recommend that you use Apache. If you are looking for easy
setup at the expense of advanced features then the Zeus Server wins
hands down. I've also heard that the Netscape Server is easy to
setup. If you have an internal use you can be a bit more flexible.
But unless one of them has a feature that you just have to use, I
would still recommend using one of the three above.
This is only a partial listing of all the servers available. For a
more complete list visit Netcraft at
<http://www.netcraft.com/survey/servers.html> or Web Compare at
<http://webcompare.internet.com>.
7. Apache
The current version of Apache is 1.2.4. Version 1.3 is in beta
testing. The main Apache site is at <http://www.apache.org/>.
Another good source of information is Apacheweek at
<http://www.apacheweek.com/>. The Apache documentation is OK, so I'm
not going to go into detail in setting up Apache. The documentation
is on the website and is included with the source (in HTML format).
There are also text files included with the source, but the HTML
version is better. The documentation should get a whole lot better
once the Apache Documentation Project gets under way. Right now most
of the documents are written by the developers. Not to discredit the
developers, but they are a little hard to understand if you don't know
the terminology.
7.1. Where to get
Apache is included in the Red Hat, Slackware, and OpenLinux
distributions. Although they may not be the latest version, they are
very reliable binaries. The bad news is you will have to live with
their directory choices (which are totally different from each other
and the Apache defaults).
The source is available from the Apache web site at
<http://www.apache.org/dist/>. Binaries are also available at the
same place. You can also get binaries from Sunsite at
<ftp://sunsite.unc.edu/pub/Linux/apps/www/servers/>. And for those of
us running Red Hat the latest binary RPM file can usually be found in
the contrib directory at <ftp://ftp.redhat.com/pub/contrib/i386/>.
If your server is going to be used for commercial purposes, it is
highly recommended that you get the source from the Apache website and
compile it yourself. The other option is to use a binary that comes
with a major distribution. For example Slackware, Red Hat, or
OpenLinux distributions. The main reason for this is security. An
unknown binary could have a back door for hackers, or an unstable
patch that could crash your system. This also gives you more control
over what modules are compiled in, and allows you to set the default
directories. It's not that difficult to compile Apache, and besides,
you're not a real Linux user until you compile your own programs ;)
7.2. Compiling and Installing
First untar the archive to a temporary directory. Next change to the
src directory. Then edit the Configuration file if you want to
include any special modules. The most commonly used modules are
already included. There is no need to change the rules or makefile
stuff for Linux. Next run the Configure shell script (./Configure).
Make sure it says Linux platform and gcc as the compiler. Next you
may want to edit the httpd.h file to change the default directories.
The server home (where the config files are kept) default is
/usr/local/etc/httpd/, but you may want to change it to just
/etc/httpd/. And the server root (where the HTML pages are served
from) default is /usr/local/etc/httpd/htdocs/, but I like the
directory /home/httpd/html (the Red Hat default for Apache). If you
are going to be using su-exec (see special features below) you may
want to change that directory too. The server root can also be
changed from the config files. But it is also good to compile it
in, just in case Apache can't find or read the config file. Everything
else should be changed from the config files. Finally run make to
compile Apache.
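In short, the compile comes down to something like this (the version
number is just the current one mentioned above; use whatever you
downloaded):

     cd apache_1.2.4/src
     # edit Configuration here if you want extra modules
     ./Configure
     # edit httpd.h here if you want different default directories
     make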
If you run in to problems with include files missing, check the
following things. Make sure you have the kernel headers (include
files) installed for your kernel version. Also make sure you have
these symbolic links in place:
/usr/include/linux should be a link to /usr/src/linux/include/linux
/usr/include/asm should be a link to /usr/src/linux/include/asm
/usr/src/linux should be a link to the Linux source directory (e.g. linux-2.0.30)
Links can be made with ln -s; it works just like the cp command except
it makes a link (ln -s source-dir destination-link).
When make is finished there should be an executable named httpd in the
directory. This needs to be moved into a bin directory; /usr/sbin
or /usr/local/sbin would be good choices.
Copy the conf, logs, and icons sub-directories from the source to the
server home directory. Next rename three of the files in the conf
sub-directory to get rid of the -dist extension (e.g. httpd.conf-dist
becomes httpd.conf).
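Assuming you picked /usr/local/sbin for the binary and /etc/httpd/ for
the server home (just the choices discussed above), that part of the
install might look like:

     cp src/httpd /usr/local/sbin/
     cp -r conf logs icons /etc/httpd/
     cd /etc/httpd/conf
     mv httpd.conf-dist httpd.conf
     mv srm.conf-dist srm.conf
     mv access.conf-dist access.conf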
There are also several support programs that are included with Apache.
They are in the support directory and must be compiled and installed
separately. Most of them can be make by using the makefile in that
directory (which is made when you run the main Configure script). You
don't need any of them to run Apache, but some of them make the
administrators job easier.
7.3. Configuring
Now you should have four files in your conf sub-directory (under your
server home directory). The httpd.conf sets up the server daemon
(port number, user, etc). The srm.conf sets the root document tree,
special handlers, etc. The access.conf sets the base case for access.
Finally mime.types tells the server what mime type to send to the
browser for each extension.
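As a quick illustration, a few of the directives you will find in
httpd.conf look like this (the values shown are only examples, not
recommendations):

     ServerType standalone
     Port 80
     User nobody
     Group nogroup
     ServerAdmin webmaster@mysite.com
     ServerRoot /etc/httpd
     ErrorLog logs/error_log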
The configuration files are pretty much self-documented (plenty of
comments), as long as you understand the lingo. You should read
through them thoroughly before putting your server to work. Each
configuration item is covered in the Apache documentation.
The mime.types file is not really a configuration file. It is used by
the server to translate file extensions into mime-types to send to the
browser. Most of the common mime-types are already in the file. Most
people should not need to edit this file. As time goes on, more mime
types will be added to support new programs. The best thing to do is
get a new mime-types file (and maybe a new version of the server) at
that time.
Always remember when you change the configuration files you need to
restart Apache or send it the SIGHUP signal with kill for the changes
to take effect. Make sure you send the signal to the parent process
and not any of the child processes. The parent usually has the lowest
process id number. The process id of the parent is also in the
httpd.pid file in the log directory. If you accidentally send it to one
of the child processes, the child will die and the parent will restart
it.
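For example, assuming the default server home of /usr/local/etc/httpd/
(adjust the path if you changed it), a restart could look like:

     kill -HUP `cat /usr/local/etc/httpd/logs/httpd.pid`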
I will not be walking you through the steps of configuring Apache.
Instead I will deal with specific issues, choices to be made, and
special features.
I highly recommend that all users read through the security tips in
the Apache documentation. It is also available from the Apache
website at <http://www.apache.org/docs/misc/security_tips.html>.
7.4. Hosting virtual websites
Virtual Hosting is when one computer has more than one domain name.
The old way was to have each virtual host have its own IP address.
The new way uses only one IP address, but it doesn't work correctly
with browsers that don't support HTTP 1.1.
My recommendation for businesses is to go with the IP based virtual
hosting until most people have browsers that support HTTP 1.1 (give it
a year or two). This also gives you a more complete illusion of
virtual hosting. While both methods can give you virtual mail
capabilities (can someone confirm this?), only IP based virtual
hosting can also give you virtual FTP as well.
If it is for a club or personal page, you may want to consider shared
IP virtual hosting. It should be cheaper than IP based hosting and
you will be saving precious IP addresses.
You can also mix and match IP and shared IP virtual hosts on the same
server. For more information on virtual hosting visit Apacheweek at
<http://www.apacheweek.com/features/vhost>.
7.4.1. IP based virtual hosting
In this method each virtual host has its own IP address. By
determining the IP address that the request was sent to, Apache and
other programs can tell what domain to serve. This is an incredible
waste of IP space. Take for example the servers where my virtual
domain is kept. They have over 35,000 virtual accounts; that means
35,000 IP addresses. Yet I believe at last count they had less than
50 servers running.
Setting this up is a two part process. The first is getting Linux
set up to accept more than one IP address. The second is setting up
Apache to serve the virtual hosts.
The first step in setting up Linux to accept multiple IP addresses is
to make a new kernel. This works best with a 2.0 series kernel (or
higher). You need to include IP networking and IP aliasing support.
If you need help with compiling the kernel see the Kernel HOWTO
<http://sunsite.unc.edu/LDP/HOWTO/Kernel-HOWTO.html>.
Next you need to set up each interface at boot. If you are using the
Red Hat distribution then this can be done from the control panel.
Start X-windows as root and you should see a control panel. Then double
click on network configuration. Next go to the Interfaces panel and
select your network card. Then click alias at the bottom of the
screen. Fill in the information and click done. This will need to be
done for each virtual host/IP address.
If you are using other distributions you may have to do it manually.
You can just put the commands in the rc.local file in /etc/rc.d
(really they should go in with the networking stuff). You need to
have an ifconfig and route command for each device. The aliased
addresses are given a sub-device of the main one. For example, eth0
would have aliases eth0:0, eth0:1, eth0:2, etc. Here is an example of
configuring an aliased device:
ifconfig eth0:0 192.168.1.57
route add -host 192.168.1.57 dev eth0:0
You can also add a broadcast address and a netmask to the ifconfig
command. If you have a lot of aliases you may want to make a for loop
to make it easier (see the sketch below). For more information see the
IP alias mini HOWTO
<http://sunsite.unc.edu/LDP/HOWTO/mini/IP-Alias.html>.
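Here is a minimal sketch of such a loop (the addresses are only
examples):

     i=0
     for addr in 192.168.1.57 192.168.1.58 192.168.1.59 192.168.1.60
     do
         ifconfig eth0:$i $addr
         route add -host $addr dev eth0:$i
         i=`expr $i + 1`
     done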
Then you need to set up your domain name server (DNS) to serve these
new domains. And if you don't already own the domain names, you need
to contact the Internic <http://www.internic.net> to register the
domain names. See the DNS HOWTO for information on setting up your
DNS.
Finally you need to set up Apache to serve the virtual domains
correctly. This is in the httpd.conf configuration file near the end.
They give you an example to go by (a sketch also follows below). All
commands specific to that virtual host are put in between the
virtualhost directive tags. You can put almost any command in there.
Usually you set up a different document root, script directory, and
log files. You can have an almost unlimited number of virtual hosts by
adding more virtualhost directive tags.
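A typical entry might look like the following sketch (the domain, IP
address, and directories are made up for illustration):

     <VirtualHost 192.168.1.57>
     ServerName www.mysite.com
     ServerAdmin webmaster@mysite.com
     DocumentRoot /home/httpd/mysite/html
     ScriptAlias /cgi-bin/ /home/httpd/mysite/cgi-bin/
     ErrorLog logs/mysite-error_log
     TransferLog logs/mysite-access_log
     </VirtualHost>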
In rare cases you may need to run separate servers if a directive is
needed for a virtual host, but is not allowed in the virtual host
tags. This is done using the bindaddress directive. Each server
will have a different name and setup files. Each server only responds
to one IP address, specified by the bindaddress directive. This is an
incredible waste of system resources.
7.4.2. Shared IP virtual hosting
This is a new way to do virtual hosting. It uses a single IP address,
thus conserving IP addresses for real machines (not virtual ones). In
the same example used above, those 35,000 virtual hosts would only take
50 IP addresses (one for each machine). This is done by using the new
HTTP 1.1 protocol. The browser tells the server which site it wants
when it sends the request. The problem is that browsers that don't
support HTTP 1.1 will get the server's main page, which could be set up
to provide a menu of the virtual hosts available. That ruins the whole
illusion of virtual hosting. The illusion that you have your own
server.
The setup is much simpler than for IP based virtual hosting. You
still need to get your domain from the Internic and set up your DNS.
This time the DNS points to the same IP address as the original
domain. Then Apache is set up the same as before. Since you are using
the same IP address in the virtualhost tags, it knows you want shared
IP virtual hosting.
There are several workarounds for older browsers. I'll explain the
best one. First you need to make your main pages a virtual host
(either IP based or shared IP). This frees up the main page for a
link list to all your virtual hosts. Next you need to make a back
door for the old browsers to get in. This is done using the
ServerPath directive for each virtual host inside the virtualhost
directive (see the sketch below). For example, by adding ServerPath
/mysite/ to www.mysite.com, old browsers would be able to access the
site by
www.mysite.com/mysite/. Then you put the default page on the main
server that politely tells them to get a new browser, and lists links
to all the back doors of all the sites you host on that machine. When
an old browser accesses the site they will be sent to the main page,
and get a link to the correct page. New browsers will never see the
main page and will go directly to the virtual hosts. You must
remember to keep all of your links relative within the web sites,
because the pages will be accessed from two different URLs
(www.mysite.com and www.mysite.com/mysite/).
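As a sketch, the back door for the example above would look roughly
like this in httpd.conf (the name, address and directory are made up):

     <VirtualHost 192.168.1.57>
     ServerName www.mysite.com
     ServerPath /mysite/
     DocumentRoot /home/httpd/mysite/html
     </VirtualHost>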
I hope I didn't lose you there, but it's not an easy workaround. Maybe
you should consider IP based hosting after all. A very similar
workaround is also explained on the Apache website at
<http://www.apache.org/manual/host.html>.
If anyone has a great resource for Shared IP hosting, I would like to
know about it. It would be nice to know what percent of browsers out
there support HTTP 1.1, and to have a list of which browsers and
versions support HTTP 1.1.
7.5. CGI scripts
There are two different ways to give your users CGI script capability.
The first is make everything ending in .cgi a CGI script. The second
is to make script directories (usually named cgi-bin). You could also
use both methods. For either method to work the scripts must be world
executable (chmod 711). By giving your users script access you are
creating a big security risk. Be sure to do your homework to minimize
the security risk.
I prefer the first method, especially for complex scripting. It
allows you to put scripts in any directory. I like to put my scripts
with the web pages they work with. For sites with a lot of scripts it
looks much better than having a directory full of scripts. This is
simple to set up. First uncomment the .cgi handler at the end of the
srm.conf file. Then make sure all your directories have the option
ExecCGI or All in the access.conf file.
Making script directories is considered more secure. To make a script
directory you use the ScriptAlias directive in the srm.conf file. The
first argument is the Alias the second is the actual directory. For
example ScriptAlias /cgi-bin/ /usr/httpd/cgi-bin/ would make
/usr/httpd/cgi-bin able to execute scripts. That directory would be
used whenever someone asked for the directory /cgi-bin/. For security
reasons you should also change the properties of the directory to
Options None, AllowOverride None in the access.conf (just uncomment the
example that is there; a sketch also follows below). Also do not make
your script directories subdirectories of your web page directories.
For example, if you are serving pages from /home/httpd/html/, don't
make the script directory /home/httpd/html/cgi-bin; instead make it
/home/httpd/cgi-bin.
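In other words, srm.conf gets the ScriptAlias and access.conf locks the
directory down; roughly (using the example directory from above):

     # in srm.conf
     ScriptAlias /cgi-bin/ /usr/httpd/cgi-bin/

     # in access.conf
     <Directory /usr/httpd/cgi-bin>
     AllowOverride None
     Options None
     </Directory>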
If you want your users to have their own script directories you can
use multiple ScriptAlias commands. Virtual hosts should have their
ScriptAlias command inside the virtualhost directive tags. Does
anyone know a simple way to allow all users to have a cgi-bin
directory without individual ScriptAlias commands?
7.6. Users Web Directories
There are two different ways to handle user web directories. The
first is to have a subdirectory under the users home directory
(usually public_html). The second is to have an entirely different
directory tree for web directories. With both methods make sure to set
the access options for these directories in the access.conf file.
The first method is already set up in Apache by default. Whenever a
request for /~bob/ comes in, it looks for the public_html directory in
bob's home directory. You can change the directory with the UserDir
directive in the srm.conf file. This directory must be world readable
and executable. This method creates a security risk, because for
Apache to access the directory the user's home directory must be world
executable.
The second method is easy to set up. You just need to change the
UserDir directive in the srm.conf file. It has many different
formats; you may want to consult the Apache documentation for
clarification. If you want each user to have their own directory
under /home/httpd/, you would use UserDir /home/httpd. Then when the
request /~bob/ comes in it would translate to /home/httpd/bob/. Or if
you want to have a subdirectory under bob's directory you would use
UserDir /home/httpd/*/html. This would translate to
/home/httpd/bob/html/ and would allow you to have a script directory
too (for example /home/httpd/bob/cgi-bin/).
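So the second method, with the per-user html subdirectory described
above, comes down to a single srm.conf line:

     UserDir /home/httpd/*/html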
7.7. Daemon mode vs. Inetd mode
There are two ways that Apache can be run. One is as a daemon that is
always running (Apache calls this standalone). The second is from the
inetd super-server.
Daemon mode is far superior to inetd mode. Apache is setup for daemon
mode by default. The only reason to use inetd mode is for very
low use applications, such as internal testing of scripts, a small
company intranet, etc. Inetd mode will save memory because Apache
will be loaded as needed. Only the inetd daemon will remain in
memory.
If you don't use Apache that often you may just want to keep it in
daemon mode and just start it when you need it. Then you can kill it
when you are done (be sure to kill the parent and not one of the child
processes).
To set up inetd mode you need to edit a few files. First in
/etc/services see if http is already in there. If it's not, then add
it:
http 80/tcp
Right after 79 (finger) would be a good place. Then you need to edit
the /etc/inetd.conf file and add the line for Apache:
http stream tcp nowait root /usr/sbin/httpd httpd
Be sure to change the path if you have Apache in a different location.
And the second httpd is not a typo; the inet daemon requires that. If
you are not currently using the inet daemon, you may want to comment
out the rest of the lines in the file so you don't activate other
services as well (FTP, finger, telnet, and many other things are usually
run from this daemon).
If you are already running the inet daemon (inetd), then you only need
to send it the SIGHUP signal (via kill; see kill's man page for more
info) or reboot the computer for changes to take effect. If you are
not running inetd then you can start it manually. You should also add
it to your init files so it is loaded at boot (the rc.local file may
be a good choice).
7.8. Allowing put and delete commands
The newer web publishing tools support this new method of uploading
web pages by http (instead of FTP). Some of these products don't even
support FTP anymore! Apache does support this, but it is lacking a
script to handle the requests. This script could be a big security
hole, be sure you know what you are doing before attempting to write
or install one.
If anyone knows of a script that works let me know and I'll include
the address to it here.
For more information go to Apacheweek's article at
<http://www.apacheweek.com/features/put>.
7.9. User Authentication/Access Control
This is one of my favorite features. It allows you to password
protect a directory or a file without using CGI scripts. It also
allows you to deny or grant access based on the IP address or domain
name of the client. That is a great feature for keeping jerks out of
your message boards and guest books (you get the IP or domain name
from the log files).
To allow user authentication the directory must have AllowOverrides
AuthConfig set in the access.conf file. To allow access control (by
domain or IP address) AllowOverrides Limit must be set for that
directory.
Setting up the directory involves putting an .htaccess file in the
directory. For user authentication it is usually used with an
.htpasswd and optionally a .htgroup file. Those files can be shared
among multiple .htaccess files if you wish.
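As a rough sketch, a protected directory could contain an .htaccess
file along these lines (the realm name and password file location are
only examples):

     AuthType Basic
     AuthName Members-Only
     AuthUserFile /home/httpd/.htpasswd
     <Limit GET POST>
     require valid-user
     </Limit>

The matching password file would be created with the htpasswd support
program, e.g. 'htpasswd -c /home/httpd/.htpasswd bob' (the -c flag
creates the file).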
For security reasons I recommend that everyone use these directives in
their access.conf file:
<files ~ "/\.ht">
order deny,allow
deny from all
</files>
If you are not the administrator of the system you can also put it in
your .htaccess file. This directive will prevent people from looking
into your access control files (.htaccess, .htpasswd, etc).
There are many different options and file types that can be used with
access control. Therefore it is beyond the scope of this document to
describe the files. For information on how to setup User
Authentication see the Apacheweek feature at
<http://www.apacheweek.com/features/userauth> or the NCSA pages at
<http://hoohoo.ncsa.uiuc.edu/docs-1.5/tutorials/user.html>.
7.10. su-exec
The su-exec feature runs CGI scripts as the user who owns them.
Normally they are run as the user the web server runs as (usually
nobody). This allows users to access their own files in CGI scripts
without making them world writable (a security hole). But if you are not
careful you can create a bigger security hole by using the su-exec
code. The su-exec code does security checks before executing the
scripts, but if you set it up wrong you will have a security hole.
The su-exec code is not for amateurs. Don't use it if you don't know
what you are doing. You could end up with a gaping security hole
where your users can gain root access to your system. Do not modify
the code for any reason. Be sure to read all the documentation
carefully. The su-exec code is hard to set up on purpose, to keep the
amateurs out (everything must be done manually; no makefile, no
install scripts).
The su-exec code resides in the support directory of the source.
First you need to edit the suexec.h file for your system. Then you
need to compile the su-exec code with this command:
gcc suexec.c -o suexec
Then copy the suexec executable to the proper directory. The Apache
default is /usr/local/etc/httpd/sbin/. This can be changed by editing
httpd.h in the Apache source and recompiling Apache. Apache will only
look in this directory, it will not search the path. Next the file
needs to be changed to user root (chown root suexec) and the suid bit
needs to be set (chmod 4711 suexec). Finally restart Apache; it
should display a message on the console that su-exec is being used.
CGI scripts should be set world executable like normal. They will
automatically be run as the owner of the CGI script. If you set the
SUID (set user id) bit on the CGI scripts they will not run. If the
directory or file is world or group writable the script will not run.
Scripts owned by system users will not be run (root, bin, etc.). For
other security conditions that must be met see the su-exec
documentation. If you are having problems see the su-exec log file
named cgi.log.
Su-exec does not work if you are running Apache from inetd; it only
works in daemon mode. This will be fixed in the next version because
there will be no inetd mode. If you like playing around in source
code, you can edit http_main.c and get rid of the line
where Apache announces that it is using the su-exec wrapper (it
wrongly prints this in front of the output of everything).
Be sure to read the Apache documentation on su-exec. It is included
with the source and is available on the Apache web site at
<http://www.apache.org/docs/suexec.html>.
7.11. Imagemaps
Apache has the ability to handle server side imagemaps. Imagemaps are
images on webpages that take users to different locations depending on
where they click. To enable imagemaps, first make sure the imagemap
module is installed (it's one of the default modules). Next you need
to uncomment the .map handler at the end of the srm.conf file. Now
all files ending in .map will be treated as map files, which map areas
of an image to separate links. Apache uses map files in
the standard NCSA format. Here is an example of using a map file in a
web page:
<a href="/map/mapfile.map">
<img src="picture.gif" ISMAP>
</a>
In this example mapfile.map is the mapfile, and picture.gif is the
image to click on.
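A small map file in NCSA format might look like this (the URLs and
coordinates are made up; the coordinates are pixel positions within
picture.gif):

     default /index.html
     rect /products.html 0,0 100,50
     circle /contact.html 150,25 170,25
     poly /about.html 200,0 250,0 225,50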
There are many programs that can generate NCSA compatible map files or
you can create them yourself. For a more detailed discussion of
imagemaps and map files see the Apacheweek feature at
<http://www.apacheweek.com/features/imagemaps>.
7.12. SSI/XSSI
Server Side Includes (SSI) adds dynamic content to otherwise static
web pages. The includes are embedded in the web page as comments.
The web server then parses these includes and passes the results to
the browser. SSI can add headers and footers to documents, add the
date the document was last updated, or execute a system command or a
CGI script. With the new eXtended Server Side Includes (XSSI) you can
do a whole lot more. XSSI adds variables and flow control statements
(if, else, etc). It's almost like having a programming language to
work with.
Parsing all HTML files for SSI commands would waste a lot of system
resources. Therefore you need to distinguish normal HTML files from
those that contain SSI commands. This is usually done by changing the
extension of the SSI enhanced HTML files. Usually the .shtml
extension is used.
To enable SSI/XSSI first make sure that the includes module is
installed. Then edit srm.conf and uncomment the AddType and
AddHandler directives for .shtml files. Next turn on the Includes
option for the directories where you want to run SSI/XSSI files; this
is done in the access.conf file. Now all files with the extension
.shtml will be parsed for SSI/XSSI commands.
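A .shtml page could then embed includes like these (the file names and
the counter script are invented for the example):

     <!--#echo var="LAST_MODIFIED" -->
     <!--#include virtual="/footer.html" -->
     <!--#exec cgi="/cgi-bin/counter.cgi" -->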
Another way of enabling includes is to use the XBitHack directive. If
you turn this on, it looks to see if the file is executable by its
owner. If it is, and Options Includes is on for that directory, then it is
treated as an SSI file. This only works for files with the mime type
text/html (.html .htm files). This is not the preferred method.
There is a security risk in allowing SSI to execute system commands
and CGI scripts. Therefore it is possible to lock that feature out
with Options IncludesNOEXEC instead of Options Includes in the
access.conf file. All the other SSI commands will still work.
For more information see the Apache mod_includes documentation that
comes with the source. It is also available on the website at
<http://www.apache.org/docs/mod/mod_include.html>.
For a more detailed discussion of SSI/XSSI implementation see the
Apacheweek feature at <http://www.apacheweek.com/features/ssi>.
For more information on SSI commands see the NCSA documentation at
<http://hoohoo.ncsa.uiuc.edu/docs/tutorials/includes.html>.
For more information on XSSI commands goto
<ftp://pageplus.com/pub/hsf/xssi/xssi-1.1.html>.
7.13. Module system
Apache can be extended to support almost anything with modules. There
are a lot of modules already in existence. Only the general interest
modules are included with Apache. For links to existing modules go to
the Apache Module Registry at <http://www.zyzzyva.com/module_registry/>.
For module programming information go to
<http://www.zyzzyva.com/module_registry/reference/>.
8. Web Server Add-ons
Sorry this section has not been written yet.
Coming soon: mSQL, PHP/FI, cgiwrap, FastCGI, MS FrontPage extensions,
and more.
9. FAQ
There aren't any frequently asked questions - yet...
10. For further reading
10.1. O'Reilly & Associates Books
In my humble opinion O'Reilly & Associates make the best technical
books on the planet. They focus mainly on Internet, Unix and
programming related topics. They start off slow with plenty of
examples, and when you finish the book you're an expert. I think you
could get by if you only read half of the book. They also add some
humor to otherwise boring subjects.
They have great books on HTML, Perl, CGI programming, Java,
JavaScript, C/C++, Sendmail, Linux and much, much more. And the fast
moving topics (like HTML) are updated and revised about every 6 months
or so. So visit the O'Reilly & Associates <http://www.ora.com/> web
site or stop by your local book store for more info.
And remember if it doesn't say O'Reilly & Associates on the cover,
someone else probably wrote it.
10.2. Internet Request For Comments (RFC)
o RFC1866, written by T. Berners-Lee and D. Connolly, "Hypertext
Markup Language - 2.0", 11/03/1995
o RFC1867, written by E. Nebel and L. Masinter, "Form-based File
Upload in HTML", 11/07/1995
o RFC1942, written by D. Raggett, "HTML Tables", 05/15/1996
o RFC1945, by T. Berners-Lee, R. Fielding, H. Nielsen, "Hypertext
Transfer Protocol -- HTTP/1.0", 05/17/1996
o RFC1630, by T. Berners-Lee, "Universal Resource Identifiers in WWW:
A Unifying Syntax for the Expression of Names and Addresses of
Objects on the Network as used in the World-Wide Web", 06/09/1994
o RFC1959, by T. Howes, M. Smith, "An LDAP URL Format", 06/19/1996