The Import module reads server log files one line at a time. Each line is then parsed (separated) into the data fields shown in the following table.
Information | Description
Internet address | The Internet address (Internet IP or resolved Internet host name) from which the request came, and to which the server sent the response.
Time stamp | The date and time the server responded to the request.
File name | The content file name, or URL, that was sent back to the Internet address.
User name | The user name used to log in to a site requiring registration.
Size | The size of the response in bytes. This number is used to calculate bandwidth usage.
User agent | The product name, product version, operating system, and security scheme of the web browser used for the request.
Referrer | The referring URL of the current request (the page containing the link the user clicked).
Cookie | A persistent identification code assigned to the user, which allows you to track the user across several visits.
HTTP response code | The response code (for example, 200, 304, or 302) associated with a request.
Site type | The type of Internet site (web, gopher, FTP). This field is found within log files that are common to multiple servers.
Server IP | The IP address of the individual server. This field is necessary to distinguish servers for a multihomed log data source.
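As a rough illustration of the fields listed above, the following Python sketch models one parsed log line as a record. The class name, attribute names, and the helper function are hypothetical and chosen for this example only; they are not taken from the Import module.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LogRecord:
    """One parsed log line. Names are illustrative, not the Import module's own."""
    internet_address: str                # IP or resolved host name of the requester
    timestamp: datetime                  # when the server responded to the request
    file_name: str                       # content file name or URL sent back
    user_name: Optional[str] = None      # login name, if the site requires registration
    size: int = 0                        # response size in bytes
    user_agent: Optional[str] = None     # browser product, version, OS, security scheme
    referrer: Optional[str] = None       # page containing the link the user clicked
    cookie: Optional[str] = None         # persistent identification code
    response_code: Optional[int] = None  # for example 200, 304, or 302
    site_type: Optional[str] = None      # web, gopher, or FTP
    server_ip: Optional[str] = None      # distinguishes servers in a multihomed source


def total_bandwidth(records):
    """Sum the Size field over records, as a simple bandwidth-usage figure in bytes."""
    return sum(record.size for record in records)
```

Fields that a given log format does not supply simply stay at their defaults.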
Internet server log files must be in one of the formats shown in the following table for the Import module to interpret the log file data correctly.
Site type | Supported log file formats
World Wide Web | Common log file format; EMWACS log file format; Usage Analyst extended log file format; Intersé Market Focus 1 database; Intersé Market Focus 2 database; MCI log file format; Microsoft IIS standard log file format; Microsoft IIS extended log file format; Microsoft IIS hyperextended log file format; Microsoft IIS ODBC log file format; NCSA combined log file format; NCSA combined w/ servername log file format; NCSA w/ servername log file format; Netscape Proxy extended logging format; Open Market extended log file format; O’Reilly Multihome common log file format 1.0; O’Reilly Windows log file format 1.0; SiteTrack log file format; Spry Web Server ASCII log file format; Spry Web Server ODBC log file format; Universal log file format; UUNET extended log file format; WebFacts audit log file format; WebStar log file format; Zeus common log file format
FTP | WU Archive FTP log file format; Microsoft Internet Server log file format
Gopher | Microsoft Internet Server log file format
Real time stream | Real Audio log file format
This log file format includes:
Here’s a sample line from the common log file:
www.interse.com - bob [08/Aug/1995:06:00:00 -0800] "GET /analyst/ HTTP/1.0" 200 1067
Note
Any text after the file size on a log file line is ignored by the Import module.
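The following Python sketch shows one way a common log file line could be split into fields. It assumes the widely used layout of host, identity, user name, bracketed time stamp, quoted request, response code, and size; the pattern, function name, and dictionary keys are illustrative and are not the Import module's actual implementation.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_common_log_line(line):
    """Return a dict of fields, or None if the line does not fit the pattern."""
    # match() anchors only at the start of the line, so any text after the
    # size field is ignored, mirroring the behavior described in the note.
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # %b expects English month abbreviations under the default locale.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A hyphen in the size position is treated here as zero bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields
```

Applied to the sample line above, this sketch yields the host www.interse.com, the user name bob, the response code 200, and a size of 1067 bytes.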
This log file format is produced by the EMWACS web server. EMWACS is a public domain web server for Windows NT.
Here’s a sample line from an EMWACS log file:
Mon Aug 07 08:54:39 1995 204.86.26.20 157.54.17.9 GET /analyst/ HTTP/1.0
The extended log file format is simply the common log format with three additional fields:
Note
This format was designed to be compatible with many of the custom extended log file formats currently in use.
Not all fields are required to comply with the format. Any of the following would be a valid extended log file entry:
The syntax of the extended log file format is:
Common Log Format ["referrer/-" ["user agent" ["cookie"]]]
Quotes around the referrer string are optional, but recommended. If quotes start the referrer string, quotes must end it. If quotes aren’t used and the referrer is blank, then a hyphen is accepted as a blank referrer string.
User agent strings must be surrounded by quotes. A blank user agent string is represented by "".
Cookie strings must be surrounded by quotes. A blank cookie string is represented by "".
Here’s a sample line from the extended log with all three extended fields:
www.interse.com - bob [08/Aug/1995:06:00:00 -0800] "GET /analyst/ HTTP/1.0" 200 1067 "http://www.infoseek.com?qt=Interse" "Mozilla 2.0b4 Windows 32-bit" "INTERSE=12345678910"
Note
Logging modules are available for Microsoft IIS, Apache, and Netscape to add user agent, referrer, and cookie data to your log files within the extended log file format. For more information, see http://www.interse.com/serverext/
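The extended-format syntax described above, with its nested optional fields, can be sketched in Python as follows. The regular expressions and the function name are assumptions made for this example; the sketch only handles an unquoted referrer that contains no spaces, and it is not the Import module's parser.

```python
import re

# Common log format prefix (assumed layout), used only to find where it ends.
CLF_PREFIX = re.compile(r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-)')

# Nested optional groups mirror: ["referrer/-" ["user agent" ["cookie"]]]
EXT_TAIL = re.compile(
    r'(?:\s+(?P<referrer>"[^"]*"|[^\s"]+)'   # referrer: quoted, or bare such as -
    r'(?:\s+"(?P<agent>[^"]*)"'              # user agent: always quoted
    r'(?:\s+"(?P<cookie>[^"]*)")?)?)?\s*$'   # cookie: always quoted
)


def parse_extended_fields(line):
    """Return referrer, user agent, and cookie; blank or missing fields are None."""
    prefix = CLF_PREFIX.match(line)
    if prefix is None:
        return None
    tail = EXT_TAIL.match(line[prefix.end():])
    if tail is None:
        return None
    referrer = tail.group("referrer")
    if referrer is not None:
        referrer = referrer.strip('"')
        if referrer == "-":          # a hyphen is accepted as a blank referrer
            referrer = ""
    return {
        "referrer": referrer or None,
        "user_agent": tail.group("agent") or None,   # "" means blank
        "cookie": tail.group("cookie") or None,      # "" means blank
    }
```

Because each optional group is nested inside the previous one, a cookie can only appear when a user agent is present, and a user agent only when a referrer (or a hyphen) is present, which is what the syntax line requires.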
This log file format includes usage information for web, gopher, and FTP servers that run under the Microsoft Internet Information Server.
Here’s a sample line from a Microsoft IIS standard log file:
www.interse.com, -, 8/7/95, 8:54:39, W3SVC, WWW,157.54.17.9, 490, 232, 4401, 200, 0,GET, /analyst/, -,
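A comma-separated line like the sample above can be split with a few lines of Python. The field names below are assumptions based on the commonly documented layout of the Microsoft IIS log file format (client address, user name, date, time, service, server name, server IP, time taken, bytes received, bytes sent, service status, Windows status, request type, target, parameters); they are labels for this sketch, not names used by the Import module.

```python
# Assumed field order; names are illustrative labels only.
IIS_FIELDS = [
    "client_host", "user_name", "date", "time", "service", "server_name",
    "server_ip", "time_taken", "bytes_received", "bytes_sent",
    "service_status", "windows_status", "method", "target", "parameters",
]

NUMERIC_FIELDS = (
    "time_taken", "bytes_received", "bytes_sent",
    "service_status", "windows_status",
)


def parse_iis_line(line):
    """Split one IIS standard log line into a dict, or return None if too short."""
    # Simple split: a target or parameter containing a comma would need
    # more careful handling than this sketch provides.
    values = [value.strip() for value in line.split(",")]
    if values and values[-1] == "":
        values.pop()                      # drop the empty item after the trailing comma
    if len(values) < len(IIS_FIELDS):
        return None
    record = dict(zip(IIS_FIELDS, values))
    for key in NUMERIC_FIELDS:
        record[key] = int(record[key])
    return record
```

The service field (W3SVC in the sample) is what would distinguish web entries from FTP or gopher entries written in the same layout.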
This log file format includes usage information along with user agent and referrer data for web, gopher, and FTP servers that run under the Microsoft Internet Information Server.
Here’s a sample line from a Microsoft IIS extended log file:
153.36.62.27, -, 3/1/96, 0:00:00, W3SVC, WWW, 198.105.232.4, 3565, 245, 2357, 200, 0, GET, /MSOffice/Images/button5a.gif, Mozilla/1.22 (compatible; MSIE 2.0; Windows 95), http://www.microsoft.com/msoffice/, -,
NCSA Version 1.5 includes built-in support for a combined log file format that includes user agent and referrer information. Note that the extended log file format of NCSA Version 1.5 is encompassed by the definition of the extended log file format. For the configuration options required to include this information in your log files, see http://hoohoo.ncsa.uiuc.edu/docs/setup/httpd/TransferLog.html.
Here’s a sample line from an NCSA combined log file format:
tomato.interse.com - - [19/Sep/1995:15:19:07 -0500] "GET /images/icon.gif HTTP/1.0" 200 1656 "http://aboutus/" "NCSA_Mosaic/2.7b1 (X11;IRIX 5.3 IP22) libwww/2.12 modified"
NCSA Version 1.5 includes support for a log file format that includes the server name in the log file. If you are using VirtualHost support, this will be the name of the VirtualHost. For more information, see http://hoohoo.ncsa.uiuc.edu/docs/setup/httpd/TransferLog.html.
This is a sample line from an NCSA log file format with server name:
tomato.interse.com - - [06/Oct/1995:13:51:23 -0500] "GET /beta-1.5/howto/fixes.html" 200 3296 www.interse.com
NCSA Version 1.5 includes support for a combined log file format that includes the server name, user agent, and referrer information. For more information, see http://hoohoo.ncsa.uiuc.edu/docs/setup/httpd/TransferLog.html.
tomato.interse.com - - [19/Sep/1995:15:19:07 -0500] "GET /images/icon.gif HTTP/1.0" 200 1656 www.interse.com "http://aboutus/" "Mozilla/1.22 (compatible; MSIE 2.0; Windows 95)"
When imported as a proxy log, referrer information is ignored.
This is a sample of a Netscape proxy extended logging format:
127.0.0.1 - - [14/Aug/1996:05:00:01 -0700] "GET http://www.nytimes.com/ HTTP/1.0" 403 - "-" "Netscape-Proxy/2.0 (Batch update)" GET http://www.nytimes.com/ - "HTTP/1.0" - - - - 141 168 - - - - - -
This log file format is produced by the Open Market web server software.
This is a sample line from the file:
log {start 824480600.659060} {method GET} {url /animation/maxx/images/maxxhome.gif} {referrer http://mtv.com/animation/maxx/} {agent {Mozilla/2.0 (Win95; I)}} {bytes 65745} {status 200} {end 824480601.397589} {host 198.147.4.29}
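The brace-delimited layout of the sample above lends itself to a small depth-counting scanner. The Python sketch below is inferred from that single sample line only: it assumes every field is a top-level {key value} group and that a value may itself be wrapped in one extra pair of braces, as the agent field is. The function name is hypothetical.

```python
def parse_open_market_line(line):
    """Return a dict of key/value strings, or None if the braces are unbalanced."""
    fields = {}
    depth = 0
    start = None
    for index, char in enumerate(line):
        if char == "{":
            if depth == 0:
                start = index + 1            # first character inside a top-level group
            depth += 1
        elif char == "}":
            depth -= 1
            if depth < 0:
                return None                  # closing brace without an opener
            if depth == 0 and start is not None:
                # The key is everything up to the first space; the rest is the value.
                key, _, value = line[start:index].partition(" ")
                # Unwrap one layer of braces, as in {agent {Mozilla/2.0 (Win95; I)}}.
                if value.startswith("{") and value.endswith("}"):
                    value = value[1:-1]
                fields[key] = value
                start = None
    return fields if depth == 0 else None
```

Values are left as strings; converting bytes and status to integers, or start and end to time stamps, would be a separate step.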
This log file format is produced by the O’Reilly Web Site Version 1.0 web server software. For more information, see http://website.ora.com/techcenter/devcorner/white/winlog.htm.
This is a sample of that format:
05/18/96 00:41:33 user.interse.com olive.interse.com GET
/ourproducts/reports/contentsummary.html
http://olive.interse.com/ourproducts/reports/executivesummary.html
Mozilla/2.01 (Win95; I) 200 23511 9313
This log file format is produced by Group Cortex's SiteTrack. For more information, see http://www.cortex.net.
This is a sample of the SiteTrack format:
phx-az16-24.ix.netcom.com - - [07/08/1996:00:04:56]
GET /content/resources/cgi/netscape.html HTTP/1.0 302
264 Mozilla/2.02 (Win95; I)
http://www.stars.com/Vlib/Providers/CGI.html 0836798696514377
0836798696514377 0836798696514377 - 3 -
This log file format is produced by the Spry Web Server software.
Here’s a sample of the format:
GET,/mill/rock.gif, 200 OK,40740,02/05/1996 16:02:56 GMT,02/05/1996 16:02:58 GMT,149.174.73.127,149.174.73.127,ads-demo2.inhouse.compuserve.com,Mozilla/1.1 (Windows; U; 32bit),,http://ads-demo2:2000/mill/rocks.htm,
This log file format is produced for sites hosted by UUNET.
This is a sample of the format:
152.163.192.72 304 0 826693224 0.013 "" "GET / HTTP/1.0" "http://frostbite.umd.edu/%7Ecass/music.html" "IWENG/1.2.003 via proxy gateway CERN-HTTPD/2.0 libwww/2.17"
This is the format produced by the ABC WebFacts audit software.
Here is a sample of the format:
2 www.datamation.com 960526 85202 Cust57.Max5.Toronto.ON.MS.UU.NET
/PlugIn/Images/pluginN.gif http://www.datamation.com/
PlugIn/homepage/index_foot.htmlMozilla/2.0 (compatible; MSIE 2.0B; Windows 95;1024,768 304 0 0 0
This log file format is produced by Quarterdeck’s WebSTAR web server for the Macintosh.
Here’s a sample line from a WebSTAR server log file:
08/07/95 08:54:39 OK 157.54.17.9 :public:real.gif 2557
Note
Fields in this format are configurable. Usage Import supports the default fields shown above, plus an extended format that includes referrer, user agent, and username data. In this extended format, all extended fields are optional.
Here’s a sample line from an extended WebSTAR server log file:
Default fields referrer user agent username
This format is produced by the commercial Zeus web server. (See www.zeus.co.uk.)
This format closely resembles the common log file (CLF) format, except that the date format is d/mmm/yyyy rather than dd/mmm/yyyy (that is, the third day of the month is represented as 3 rather than 03).
For example:
CLF: 03/Jul/1996
Zeus: 3/Jul/1996
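In practice, a parser that accepts one- or two-digit days can read both date forms. The Python sketch below relies on the fact that strptime's %d directive accepts a day with or without a leading zero; the function names are illustrative, and %b assumes English month abbreviations under the default locale.

```python
from datetime import datetime


def parse_log_date(text):
    """Parse a CLF-style or Zeus-style date such as 03/Jul/1996 or 3/Jul/1996."""
    # %d matches both "3" and "03", so one pattern covers both formats.
    return datetime.strptime(text, "%d/%b/%Y").date()


def to_clf_date(text):
    """Rewrite a date in the zero-padded CLF form."""
    return parse_log_date(text).strftime("%d/%b/%Y")
```

With this approach, a Zeus log could be handled by the same date logic as a common log file without a separate code path.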
The WU Archive FTP server, the most common UNIX FTP software, records incoming and outgoing file transfers in a log file, which, by default, is named XFERLOG.
Here’s a sample line from a WU Archive FTP log file:
Sat Dec 16 04:48:30 1995 1 www.interse.com 124 /README a _ o a
support@www.interse.com ftp 0 *
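The sample above can be split on white space, as in the following Python sketch. The field names are assumptions based on the layout described in the WU Archive FTP server's xferlog documentation (a five-token time stamp followed by transfer time, remote host, file size, file name, transfer type, special action flag, direction, access mode, user name, service name, authentication method, and authenticated user ID); they are labels for this example rather than names used by the Import module.

```python
from datetime import datetime

# Assumed order of the fields that follow the five-token time stamp.
XFERLOG_FIELDS = [
    "transfer_seconds", "remote_host", "file_size", "file_name",
    "transfer_type", "special_action", "direction", "access_mode",
    "user_name", "service_name", "auth_method", "auth_user_id",
]


def parse_xferlog_line(line):
    """Split one transfer-log entry into a dict, or return None if it is too short."""
    # Whitespace split: a file name containing spaces would need extra handling.
    tokens = line.split()
    if len(tokens) < 5 + len(XFERLOG_FIELDS):
        return None
    # The time stamp looks like "Sat Dec 16 04:48:30 1995".
    stamp = " ".join(tokens[:5])
    record = {"time": datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")}
    record.update(zip(XFERLOG_FIELDS, tokens[5:]))
    record["transfer_seconds"] = int(record["transfer_seconds"])
    record["file_size"] = int(record["file_size"])
    return record
```

In the sample entry, the direction flag o would indicate an outgoing transfer and the file size is 124 bytes.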
For more information on this format, please refer to the section on the Microsoft IIS standard log file format.
Here’s a sample line from a Microsoft Internet Information Server log file (FTP):
www.interse.com, -, 8/7/95, 8:54:39, FTPSVC, WWW,157.54.17.9, 490, 232, 4401, 200, 0,GET, /analyst/, -,
For more information on this format, please refer to the section on the Microsoft IIS standard log file format.
Here’s a sample line from a Microsoft Internet Information Server log file (Gopher):
www.interse.com, -, 8/7/95, 8:54:39, GopherSVC, WWW,157.54.17.9, 490, 232, 4401, 200, 0,GET, /analyst/, -,
This format is the same as the common log file format, but the user agent string is appended immediately following the file size.
This is a sample of the Real Audio log file format:
www.interse.com - bob [08/Aug/1995:06:00:00 -0800] "GET /analyst/ HTTP/1.0" 200 1067 "http://www.infoseek.com"