- htmlchek version 4.0, January 17 1995
-
-
- htmlchek -- Syntactically checks HTML 2.0 or 3.0 files for a
- number of possible errors; can do local link
- cross-reference checking and generate a
- rudimentary reference-dependency map. Runs
- under awk or perl. Includes a number of
- supplemental utilities for HTML file processing.
-
-
- This release of htmlchek (version 4.0) is a moderately significant
- upgrade over previous versions and includes the following files.
- (Documentation for all programs and shell scripts other than htmlsrpl.pl
- is in htmlchek.man and htmlchek.html.)
-
- README.40       This file
- htmlchek.man    Documentation
- htmlchek.html   HTML version of documentation
-
- htmlchek.awk    Awk version of the htmlchek HTML error checker
- htmlchek.pl     Port of htmlchek to perl
- example.cfg     Sample htmlchek configuration file
- html2dtd.cfg    Configuration file for stricter compliance with the 2.0 DTD
-
- htmlqref.txt    Yet another HTML quick reference (plain text)
- htmlqref.html   HTML version of the same quick reference
-
- htmlsrpl.pl     HTML-aware search-and-replace program (perl)
- htmlsrpl.man    Documentation for htmlsrpl.pl
- htmlsrpl.html   HTML version of documentation for htmlsrpl.pl
-
- xtraclnk.pl     Extracts links and link/title text from HTML files (perl)
-
- makemenu.awk    Makes a simple menu for HTML files using <TITLE>; can also
- makemenu.pl     make a table of contents using <H1>-<H6> (awk/perl)
-
- dehtml.awk      Removes all HTML markup, preliminary to spell checking (awk)
- dehtml.pl       Perl version of dehtml
-
- entify.awk      Replaces high Latin-1 alphabetic characters with ampersand
- entify.pl       entities for safe 7-bit transport (awk/perl)
-
- metachar.awk    Trivial program to protect the HTML/SGML "&<>" metacharacters
- metachar.pl     in text to be included in an HTML file (awk/perl)
-
- (Unix shell files:)
-
- htmlchek.sh     Runs htmlchek.awk under the best available interpreter,
-                 with options checking
- htmlchkp.sh     Runs htmlchek.pl with external options checking
- runachek.sh     Does cross-reference checking using htmlchek.awk
- runpchek.sh     Does cross-reference checking using htmlchek.pl
- rducfila.sh     Reduces .NAME/.HREF files (external xref check, awk)
- rducfilp.sh     Reduces .NAME/.HREF files (external xref check, perl)
- makemenu.sh     Runs makemenu.awk under the best available interpreter,
-                 with options checking
- dehtml.sh       Runs dehtml.awk under the best available interpreter
-
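The three text-transforming utilities in the list above (metachar, entify, and dehtml) each do one simple job. The Python sketch below illustrates those jobs; it is not a port of the awk/perl programs, and their exact behavior may differ. The function names are hypothetical, entify is assumed to cover only the letters U+00C0 through U+00FF (entity names come from Python's `html.entities.codepoint2name` table), and the markup stripper ignores complications such as `>` inside quoted attribute values.

```python
import re
from html.entities import codepoint2name


def protect_metachars(text):
    """metachar-style (hypothetical name): escape & < > so text can go into HTML."""
    # '&' must be escaped first, or the '&' in '&lt;' would be escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def entify(text):
    """entify-style (hypothetical name): high Latin-1 letters become named entities."""
    out = []
    for ch in text:
        cp = ord(ch)
        # Assumed range U+00C0..U+00FF; isalpha() skips the times and divide signs.
        if 0xC0 <= cp <= 0xFF and ch.isalpha():
            out.append("&" + codepoint2name[cp] + ";")
        else:
            out.append(ch)
    return "".join(out)


def strip_markup(html):
    """dehtml-style (hypothetical name): drop comments and tags, keep the text."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    return re.sub(r"<[^>]*>", "", html)


print(protect_metachars("if a<b && b>c"))  # if a&lt;b &amp;&amp; b&gt;c
print(entify("Señor Müller"))               # Se&ntilde;or M&uuml;ller
print(strip_markup("<P>Hello, <B>world</B>!<!-- draft --></P>"))  # Hello, world!
```

Escaping `&` before `<` and `>` is the one ordering detail that matters; doing it last would turn `&lt;` into `&amp;lt;`.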
-
- The htmlchek program checks for quite a number of possible defects
- in the HTML (Hyper-Text Mark-up Language) version 2.0 SGML files used
- on the World-Wide Web. (Preliminary HTML 3.0 files for the Arena
- browser, or files with Netscape extensions, can also be checked by
- specifying the appropriate options.) The program makes no claim to
- understand all of SGML, but it is relatively easy to use, gives lots
- of information (including warnings about many stylistically bad
- practices), can do local cross-reference checking and generate
- rudimentary reference-dependency maps, and can be run on any platform
- for which the language interpreter (awk or perl) is available.
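
Local cross-reference checking and a reference-dependency map both start from the same data: which files each page points to. The Python sketch below builds one plausible form of such a map, listing for each locally referenced file the pages that refer to it. It is an illustration, not htmlchek's actual output format; it assumes double-quoted HREF/SRC values and treats any value with a URL scheme (http:, ftp:, mailto:, and so on) as off-site. `dependency_map` is a hypothetical name.

```python
import re

# Assumptions for this sketch: only double-quoted HREF/SRC values are seen, and a
# value with a URL scheme (http:, ftp:, mailto:, ...) is treated as non-local.
REF_RE = re.compile(r'\b(?:HREF|SRC)\s*=\s*"([^"]*)"', re.IGNORECASE)
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def dependency_map(pages):
    """Map each locally referenced file to the sorted list of pages referring to it."""
    refs = {}
    for page, html in pages.items():
        for value in REF_RE.findall(html):
            target = value.split("#", 1)[0]  # drop any #fragment
            if not target or SCHEME_RE.match(target):
                continue  # same-document link or off-site URL
            refs.setdefault(target, set()).add(page)
    return {target: sorted(sources) for target, sources in refs.items()}


pages = {
    "index.html": '<A HREF="faq.html#q1">FAQ</A> <IMG SRC="logo.gif"> '
                  '<A HREF="http://www.w3.org/">W3</A>',
    "faq.html": '<A HREF="index.html">Home</A> <A HREF="#top">Top</A>',
}
print(dependency_map(pages))
# {'faq.html': ['index.html'], 'logo.gif': ['index.html'], 'index.html': ['faq.html']}
```

Inverting the map (page to targets) is a one-line change; the target-to-sources direction shown here answers "what breaks if I rename this file?".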
-
- This release of htmlchek also includes a number of supplemental
- utilities, including the htmlsrpl.pl HTML-aware search-and-replace
- program, which uses either literal strings or regular expressions;
- acts either only outside HTML/SGML tags, or only within tags; can be
- restricted to operate only within and/or only outside specified
- elements; and can also upper-case tag names.
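
The core idea of acting only outside (or only inside) tags can be sketched by splitting a file into alternating text and tag pieces and applying the substitution to only one kind. This Python sketch is not htmlsrpl.pl and does not implement its element-restriction options; it assumes no `>` appears inside quoted attribute values, and the function names are hypothetical. For a literal-string search, pass the pattern through `re.escape` first.

```python
import re

# A capturing group makes re.split keep the tags: text, tag, text, tag, ..., text.
TAG_SPLIT_RE = re.compile(r"(<[^>]*>)")


def replace_outside_tags(html, pattern, repl, inside=False):
    """Apply re.sub only to text outside tags (or only inside tags if inside=True)."""
    pieces = TAG_SPLIT_RE.split(html)
    for i, piece in enumerate(pieces):
        is_tag = i % 2 == 1  # odd indices hold the captured tags
        if is_tag == inside:
            pieces[i] = re.sub(pattern, repl, piece)
    return "".join(pieces)


def upcase_tag_names(html):
    """Upper-case element names in start and end tags, leaving attributes alone."""
    return re.sub(r"<(/?)([A-Za-z][A-Za-z0-9]*)",
                  lambda m: "<" + m.group(1) + m.group(2).upper(), html)


print(replace_outside_tags('<A HREF="old.html">old.html</A>', r"old\.html", "new.html"))
# <A HREF="old.html">new.html</A>
print(upcase_tag_names('<a href="x.html">x</a>'))
# <A href="x.html">x</A>
```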
-
- The accompanying .sh files are for greater ease of use under Unix
- (actually, any Posix 1003.2 system, including VMS Posix), but nothing in
- htmlchek.awk or htmlchek.pl themselves, or in the accompanying
- supplemental programs, depends on the Unix operating system (in
- particular, the perl programs do not use any of the Unix-specific
- systems-programming features of the perl language), so that this
- package can be used on non-Unix systems.
-
- If you seem to get a million errors the first time you run htmlchek
- on a file, don't be dismayed -- sometimes htmlchek can't compensate
- for an error, so that the invalid HTML code it has encountered affects
- its interpretation of valid HTML code later on in the file. Just go
- back and fix the _first_ error, or first few errors, in the HTML file,
- then run htmlchek again and see what you get. Iterate as necessary.
- (However, I have tried to eliminate many of the cascades of redundant
- error messages that some earlier versions of this program tended to
- generate.)
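
Why one mistake can produce several messages is easy to see with a stack-based nesting check. The Python sketch below is a toy, not htmlchek's algorithm: it assumes a small hypothetical set of elements that require end tags and ignores everything else (empty elements, omissible end tags, comments). A single transposed pair of end tags already yields two complaints, which is why fixing the first error and re-running is the sensible strategy.

```python
import re

# Hypothetical subset of elements that need explicit end tags; a real checker
# would use a full language definition, including omissible and empty tags.
CONTAINERS = {"HTML", "HEAD", "TITLE", "BODY", "A", "B", "I", "EM", "STRONG",
              "UL", "OL", "DL", "H1", "H2", "H3"}

TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def check_nesting(html):
    """Return a list of nesting complaints for the CONTAINERS elements."""
    stack, errors = [], []
    for match in TAG_RE.finditer(html):
        closing, name = match.group(1) == "/", match.group(2).upper()
        if name not in CONTAINERS:
            continue  # ignored in this sketch
        if not closing:
            stack.append(name)
        elif name in stack:
            # Pop back to the matching start tag; anything skipped was unclosed.
            while stack[-1] != name:
                errors.append(f"<{stack.pop()}> not closed before </{name}>")
            stack.pop()
        else:
            errors.append(f"</{name}> has no matching <{name}>")
    errors.extend(f"<{name}> never closed" for name in reversed(stack))
    return errors


print(check_nesting("<B><I>x</B></I>"))
# ['<I> not closed before </B>', '</I> has no matching <I>']
```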
-
- The htmlchek program performs a fairly comprehensive job of
- checking for HTML errors, but does not always exactly follow the
- official standard (currently this is version 1.22 of the HTML 2.0
- DTD). Bad stylistic practices are warned against, as well as actual
- HTML errors, and in some cases htmlchek is stricter than the standard,
- in order to accommodate the peculiarities of some browsers. The idea
- is that HTML code should be ruggedized for the real world, rather than
- just being SGML-ically correct -- especially since the official
- standard allows many SGML features which are hardly understood by any
- HTML-specific applications; for example, according to the official
- standard the following is a completely valid HTML 2.0 file (without
- even any omitted tags!):
-
- <><HEAD/<TITLE///<BODY/text<IMG TOP SRC=x.gif<![IGNORE[ </HTML>]]>/</>
-
- Version 4.0 of the htmlchek distribution has the following new features:
-
- Main changes to htmlchek: added internal cross-reference checking (not as
- hard as I thought it would be!); added option of generating dependency
- map; added command-line options to allow `<' and `>' characters within
- quoted attribute values and <!-- --> comments, and `>' characters outside
- tags. Other changes: added HTML quick reference, in plain text and .html
- versions; added htmlsrpl.pl; added xtraclnk.pl; added makemenu.awk/
- makemenu.pl; added metachar.awk/metachar.pl; added Perl version of
- entify; enhanced the Unix/Posix-1003.2 shell scripts to redirect
- non-program output to STDERR, detect non-zero exit status of awk/perl,
- and add required trailing slashes automatically. Minor changes to
- htmlchek: added sample configuration files; added check for content of
- <ADDRESS> element; now detect multiple <HEAD> elements in document;
- <OPTION>, <TEXTAREA>, and <TITLE> elements should not contain any tags;
- <INPUT>, <SELECT> and <TEXTAREA> do not have to be _immediately_
- contained within a <FORM> (inclusion exception); allow reqopts=
- command-line option to specify multiple required attributes for a single
- tag; added dlstrict= option and changed default strictness to that of
- dlstrict=1; differentiated novalopts= from tagopts=; added subtract="..."
- command-line option (to facilitate checking files outside current
- directory); updated Arena/HTML3 language definition; tinkered with the
- Netscape language definition (in the absence of any definitive
- documentation); improved internal htmlchek.pl options checking; other
- minor fixes and enhancements.
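
Internal cross-reference checking amounts to comparing the set of NAME anchors a document defines with the set of same-document HREF="#..." references it makes. A minimal Python sketch, assuming double-quoted attribute values on <A> tags only (htmlchek's actual checks and messages may differ); `check_internal_refs` is a hypothetical name:

```python
import re

# Assumption: anchors are <A ...> tags with double-quoted NAME/HREF values.
NAME_RE = re.compile(r'<A\s[^>]*\bNAME\s*=\s*"([^"]*)"', re.IGNORECASE)
HREF_RE = re.compile(r'<A\s[^>]*\bHREF\s*=\s*"#([^"]*)"', re.IGNORECASE)


def check_internal_refs(html):
    """Return (dangling, unused): #refs with no NAME target, NAMEs never referenced."""
    defined = set(NAME_RE.findall(html))
    referenced = set(HREF_RE.findall(html))
    return referenced - defined, defined - referenced


doc = ('<A NAME="top">Top</A> <A HREF="#top">up</A> '
       '<A HREF="#sec2">2</A> <A NAME="old">x</A>')
dangling, unused = check_internal_refs(doc)
print(sorted(dangling), sorted(unused))  # ['sec2'] ['old']
```

A dangling reference is an error, while an unused NAME is at most worth a warning, since another file may link to it; cross-file checking needs the NAME and HREF lists from every file at once.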
-
- Both the awk program htmlchek.awk and a port of this awk program to
- perl are included in the distribution (the original reason for doing
- the perl port in the first place was to make it possible to add full
- off-site cross-reference checking over the Web; however, this
- project may never be completed, and at present the awk and perl
- programs have the same functionality); similarly, most of the
- supplemental programs also have both awk and perl versions. You might
- use one or the other based on personal preference, or because some
- vendor-supplied awks on Unix boxes have proven to exhibit unendearing
- peculiarities (you can also get around this by using GNU gawk if it is
- on your system, or getting it from one of the ftp sites listed at the
- end of almost every posting to the Usenet group gnu.announce and
- compiling it; the program htmlchek.sh will automatically run gawk in
- preference to nawk or awk, if gawk is on your system and in your
- PATH). Gawk for MS-DOS (and a pointer to OS/2 gawk) is available from
- ftp://oak.oakland.edu/SimTel/msdos/awk/. (See awk-perl.html.)
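
The interpreter preference described above (gawk first, then nawk, then plain awk) is simple to express. htmlchek.sh itself is a shell script; the Python sketch below only mirrors its stated preference order, and putting nawk ahead of awk is an assumption based on the warning about archaic awks. `pick_awk` is a hypothetical name, and its `which` parameter exists only so the PATH lookup can be swapped out in tests.

```python
import shutil

# Preference order: gawk, then nawk (assumed), then whatever "awk" is.
AWK_PREFERENCE = ("gawk", "nawk", "awk")


def pick_awk(which=shutil.which):
    """Return the path of the first available awk in preference order, or None."""
    for name in AWK_PREFERENCE:
        path = which(name)  # shutil.which returns None when name is not on PATH
        if path:
            return path
    return None


print(pick_awk())  # e.g. a path to gawk, or None if no awk is on the PATH
```

A caller could then run `subprocess.run([awk, "-f", "htmlchek.awk", "index.html"])` with whatever path came back, after checking that it is not None.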
-
-
- Typical command lines:
-
- awk -f htmlchek.awk [options] infiles.html > outfile.check
-
- perl htmlchek.pl [options] infiles.html > outfile.check
-
- The options are in the form "option=value" (see htmlchek.html or
- htmlchek.man). Remember that on some Unix systems ``awk'' is an
- archaic incompatible program, so you should use ``nawk'' or ``gawk''
- instead; the shell script htmlchek.sh will do this automatically (and
- do some options checking as well):
-
- sh htmlchek.sh [options] infiles.html > outfile.check
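
The option=value form matches awk's own command-line rule: an operand that looks like name=value (the name starting with a letter or underscore, followed by letters, digits, or underscores) is a variable assignment rather than a file to read. The Python sketch below shows that split; `split_operands` is a hypothetical name. Unlike POSIX awk, which performs each assignment just before reading the next file, it simply collects all assignments up front.

```python
import re

# Same shape awk uses to recognize a variable-assignment operand.
ASSIGNMENT_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$", re.DOTALL)


def split_operands(argv):
    """Split command-line operands into (options dict, list of input files)."""
    options, files = {}, []
    for arg in argv:
        m = ASSIGNMENT_RE.match(arg)
        if m:
            options[m.group(1)] = m.group(2)
        else:
            files.append(arg)
    return options, files


print(split_operands(["dlstrict=1", "index.html", "faq.html"]))
# ({'dlstrict': '1'}, ['index.html', 'faq.html'])
```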
-
-
- Author: Henry Churchyard churchh@uts.cc.utexas.edu
-