-
- <HTML>
- <HEAD>
- <TITLE>HTML::Filter - Filter HTML text through the parser</TITLE>
- <LINK REL="stylesheet" HREF="../../../Active.css" TYPE="text/css">
- <LINK REV="made" HREF="mailto:">
- </HEAD>
-
- <BODY>
- <TABLE BORDER=0 CELLPADDING=0 CELLSPACING=0 WIDTH=100%>
- <TR><TD CLASS=block VALIGN=MIDDLE WIDTH=100% BGCOLOR="#cccccc">
- <STRONG><P CLASS=block> HTML::Filter - Filter HTML text through the parser</P></STRONG>
- </TD></TR>
- </TABLE>
-
- <A NAME="__index__"></A>
- <!-- INDEX BEGIN -->
-
- <UL>
-
- <LI><A HREF="#name">NAME</A></LI>
- <LI><A HREF="#supportedplatforms">SUPPORTED PLATFORMS</A></LI>
-
- <LI><A HREF="#synopsis">SYNOPSIS</A></LI>
- <LI><A HREF="#description">DESCRIPTION</A></LI>
- <LI><A HREF="#examples">EXAMPLES</A></LI>
- <LI><A HREF="#bugs">BUGS</A></LI>
- <LI><A HREF="#see also">SEE ALSO</A></LI>
- <LI><A HREF="#copyright">COPYRIGHT</A></LI>
- </UL>
- <!-- INDEX END -->
-
- <HR>
- <P>
- <H1><A NAME="name">NAME</A></H1>
- <P>HTML::Filter - Filter HTML text through the parser</P>
- <P>
- <HR>
- <H1><A NAME="supportedplatforms">SUPPORTED PLATFORMS</A></H1>
- <UL>
- <LI>Linux</LI>
- <LI>Solaris</LI>
- <LI>Windows</LI>
- </UL>
- <HR>
- <H1><A NAME="synopsis">SYNOPSIS</A></H1>
- <PRE>
- require HTML::Filter;
- $p = HTML::Filter->new->parse_file("index.html");</PRE>
- <P>
- <HR>
- <H1><A NAME="description">DESCRIPTION</A></H1>
- <P><EM>HTML::Filter</EM> is an HTML parser that by default prints the
- original text it parses (basically a slow version of <CODE>cat(1)</CODE>). You can
- override the callback methods to change how particular HTML elements are
- filtered, and you can override the <CODE>output()</CODE> method, which is called to
- print the HTML text.</P>
- <P><EM>HTML::Filter</EM> is a subclass of <EM>HTML::Parser</EM>. This means that
- the document should be given to the parser by calling the $p-><CODE>parse()</CODE>
- or $p-><CODE>parse_file()</CODE> methods.</P>
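- <P>As a minimal sketch of the two entry points, the following feeds a
- document to the filter in pieces with <CODE>parse()</CODE> and then signals the end of
- input with <CODE>eof()</CODE>, a method inherited from <EM>HTML::Parser</EM>. The sample
- markup is made up for illustration; with no methods overridden, the output
- should simply reproduce the input.</P>
- <PRE>
- use strict;
- use warnings;
- require HTML::Filter;
-
- my $p = HTML::Filter->new;
-
- # Chunks may end anywhere, even in the middle of a tag or a word.
- $p->parse("&lt;p&gt;Hello, &lt;b&gt;wor");
- $p->parse("ld&lt;/b&gt;!&lt;/p&gt;\n");
-
- # Tell the parser that no more input follows so it can flush
- # whatever it is still holding back.
- $p->eof;</PRE>
- <P>Calling <CODE>parse_file()</CODE> with a file name or an open file handle covers
- the common case where the whole document is already on disk.</P>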
- <P>
- <HR>
- <H1><A NAME="examples">EXAMPLES</A></H1>
- <P>The first example is a filter that will remove all comments from an
- HTML file. This is achieved by simply overriding the comment method
- to do nothing.</P>
- <PRE>
- package CommentStripper;
- require HTML::Filter;
- @ISA=qw(HTML::Filter);
- sub comment { } # ignore comments</PRE>
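- <P>A short usage sketch for the class above, with the class repeated so
- that the script runs on its own. The input string is invented for
- illustration.</P>
- <PRE>
- use strict;
- use warnings;
-
- package CommentStripper;
- require HTML::Filter;
- our @ISA = qw(HTML::Filter);
- sub comment { }    # ignore comments
-
- package main;
-
- my $p = CommentStripper->new;
- $p->parse("&lt;p&gt;kept&lt;/p&gt;&lt;!-- dropped --&gt;&lt;p&gt;also kept&lt;/p&gt;\n");
- $p->eof;
- # The two paragraphs are printed; the comment between them is not.</PRE>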
- <P>The second example shows a filter that removes every &lt;TABLE&gt;
- found in the HTML file. We specialize the <CODE>start()</CODE> and <CODE>end()</CODE> methods
- to count table tags, and override <CODE>output()</CODE> to suppress output while inside a
- table.</P>
- <PRE>
- package TableStripper;
- require HTML::Filter;
- @ISA=qw(HTML::Filter);
- sub start
- {
-     my $self = shift;
-     $self->{table_seen}++ if $_[0] eq "table";  # count first, so the start tag itself is dropped
-     $self->SUPER::start(@_);
- }</PRE>
- <PRE>
- sub end
- {
-     my $self = shift;
-     $self->SUPER::end(@_);                      # still inside the table, so the end tag is dropped
-     $self->{table_seen}-- if $_[0] eq "table";
- }</PRE>
- <PRE>
- sub output
- {
-     my $self = shift;
-     unless ($self->{table_seen}) {              # the counter is zero outside any table
-         $self->SUPER::output(@_);
-     }
- }</PRE>
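- <P>To see the counter at work, the sketch below runs the same class over a
- made-up fragment containing a nested table. The class is repeated so that
- the script is self-contained; only the text outside the outermost table
- should be printed.</P>
- <PRE>
- use strict;
- use warnings;
-
- package TableStripper;
- require HTML::Filter;
- our @ISA = qw(HTML::Filter);
-
- sub start {
-     my $self = shift;
-     $self->{table_seen}++ if $_[0] eq "table";
-     $self->SUPER::start(@_);
- }
-
- sub end {
-     my $self = shift;
-     $self->SUPER::end(@_);
-     $self->{table_seen}-- if $_[0] eq "table";
- }
-
- sub output {
-     my $self = shift;
-     $self->SUPER::output(@_) unless $self->{table_seen};
- }
-
- package main;
-
- my $p = TableStripper->new;
- $p->parse("&lt;p&gt;before&lt;/p&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;outer");
- $p->parse("&lt;table&gt;&lt;tr&gt;&lt;td&gt;inner&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;");
- $p->parse("&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p&gt;after&lt;/p&gt;\n");
- $p->eof;</PRE>
- <P>Because the counter tracks nesting depth, output resumes only after the
- outermost table is closed. A stray end tag for a table with no matching
- start tag would push the counter below zero and suppress all later output;
- clamping the counter at zero is one way to guard against that.</P>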
- <P>If you want to collect the parsed text internally you might want to do
- something like this:</P>
- <PRE>
- package FilterIntoString;
- require HTML::Filter;
- @ISA=qw(HTML::Filter);
- sub output { push(@{$_[0]->{fhtml}}, $_[1]) }            # collect instead of printing
- sub filtered_html { join("", @{$_[0]->{fhtml} || []}) }  # "" if nothing was collected</PRE>
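- <P>A possible way to use such a class is sketched below. The class is
- repeated so that the script runs on its own, the input string is invented,
- and the <CODE>|| []</CODE> fallback is an addition that keeps <CODE>filtered_html()</CODE>
- usable under <CODE>use strict</CODE> before anything has been parsed.</P>
- <PRE>
- use strict;
- use warnings;
-
- package FilterIntoString;
- require HTML::Filter;
- our @ISA = qw(HTML::Filter);
-
- # Append each piece of output to a per-object list instead of printing it.
- sub output        { push @{ $_[0]->{fhtml} }, $_[1] }
- # Join the collected pieces; yields "" if nothing was collected yet.
- sub filtered_html { join "", @{ $_[0]->{fhtml} || [] } }
-
- package main;
-
- my $p = FilterIntoString->new;
- $p->parse("&lt;h1&gt;Title&lt;/h1&gt;\n");
- $p->eof;
-
- my $html = $p->filtered_html;    # the filtered document as one string
- print $html;</PRE>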
- <P>
- <HR>
- <H1><A NAME="bugs">BUGS</A></H1>
- <P>Comments in declarations are removed from the declarations and then
- inserted as separate comments after the declaration. If you turn on
- strict_comment(), then comments with embedded ``--'' are split into
- multiple comments.</P>
- <P>
- <HR>
- <H1><A NAME="see also">SEE ALSO</A></H1>
- <P><A HREF="../../../site/lib/HTML/Parser.html">the HTML::Parser manpage</A></P>
- <P>
- <HR>
- <H1><A NAME="copyright">COPYRIGHT</A></H1>
- <P>Copyright 1997-1998 Gisle Aas.</P>
- <P>This library is free software; you can redistribute it and/or
- modify it under the same terms as Perl itself.</P>
-
- </BODY>
-
- </HTML>
-