<HTML>
<HEAD>
<TITLE>HTML::Filter - Filter HTML text through the parser</TITLE>
<LINK REL="stylesheet" HREF="../../../Active.css" TYPE="text/css">
<LINK REV="made" HREF="mailto:">
</HEAD>
<BODY>
<TABLE BORDER=0 CELLPADDING=0 CELLSPACING=0 WIDTH=100%>
<TR><TD CLASS=block VALIGN=MIDDLE WIDTH=100% BGCOLOR="#cccccc">
<STRONG><P CLASS=block> HTML::Filter - Filter HTML text through the parser</P></STRONG>
</TD></TR>
</TABLE>
<A NAME="__index__"></A>
<!-- INDEX BEGIN -->
<UL>
<LI><A HREF="#name">NAME</A></LI><LI><A HREF="#supportedplatforms">SUPPORTED PLATFORMS</A></LI>
<LI><A HREF="#synopsis">SYNOPSIS</A></LI>
<LI><A HREF="#description">DESCRIPTION</A></LI>
<LI><A HREF="#examples">EXAMPLES</A></LI>
<LI><A HREF="#bugs">BUGS</A></LI>
<LI><A HREF="#see also">SEE ALSO</A></LI>
<LI><A HREF="#copyright">COPYRIGHT</A></LI>
</UL>
<!-- INDEX END -->
<HR>
<P>
<H1><A NAME="name">NAME</A></H1>
<P>HTML::Filter - Filter HTML text through the parser</P>
<P>
<HR>
<H1><A NAME="supportedplatforms">SUPPORTED PLATFORMS</A></H1>
<UL>
<LI>Linux</LI>
<LI>Solaris</LI>
<LI>Windows</LI>
</UL>
<HR>
<H1><A NAME="synopsis">SYNOPSIS</A></H1>
<PRE>
require HTML::Filter;
$p = HTML::Filter->new->parse_file("index.html");</PRE>
<P>
<HR>
<H1><A NAME="description">DESCRIPTION</A></H1>
<P><EM>HTML::Filter</EM> is an HTML parser that by default prints the
original text it parses unchanged (in effect, a slow version of <CODE>cat(1)</CODE>). You can
override the callback methods to change how particular HTML elements
are filtered, and you can override the <CODE>output()</CODE> method, which is called to
print the HTML text.</P>
<P><EM>HTML::Filter</EM> is a subclass of <EM>HTML::Parser</EM>, so
the document is given to the parser by calling the $p-><CODE>parse()</CODE>
or $p-><CODE>parse_file()</CODE> methods.</P>
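<P>As a minimal sketch of feeding a document through <CODE>parse()</CODE>, the
snippet below passes the text in two chunks and then calls <CODE>eof()</CODE>,
a method inherited from <EM>HTML::Parser</EM>, so that anything still buffered is
flushed. The sample markup is made up for illustration; with the default
callbacks the input should be printed back unchanged.</P>
<PRE>
require HTML::Filter;

my $p = HTML::Filter->new;

# Feed the document in arbitrary chunks; a tag may be split across calls.
$p->parse("&lt;p&gt;Hello, &lt;b&gt;wor");
$p->parse("ld&lt;/b&gt;&lt;/p&gt;\n");

# Signal the end of the document so any buffered text is passed on.
$p->eof;</PRE>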
<P>
<HR>
<H1><A NAME="examples">EXAMPLES</A></H1>
<P>The first example is a filter that will remove all comments from an
HTML file. This is achieved by simply overriding the comment method
to do nothing.</P>
<PRE>
package CommentStripper;
require HTML::Filter;
@ISA=qw(HTML::Filter);
sub comment { } # ignore comments</PRE>
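<P>A possible way to try the filter out is sketched below. The class is
repeated so that the script runs on its own, and the input string is
an invented sample; the paragraph should be printed while the comment
is dropped.</P>
<PRE>
package CommentStripper;
require HTML::Filter;
@ISA = qw(HTML::Filter);
sub comment { }   # ignore comments

package main;

my $p = CommentStripper->new;
$p->parse("&lt;p&gt;kept&lt;/p&gt;&lt;!-- dropped --&gt;\n");
$p->eof;          # flush whatever the parser still holds</PRE>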
<P>The second example shows a filter that removes every &lt;TABLE&gt;
found in the HTML file. We specialize the <CODE>start()</CODE> and <CODE>end()</CODE> methods
to count table tags and then suppress output while inside a
table.</P>
<PRE>
package TableStripper;
require HTML::Filter;
@ISA=qw(HTML::Filter);
sub start
{
    my $self = shift;
    $self->{table_seen}++ if $_[0] eq "table";
    $self->SUPER::start(@_);
}</PRE>
<PRE>
sub end
{
    my $self = shift;
    $self->SUPER::end(@_);
    $self->{table_seen}-- if $_[0] eq "table";
}</PRE>
<PRE>
sub output
{
    my $self = shift;
    unless ($self->{table_seen}) {
        $self->SUPER::output(@_);
    }
}</PRE>
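<P>The following self-contained sketch exercises the same technique on a
nested table, to illustrate why a counter is used rather than a simple
flag: output stays suppressed until the outermost table has closed.
The class is repeated so that the script runs on its own, and the sample markup
is invented; only the two paragraphs should be printed.</P>
<PRE>
package TableStripper;
require HTML::Filter;
@ISA = qw(HTML::Filter);

sub start
{
    my $self = shift;
    $self->{table_seen}++ if $_[0] eq "table";   # count before printing
    $self->SUPER::start(@_);
}

sub end
{
    my $self = shift;
    $self->SUPER::end(@_);                       # still inside: suppressed
    $self->{table_seen}-- if $_[0] eq "table";
}

sub output
{
    my $self = shift;
    $self->SUPER::output(@_) unless $self->{table_seen};
}

package main;

my $p = TableStripper->new;
$p->parse(join "",
    "&lt;p&gt;before&lt;/p&gt;",
    "&lt;table&gt;&lt;tr&gt;&lt;td&gt;",
    "&lt;table&gt;&lt;tr&gt;&lt;td&gt;inner&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;",
    "&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;",
    "&lt;p&gt;after&lt;/p&gt;\n");
$p->eof;</PRE>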
<P>If you want to collect the parsed text internally you might want to do
something like this:</P>
<PRE>
package FilterIntoString;
require HTML::Filter;
@ISA=qw(HTML::Filter);
sub output { push(@{$_[0]->{fhtml}}, $_[1]) }
sub filtered_html { join("", @{$_[0]->{fhtml}}) }</PRE>
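<P>One way this class might be used is sketched below. The class is
repeated so that the script runs on its own, and the input string and the
variable names are assumptions made for illustration. Nothing is printed
during parsing; the text is retrieved afterwards with <CODE>filtered_html()</CODE>.</P>
<PRE>
package FilterIntoString;
require HTML::Filter;
@ISA = qw(HTML::Filter);
sub output        { push(@{$_[0]->{fhtml}}, $_[1]) }
sub filtered_html { join("", @{$_[0]->{fhtml}}) }

package main;

my $p = FilterIntoString->new;
$p->parse("&lt;p&gt;Hello&lt;/p&gt;\n");
$p->eof;                          # make sure all text reaches output()

my $html = $p->filtered_html;     # the collected pieces, joined
print length($html), " characters collected\n";</PRE>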
<P>
<HR>
<H1><A NAME="bugs">BUGS</A></H1>
<P>Comments in declarations are removed from the declarations and then
inserted as separate comments after the declaration. If you turn on
<CODE>strict_comment()</CODE>, then comments with embedded <CODE>--</CODE> are split into
multiple comments.</P>
<P>
<HR>
<H1><A NAME="see also">SEE ALSO</A></H1>
<P><A HREF="../../../site/lib/HTML/Parser.html">the HTML::Parser manpage</A></P>
<P>
<HR>
<H1><A NAME="copyright">COPYRIGHT</A></H1>
<P>Copyright 1997-1998 Gisle Aas.</P>
<P>This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.</P>
</BODY>
</HTML>