=============================
The sgmlop accelerator module
=============================
sgmlop contains an optimized SGML/XML parser, designed as an add-on to
the sgmllib/htmllib and xmllib modules shipped with Python 1.5.

using empty callbacks, the xmllib replacement included here is about 6
times faster than the original xmllib implementation. when using sgmlop
directly, it can be more than 50 times faster. for more information on
benchmarking sgmlop, see below.
Enjoy /F
fredrik@pythonware.com
http://www.pythonware.com
--------------------------------------------------------------------
Copyright (c) 1998 by Secret Labs AB.
Permission to use, copy, modify, and distribute this software and
its associated documentation for any purpose and without fee is
hereby granted. This software is provided as is.
--------------------------------------------------------------------

release info
------------

This is the third public release. Changes include:

- added a starttag attribute parser written in C. this gives
  a considerable speedup on files using lots of tag attributes

- the callback object can now have an sgmllib/xmllib interface
  (finish/handle) *or* a saxlib interface (see saxhack.py for
  an example).

contents
--------

README          this file

sgmllib.py      a drop-in replacement for the sgmllib.py module
                distributed with Python 1.5

xmllib.py       a drop-in replacement for the xmllib.py module
                distributed with Python 1.5

saxhack.py      illustrates how to implement the SAX DocumentHandler
                interface directly with native sgmlop. this is over
                30 times faster than a corresponding parser based on
                the original xmllib.

sgmlop.dll      a precompiled version for Python 1.5 on win32

sgmlop.c        accelerator source code

sgmlop.mak      makefile for MSVC++ 5.0 generated by opal/pymake.
                make sure to change the directory names before you
                use it on your own machine.

bench*.py       various test files and benchmarks
test*.py

benchmarks
----------

benchmarking the sgmlop parser is non-trivial; if you don't install
any callbacks, it's some 300 times faster than the original xmllib (it
can parse more than 10 MB/s on a fast Pentium II). this means that in
a typical test, far more time is lost on the Python method call
overhead than on the parsing proper.

my earlier benchmarks used a 'collecting' parser, which stored all
tags and elements in a list. with that setup, sgmlop is roughly 5
times faster than the original implementation.

the benchxml.py script provided with this release uses empty parsers
instead (that is, all callbacks exist, but they contain only a 'pass'
statement), in order to measure the parser and Python call overhead
only.
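
the difference between the two setups can be sketched as follows. this
is not code from benchxml.py: it is a minimal illustration written for
a current Python 3, and it does not use sgmlop at all. the class names
(EmptyHandler, CollectingHandler), the replay and time_replay helpers,
and the SAMPLE event stream are all hypothetical. the callback names
finish_starttag, finish_endtag and handle_data are assumed from the
finish/handle naming mentioned in the release notes. replay only
dispatches pre-tokenized events, so whatever time it takes is callback
overhead, not parsing.

```python
import time


class EmptyHandler:
    """Every callback exists but does nothing, like the 'empty parser' setup."""

    def finish_starttag(self, tag, attrs):
        pass

    def finish_endtag(self, tag):
        pass

    def handle_data(self, data):
        pass


class CollectingHandler:
    """Stores every event in a list, like the earlier 'collecting' setup."""

    def __init__(self):
        self.events = []

    def finish_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def finish_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def replay(events, handler):
    """Hypothetical stand-in for a parser: dispatch pre-tokenized events.

    No parsing happens here, so the time spent is callback overhead only.
    """
    for event in events:
        kind = event[0]
        if kind == "start":
            handler.finish_starttag(event[1], event[2])
        elif kind == "end":
            handler.finish_endtag(event[1])
        else:
            handler.handle_data(event[1])


def time_replay(events, handler):
    """Return the wall-clock seconds needed to replay events into handler."""
    started = time.perf_counter()
    replay(events, handler)
    return time.perf_counter() - started


# A small made-up event stream, repeated below to get a measurable run time.
SAMPLE = [("start", "item", {"id": "1"}), ("data", "spam"), ("end", "item")]

if __name__ == "__main__":
    events = SAMPLE * 100000
    print("empty      %.3f s" % time_replay(events, EmptyHandler()))
    print("collecting %.3f s" % time_replay(events, CollectingHandler()))
```

the absolute numbers depend on the machine and the interpreter; the
point is only that both handlers cost time even though nothing is
being parsed.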

here's a typical test run (with the time for the original xmllib
implementation set to 1):

parser              time
--------------------------------------------------------------------
slow xmllib         1.0
fast xmllib         0.156   (6.4x)
sgmlop dummy        0.019   (53.5x)
sgmlop null         0.003   (297.8x)

the null time is obtained by running the parser without any callbacks
installed.
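
the factor in parentheses is the reciprocal of the relative time. the
sketch below recomputes it from the three-decimal times in the table;
the names RELATIVE_TIMES and speedup are made up for this example.
recomputed this way the factors come out at about 6.4, 52.6 and 333.3,
so the last two differ from the printed 53.5x and 297.8x. that is
presumably because the printed factors were derived from unrounded
timings, which is an assumption and not something this file states.

```python
# Relative times copied from the table above (slow xmllib = 1.0).
RELATIVE_TIMES = {
    "slow xmllib": 1.0,
    "fast xmllib": 0.156,
    "sgmlop dummy": 0.019,
    "sgmlop null": 0.003,
}


def speedup(relative_time, baseline=1.0):
    """Return how many times faster a run is than the baseline."""
    return baseline / relative_time


if __name__ == "__main__":
    for name, relative_time in RELATIVE_TIMES.items():
        print("%-14s %6.3f  (%.1fx)" % (name, relative_time, speedup(relative_time)))
```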