WebSwoon
Author : Igor Kouzmine @ LaCaveProds - Copyright © 2004-2005
Homepage : http://www.intellitamper.com/webswoon/
Email : tamper@laposte.net
This program is free software; you can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the
Free Software Foundation; either version
2 of the License, or (at your option) any later version.
This text contains a lot of useful information about the program; take your time to read it.
Documentation content
Disclaimer
What is WebSwoon ?
Installation
Known limitations
Web sites list
Program's usage
Configuration window
Version history and updates
Program translation information
• • • • • • •
• Disclaimer
WebSwoon comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to
redistribute it under certain conditions; please read the included license in the text file "webswoon_license.txt".
Please send an email to the author with any suggestions or bug reports.
• • • • • • •
• What is WebSwoon ?
WebSwoon automatically opens several web pages, captures their content and saves the captures as pictures.
It is useful, for example, if you want to provide information about a web site and display a capture
of it as a thumbnail.
You just have to fill in a list of all the web sites (that is, URLs) you want to grab and start WebSwoon. You
can choose to open the browser window used for captures if you want to check that a web site is
correctly displayed, or if you want to perform some actions on the page before the capture is done.
Features :
- Automatically loads web sites and saves captures as JPEG/PNG/GIF pictures.
- Whole page content is captured (Java applets, Flash animations, etc.).
- Adjustable width and height of captures.
- Adjustable browser size for captures.
- Optional delay to wait for animations on web pages.
- Browser window can be displayed to browse web sites before the capture.
- Final file name of each capture is fully configurable.
- Pretty and easy-to-use program interface.
- Program is free and released under the GPL !
- And many more...
WebSwoon is written in Python and uses wxWidgets. Sorry about the package size,
but several runtime libraries must be installed with the program (they are installed only in the program's installation folder).
The current version of WebSwoon relies on Internet Explorer and thus requires Windows and IE
to be installed. It has been tested with IE6 but should run with older versions. If you want to
use a proxy, disable Javascript, or modify any other browser-related option, do it from
the usual IE configuration panel. WebSwoon will use these settings automatically. Everything, except
popup windows which are blocked, will run in the WebSwoon browser as if you were using IE alone.
• • • • • • •
• Installation
Install the program package in the folder of your choice, then run "WebSwoon" from the
Windows Start menu. A console version of the program is also available in the installation folder;
use webswoon_console.exe --help for more information.
Advanced users : All configuration information is in the "webswoon.cfg" file and can be tweaked if you respect the
file format and letter case (A is not a).
WebSwoon does not read or write anything in the Windows registry. Only some Windows system DLLs
used indirectly by the program or by Python may do this, but it shouldn't happen. WebSwoon will survive
a Windows reinstallation if you install it in a safe folder.
• • • • • • •
• Known limitations
Some WebSwoon limitations are known; they are not "bugs" :
- Some web sites take longer to capture than others, even if the page seems to be
complete. Using the IE ActiveX control sometimes makes it hard to know when the page is
really fully loaded and displayed. In some rare cases, the program has to wait for a delay to
expire when no data is received, to be sure of getting the full page. You can configure this
delay as the "Timeout delay" in the configuration, but too small a value may result in
captures with missing graphics on some web pages.
- Popup windows from the browser window are blocked. However, it seems that sometimes some
popups are not blocked and can appear when the program switches to another web site.
- Sometimes the browser may show an alert dialog requesting a user choice (from the browser
itself or from Javascript alerts). These dialogs will block the captures until they
are closed manually. Options are available in the configuration window to disable these alerts,
but this may result in incomplete captures.
• • • • • • •
• Web sites list
The web sites list containing all URLs used to build captures is stored in the file "websites_list.txt".
You can edit it by hand, generate it from a database using an appropriate program, or fill it
directly from WebSwoon.
The format is simple : one URL per line, each line ending with \r\n.
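As an illustration of this format, here is a minimal Python sketch that generates the list file from a sequence of URLs. The function name write_sites_list and the UTF-8 encoding are assumptions made for this example; they are not part of WebSwoon.

```python
def write_sites_list(urls, path="websites_list.txt"):
    """Write one URL per line, each line terminated by CR+LF.

    Returns the number of URLs written. Blank entries are skipped.
    The encoding is an assumption; plain ASCII URLs are unaffected by it.
    """
    cleaned = [url.strip() for url in urls if url.strip()]
    # newline="" prevents Python from translating the explicit line endings.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        for url in cleaned:
            handle.write(url + "\r\n")
    return len(cleaned)


if __name__ == "__main__":
    count = write_sites_list(["http://www.yahoo.com"])
    print(count, "URL(s) written")
```

The same function can be fed with rows read from a database, which is one way to generate the list with an external program as described above.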
• • • • • • •
• Program's usage
As the web site captures are done via the Internet Explorer ActiveX control, the browser integrated in
the program is compatible with everything that IE can handle on your system (Javascript, Java, Flash...).
It will also automatically use the cookies stored by IE to access pages requiring them.
You can choose to show or hide the browser window. If you want to perform some action on a web
site before the capture, you can open the browser window and, for example, click on a link to skip
the front page or close an ad. The capture will be delayed for as long as you keep interacting
with the page.
With the browser window open, you can also check that all captures are done correctly.
It can also help you in some cases to find and remove broken web sites from your list.
• • • • • • •
• Configuration window
Program panel :
- Delete all existing captures before to start : lets you delete all captured images before
starting the captures. Warning ! Enabling this option will delete all files in the captures folder.
- Update existing captures older than x days x hours x mins : lets you specify when the program
must refresh an existing capture. If you want fresh updates, you can set 1 minute and run the
program in loop-mode.
- Automatically restart captures when finished after xxx minutes : lets you run the program in
loop-mode. When all captures are finished, the program will wait for this delay and then restart
all captures from the beginning. You can abort captures using the "stop capture" option in the menu
or the "abort captures" option in the browser window if it is displayed.
- Interface language : lets you select the language used for the program interface. Modifying this
option requires the program to be restarted.
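The refresh rule behind the "Update existing captures older than..." option can be pictured with the following Python sketch. It is only an illustration: the function name needs_refresh is hypothetical, and measuring the age of a capture from the file modification time is an assumption, not a description of WebSwoon's actual code.

```python
import os
import time
from datetime import timedelta


def needs_refresh(capture_path, days=0, hours=0, minutes=0, now=None):
    """Return True if the capture is missing or older than the given age.

    Assumption: the age of a capture is taken from the modification time
    of its file. 'now' (seconds since the epoch) can be passed for testing.
    """
    if not os.path.exists(capture_path):
        return True  # nothing captured yet, so a capture is needed
    max_age = timedelta(days=days, hours=hours, minutes=minutes).total_seconds()
    current = time.time() if now is None else now
    age = current - os.path.getmtime(capture_path)
    return age > max_age
```

With a limit of 1 minute and loop-mode enabled, nearly every pass would find the existing captures too old and redo them, which matches the "fresh updates" advice given above.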
Browser panel :
- Open browser window during captures : displays the browser window and allows the user to perform
some actions during captures.
- Start in auto-capture mode : lets you choose whether automatic capture is enabled by default in the browser window.
- Browser width/height : specifies the size of the browser window. Remember that an 800x600 screen is still
the standard for building web sites.
- Canva width/height : specifies the size of the browser canvas, in full page mode only. The web page will be loaded
in this large blank area and the limits of the content will be detected automatically.
[this option has been suspended until further notice]
- Ignored URLS : specifies which addresses must not be loaded and displayed in the browser window.
It is typically used to hide adverts. Multiple URLs must be separated with the ";" character.
- Wait delay after complete page : how long the program must wait before capturing and saving
the image once the web site is fully loaded. If you want to wait for a Flash animation to start,
you can set 2 or 3 seconds.
- Timeout delay : how long the program must wait for a web site to load. After this delay the
capture will be done even if the data on the page is not fully loaded.
- Display browser error window (Javascript warnings) : lets you enable the warnings displayed
by the browser when there are errors or questions.
- Disable Javascript in browser (avoid blocking Javascript alerts) : lets you disable Javascript
in the browser to ignore alerts (with a yes/no choice, for example) which block captures.
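To show how a ";" separated "Ignored URLS" value can be handled, here is a small Python sketch. The function names are hypothetical, and the matching rule (a case-insensitive prefix comparison) is purely an assumption: this documentation does not say how WebSwoon compares addresses.

```python
def parse_ignored_urls(field):
    """Split the ';' separated field into a list of non-empty entries."""
    return [part.strip() for part in field.split(";") if part.strip()]


def is_ignored(url, ignored):
    """Return True if 'url' starts with one of the ignored entries.

    Assumption: prefix matching, ignoring letter case. The real program
    may use a different rule.
    """
    lowered = url.lower()
    return any(lowered.startswith(entry.lower()) for entry in ignored)
```

For instance, an entry such as "http://ads.example.com/" (an invented address) would, under this assumption, exclude every address located under that prefix.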
Captures panel :
- Capture method : Standard view / Full page : lets you choose the capture method.
Standard view saves only what is displayed in the browser. Full page captures the full height
of the page, including the part that must be scrolled to be viewed. The content limits are discovered
automatically. The Full page mode is slower, as it requires two passes for each web page
to find the limits of the content correctly.
- Resize capture image : lets you resize the capture image to a specified size, often used
to generate thumbnails. In full page mode only the width will be used; the height will be calculated
automatically.
- Capture image width/height : specifies the size in pixels of the resized capture image.
- Remove window border and scrollbar in captures : lets you crop the visible content of the
browser window to hide the window border (some pixels around the content) and the vertical scrollbar.
However, the program is unable to detect whether the scrollbar is really visible and will crop the image
anyway.
- Keep margins around content of xxx pixels : lets you keep a blank space
around the page content, in full page capture mode only.
- Capture image format BMP/JPEG/GIF/PNG : specify image format of saved captures.
- Capture file name format : lets you personalize the file name of captures. You can use
these options, which will be replaced with data in the final file names :
%p : protocol of URL (cleaned of incompatible symbols**)
%u : URL (cleaned of incompatible symbols**)
%e : extension of file (jpg, gif or png)
%y : year date in format YYYY
%m : month date in format MM
%d : day date in format DD
%i : number of the current web site (first web site is 1, second is 2,...)
%z : md5 hash of URL (32 hexadecimal characters)
** All parasite characters \/:?~&=<> and %2C are replaced with the underscore
character "_". To avoid problems on some web servers, .pl and .php are replaced
with _pl and _php in the URL.
For example, with the file name format "%p%u_%y-%m-%d.%e" (without quotes), if the URL is
"http://www.yahoo.com", the capture file name will be "http___www.yahoo.com_2004-06-15.jpg"
Note : the "://" characters after "http" have been replaced with 3 underscores "_" to be compatible
with the file system.
- Save captures in folder : specify where to save captured pictures.
You can use the Default button to restore default settings in all options. Using the Cancel button
will restore your previous configuration.
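The "Capture file name format" substitution described in the Captures panel can be sketched in Python as follows. This is an approximation written for this documentation, not the program's own code. The function names are hypothetical, and two points are assumptions inferred from the yahoo.com example: %p is taken to include the "://" separator, and %u to be the rest of the URL. The md5 hash is assumed to be computed on the full, uncleaned URL.

```python
import hashlib
import re
from datetime import date

# Characters listed in the documentation, plus the "%2C" sequence.
_PARASITES = re.compile(r"%2C|[\\/:?~&=<>]")


def clean(text):
    """Replace .php/.pl and the parasite characters as documented."""
    # Plain substring replacement (assumption about how it is applied).
    text = text.replace(".php", "_php").replace(".pl", "_pl")
    return _PARASITES.sub("_", text)


def capture_file_name(fmt, url, index, extension, today=None):
    """Expand the %p %u %e %y %m %d %i %z options of a file name format."""
    if today is None:
        today = date.today()
    protocol, separator, rest = url.partition("://")
    if separator:
        protocol += separator  # assumption: %p keeps the "://" part
    else:
        protocol, rest = "", url  # URL given without a protocol
    values = {
        "p": clean(protocol),
        "u": clean(rest),
        "e": extension,
        "y": "%04d" % today.year,
        "m": "%02d" % today.month,
        "d": "%02d" % today.day,
        "i": str(index),
        "z": hashlib.md5(url.encode("utf-8")).hexdigest(),
    }
    # Single pass over the format, so replaced data is never re-expanded.
    return re.sub(r"%([pueymdiz])", lambda match: values[match.group(1)], fmt)


if __name__ == "__main__":
    print(capture_file_name("%p%u_%y-%m-%d.%e", "http://www.yahoo.com",
                            1, "jpg", date(2004, 6, 15)))
```

Run with the date of 15 June 2004, the sketch reproduces the file name given in the example above.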
• • • • • • •
• Version history and updates
WebSwoon 1.0 Beta 7 (July-2005)
- Fixed a bug that prevented saving the URLs list in the editor.
- Added the %z parameter for file names; it is replaced with the md5 hash of the URL.
- Added a menu option to capture only one web page and to save its capture in a
specific folder.
- Added an option to capture the whole page and not only what is visible in browser.
Content limits will be discovered automatically. If resize is turned on, the height will
be ignored and calculated from specified width to keep proportions.
The Full page mode is slower as it requires two passes for each web page to find limits
of content correctly.
- Added an option to set the canvas size for full page mode. The web page will be loaded in
this large blank area and the limits of the content will be detected automatically.
- Added an option to specify margins size around content in full page mode.
- Added a switch option for image resizing. Turning it off will save image in full
original size.
- Added BMP format for image output.
- Updated documentation with latest changes in configuration window.
- German language is now available (thanks to Olaf Noehring).
WebSwoon 1.0 Beta 6 (May-2005)
- Program interface can now be translated into any language using the gettext method.
- French and English languages are now available.
- New method to determine if a page is fully loaded, improving the speed of captures
and fixing some problems with pages being captured before being fully loaded.
- Configuration window has been modified to reorder available options.
- Added descriptions instead of numbers for some errors returned by the browser.
- Added an option to set JPEG compression level.
- Added an option to set browser auto-capture mode checkbox on/off by default.
- Added an option to display browser errors (Javascript warnings). Hiding them
may prevent some web sites that require a user action from loading, for example
when an invalid https certificate must be confirmed.
- Added an option to disable Javascript (ActiveScripting) in the browser to avoid
some blocking Javascript alerts during captures (yes/no choices, etc...). Using
this option may prevent some web sites from being displayed correctly.
- A blank page is now loaded between each site to avoid duplicating the previous
capture when a web site is broken.
- Fixed a bug with file names containing .pl/.php, which generated errors in the Apache web server.
- When the output format is PNG, pictures are now in 256 colors instead of 24 bits, so file sizes are smaller.
- Improved GIF picture quality.
- Documentation has been converted to HTML for better readability.
WebSwoon 1.0 Beta 5 (August-2004)
- Fixed a bug blocking program when saving large JPEG pictures.
WebSwoon 1.0 Beta 4 (August-2004)
- Added hours and minutes in refresh delay to allow smaller delay.
- Added an option to activate loop-mode. When all captures are finished, program will restart forever.
It must be used with a correct refresh captures delay in options.
- Time-out delay is now activated when a web site starts to load. The capture will be done even if the
page failed to load completely before this delay. It should prevent some web sites from blocking the
captures.
- Added a console version of program. Saved configuration will be used but you can override saved
parameters by specifying some options on the command line. Use the --help parameter for help.
Two versions of the executable are required because the command line is ignored when a Python
program is started with the console window hidden, like webswoon.exe for example.
- Switched the Auto-capture mode two-states button to a checkbox for more visibility under some
Windows themes.
- Updated documentation with a description for all configuration options.
- Added HTTP error return code in error message (like 404, etc.)
WebSwoon 1.0 Beta 3 (June-2004)
- Fixed height of browser window, some pixels were missing at the bottom.
- Added "Auto-capture mode" button in browser window to switch between auto/manual
capture mode. In manual mode, all configured delays are ignored but the button can be
pushed again to reactivate them.
- Added "Capture now" button to immediately capture the current web site and jump to the
next one.
- Added "Skip" button to ignore the current web site and jump to the next one.
- Added "Abort captures" button to cancel all captures, equivalent to closing the browser
window.
- Captures can now be resumed without restarting program when they were aborted.
- URL of current web site is now displayed in browser window.
- Added status bar in browser window displaying IE messages and a progress bar.
- Added %i parameter in file name format, it will be replaced with the number of the current
web site (first web site is 1, second is 2,...)
WebSwoon 1.0 Beta 2 (June-2004)
- Added a manager with list of URLs to browse and modify web sites addresses directly
from program.
- Added a menu option to import web sites list from a text file.
- Added ignored URLs field to prevent the browser from loading some ads and to prevent the SideBar
feature on some web sites from replacing the main page content.
- Added an option to delete existing captures on startup.
- Added an option to remove border and scrollbar from captures.
- Added an expire time to refresh old existing captures.
- Added file name format field to personalize the file name of captures.
- Added PNG and GIF image format for captures.
- Folder in which to store captures can now be selected.
- Added a configurable timeout delay when it's impossible to determine if a page is
complete or not.
- Reviewed placement of options in configuration window.
- Greatly improved JPEG quality in captures.
- Fixed captures width, some pixels were missing on the right side.
- Fixed a memory bug blocking program after several hundreds of captures.
- Improved wait time when browsing a web site before doing the capture.
- Added counter in status bar to give some information about progress.
- Fixed timers; the delay selected in the configuration was extended by one second.
WebSwoon 1.0 Beta 1 (April-2004)
- First public beta release, let's try it !
• • • • • • •
• Program translation information
If you are interested in translating WebSwoon into another language, you can use the
nice POEdit program to build your own catalog. You can find the reference English catalog file in this folder :
[webswoon installation folder]\locale\en\LC_MESSAGES\webswoon\webswoon.po
You can test your own webswoon.mo file by replacing the existing webswoon.mo file in a language
folder and selecting that language in the program options. When it works, you can send your catalog by
email so it can be added to the next release. Remember that the original English texts must remain
alongside your translations in the catalog, or it will not work.
To update an existing catalog, open it and update it using the English reference webswoon.pot file.
End of documentation.