WebSwoon Documentation - Version 1.0 Beta 7 (July 2005)

WebSwoon - The web photographer


WebSwoon
Author : Igor Kouzmine @ LaCaveProds - Copyright © 2004-2005
Homepage : http://www.intellitamper.com/webswoon/
Email : tamper@laposte.net


This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This document contains a lot of useful information about the program; please take the time to read it.


Documentation content

Disclaimer
What is WebSwoon ?
Installation
Known limitations
Web sites list
Program's usage
Configuration window
Version history and updates
Program translation information


• • • • • • •

• Disclaimer

WebSwoon comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions; please read the included license in the text file "webswoon_license.txt".

Please email the author with any suggestions or bug reports.


• • • • • • •

• What is WebSwoon ?

WebSwoon automatically opens several web pages, captures their content and saves them as pictures. It is useful, for example, if you want to provide information about a web site and display a capture of it as a thumbnail.

You just have to fill in a list of the web sites (that is, URLs) you want to capture and start WebSwoon. You can choose to open the browser window used for captures if you want to check that a web site is displayed correctly, or if you want to perform some actions on the page before the capture is taken.

Features :

- Automatically loads web sites and saves captures as BMP/JPEG/GIF/PNG pictures.
- Whole page content is captured (Java applets, Flash animations, etc.).
- Adjustable width and height of captures.
- Adjustable browser size for captures.
- Optional delay to wait for animations on web pages.
- Browser window can be displayed to browse web sites before the capture.
- Final file name of capture is fully configurable.
- Pretty and easy to use program interface.
- Program is free and released under GPL !
- And many more...

WebSwoon is written in Python and uses wxWidgets. Sorry about the package size, but several runtime libraries must be installed with the program (they are installed only in the program's installation folder).

The current version of WebSwoon relies on Internet Explorer and thus requires Windows and IE to be installed. It has been tested with IE6 but should run with older versions. If you want to use a proxy, disable JavaScript, or modify any other browser option, do it from the usual IE configuration panel. WebSwoon will use these settings automatically. Everything, except popup windows, which are blocked, will run in the WebSwoon browser as if you were using IE alone.


• • • • • • •

• Installation

Install the program package in the folder of your choice, then run "WebSwoon" from the Windows Start menu. A console version of the program is also available in the installation folder; run webswoon_console.exe --help for more information.

Advanced users : All configuration information is in the "webswoon.cfg" file and can be tweaked if you respect the file format and letter case (A is not a).

WebSwoon does not read or write anything in the Windows registry. Only some Windows system DLLs used indirectly by the program or by Python may do this, but it shouldn't happen. WebSwoon will survive a Windows reinstallation if you install it in a safe folder.


• • • • • • •

• Known limitations

Some limitations of WebSwoon are known; they are not "bugs" :

- Some web sites take longer to capture than others, even if the page seems to be complete. Using the IE ActiveX control sometimes makes it hard to know when the page is really fully loaded and displayed. In some rare cases, the program has to wait for a delay to expire when no data is received, to be sure of getting the full page. You can configure this delay as the "Timeout delay" in the configuration, but too small a value may result in captures with missing graphics on some web pages.
- Popup windows from the browser window are blocked. However, it seems that some popups are occasionally not blocked and can appear when the program switches to another web site.
- Sometimes the browser may show an alert dialog requesting a user choice (from the browser itself or from JavaScript alerts). These dialogs will block the captures until they are closed manually. Options are available in the configuration window to disable these alerts, but this may result in incomplete captures.



• • • • • • •

• Web sites list

The web sites list containing all URLs used to build captures is stored in the file "websites_list.txt". You can edit it by hand, generate it from a database using an appropriate program, or fill it directly from WebSwoon.

The format is simple : one URL per line, each line ending with \r\n.
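If you generate the list from another program, a minimal Python sketch such as the following may help. Only the "one URL per line, ending with \r\n" rule comes from this documentation; the helper names write_sites_list and read_sites_list and the UTF-8 encoding are assumptions made for the example and are not part of WebSwoon.

```python
def write_sites_list(path, urls):
    """Write one URL per line, each line terminated by CRLF."""
    # newline="" stops Python from translating the "\r\n" written below.
    # The UTF-8 encoding is an assumption; the documentation does not state one.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        for url in urls:
            handle.write(url.strip() + "\r\n")


def read_sites_list(path):
    """Return the non-empty lines of the list, without line endings."""
    with open(path, "r", encoding="utf-8", newline="") as handle:
        return [line.strip() for line in handle.read().splitlines() if line.strip()]
```

For example, write_sites_list("websites_list.txt", ["http://www.yahoo.com"]) produces a file containing a single CRLF-terminated line.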


• • • • • • •

• Program's usage

As web site captures are done via the Internet Explorer ActiveX control, the browser integrated in the program is compatible with everything that IE can handle on your system (JavaScript, Java, Flash...). It will also automatically use the cookies stored by IE to access pages requiring them.

You can choose to show or hide the browser window. If you want to perform some action on a web site before the capture, you can open the browser window and, for example, click on a link to skip the front page or close an ad. The capture will be delayed until you stop interacting with the page.

With the browser window opened, you can also ensure that all captures are done correctly. It can also help you in some cases to find and remove broken web sites in your list.


• • • • • • •

• Configuration window

Program panel :

- Delete all existing captures before to start : allows you to delete all captured images before the captures start. Warning ! Enabling this option will delete all files in the captures folder.

- Update existing captures older than x days x hours x mins : allows you to specify when the program must refresh an existing capture. If you want fresh updates, you can set 1 minute and run the program in loop-mode.

- Automatically restart captures when finished after xxx minutes : allows you to run the program in loop-mode. When all captures are finished, the program will wait for this delay and then restart all captures from the beginning. You can abort captures using the "stop capture" option in the menu or the "abort captures" option in the browser window if it is displayed.

- Interface language : allows you to select the language used for the program interface. Changing this option requires the program to restart.
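
To illustrate the "Update existing captures older than..." option above, here is a rough sketch of such an age check. It is not WebSwoon's actual code : the function name needs_refresh and the use of the file's modification time as the age of a capture are assumptions made for the example.

```python
import os
import time
from datetime import timedelta


def needs_refresh(capture_path, days=0, hours=0, minutes=0, now=None):
    """Return True if the capture is missing or older than the given age."""
    if not os.path.exists(capture_path):
        return True  # nothing has been captured yet
    max_age = timedelta(days=days, hours=hours, minutes=minutes).total_seconds()
    # "now" can be passed explicitly, which makes the check easy to test.
    now = time.time() if now is None else now
    # Assumption: the age of a capture is the age of its file on disk.
    age = now - os.path.getmtime(capture_path)
    return age > max_age
```

With a refresh age of 1 minute, a capture saved two minutes ago would be refreshed on the next loop, while one saved 30 seconds ago would be kept.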


Browser panel :

- Open browser window during captures : displays the browser window and allows the user to perform some actions during captures.

- Start in auto-capture mode : allows you to choose whether automatic capture is enabled by default in the browser.

- Browser width/height : specifies the size of the browser window. Remember that an 800x600 screen is still the standard for building web sites.

- Canva width/height : specifies the size of the browser canvas, used in full page mode only. The web page will be loaded in this large blank area and the limits of the content will be detected automatically.

[this option has been suspended until further notice] - Ignored URLS : specifies which addresses must not be loaded or displayed in the browser window; it is typically used to hide adverts. Multiple URLs must be separated with the ";" character.

- Wait delay after complete page : how long the program must wait, once the web site is fully loaded, before capturing and saving the image. If you want to wait for a Flash animation to start, you can set 2 or 3 seconds.

- Timeout delay : how long the program must wait for a web site to load. After this delay the capture will be taken even if the data on the page is not fully loaded.

- Display browser error window (Javascript warnings) : allows you to enable the warnings displayed by the browser when there are errors or questions.

- Disable Javascript in browser (avoid blocking Javascript alerts) : allows you to disable JavaScript in the browser so that alerts which block captures (a yes/no choice, for example) are ignored.
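
The interaction between "Wait delay after complete page" and "Timeout delay" above can be pictured with the small sketch below. It is only an interpretation of this documentation, not the program's real logic : the function name capture_due and its parameters are hypothetical, and it assumes that the timeout is counted from the moment the page starts to load and also applies while the wait delay is running.

```python
def capture_due(elapsed, loaded_at, wait_delay, timeout):
    """Decide whether the capture should be taken now.

    elapsed    -- seconds since the page started to load
    loaded_at  -- second at which the page was reported complete, or None
    wait_delay -- "Wait delay after complete page", in seconds
    timeout    -- "Timeout delay", in seconds
    """
    if loaded_at is not None and elapsed >= loaded_at + wait_delay:
        return True  # page is complete and the wait delay has passed
    # Otherwise capture only once the timeout has expired, even if the
    # page is still loading (assumed behaviour, see the lead-in).
    return elapsed >= timeout
```

For instance, with a 2 second wait delay and a 30 second timeout, a page reported complete after 4 seconds would be captured at 6 seconds, and a page that never completes would be captured at 30 seconds.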


Captures panel :

- Capture method : Standard view / Full page : allows you to choose the capture method. Standard view will save only what is displayed in the browser. Full page will capture the full height of the page, including the parts that need to be scrolled to be viewed. The content limits are discovered automatically. The Full page mode is slower as it requires two passes for each web page to find the limits of the content correctly.

- Resize capture image : allows you to resize the capture image to a specified size, often used to generate thumbnails. In full page mode only the width will be used; the height will be calculated automatically.
- Capture image width/height : specifies the size in pixels of the resized capture image.

- Remove window border and scrollbar in captures : allows you to crop the visible content of the browser window to hide the window border (some pixels around the content) and the vertical scrollbar. The program is however unable to detect whether the scrollbar is actually visible and will crop the image anyway.

- Keep margins around content of xxx pixels : allows you to keep a blank space around the page content, in full page capture mode only.

- Capture image format BMP/JPEG/GIF/PNG : specifies the image format of saved captures.

- Capture file name format : allows you to customize the file name of captures. You can use these options, which will be replaced with data in the final file names :
%p : protocol of URL (cleaned of incompatible symbols**)
%u : URL (cleaned of incompatible symbols**)
%e : extension of file (jpg, gif or png)
%y : year date in format YYYY
%m : month date in format MM
%d : day date in format DD
%i : number of the current web site (first web site is 1, second is 2,...)
%z : MD5 hash of the URL (32 hexadecimal characters)

** All problematic characters \/:?~&=<> and %2C are replaced with the underscore character "_". To avoid problems on some web servers, .pl and .php are replaced with _pl and _php in the URL.

For example, with the file name format "%p%u_%y-%m-%d.%e" (without quotes), if the URL is "http://www.yahoo.com" and the capture is saved as JPEG on 15 June 2004, the capture file name will be "http___www.yahoo.com_2004-06-15.jpg"
Note : the "://" characters after "http" have been replaced with 3 underscores "_" to be compatible with the file system.

- Save captures in folder : specifies where the captured pictures are saved.
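
As an illustration of the "Capture file name format" option above, the sketch below expands the % parameters in Python. It is a guess at the behaviour, not the program's source : the names clean and capture_file_name are hypothetical, %p is assumed to include the "://" separator and %u to be the rest of the URL (this is what the yahoo.com example suggests), %z is assumed to hash the unmodified URL, and .pl/.php are replaced by plain substring matching.

```python
import hashlib
import re
from datetime import date

# Characters listed in the ** note, plus the literal "%2C" sequence.
_UNSAFE = re.compile(r"%2C|[\\/:?~&=<>]")


def clean(text):
    """Replace unsafe characters with "_" and rewrite .pl/.php."""
    text = _UNSAFE.sub("_", text)
    # Naive substring replacement; how WebSwoon matches these is not documented.
    return text.replace(".php", "_php").replace(".pl", "_pl")


def capture_file_name(fmt, url, index, extension, today):
    """Expand the % parameters of a file name format for one URL."""
    protocol, separator, rest = url.partition("://")
    if not separator:  # URL given without a protocol
        protocol, rest = "", url
    values = {
        "p": clean(protocol + separator),
        "u": clean(rest),
        "e": extension,
        "y": f"{today.year:04d}",
        "m": f"{today.month:02d}",
        "d": f"{today.day:02d}",
        "i": str(index),
        "z": hashlib.md5(url.encode("utf-8")).hexdigest(),
    }
    # Replace each known %x parameter; anything else is left untouched.
    return re.sub(r"%([pueymdiz])", lambda match: values[match.group(1)], fmt)


name = capture_file_name("%p%u_%y-%m-%d.%e", "http://www.yahoo.com",
                         1, "jpg", date(2004, 6, 15))
print(name)  # http___www.yahoo.com_2004-06-15.jpg
```

Under these assumptions the sketch reproduces the file name given in the example above.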


You can use the Default button to restore the default settings for all options. Using the Cancel button will restore your previous configuration.
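
As a worked example of the "Resize capture image" option in full page mode (only the width is used and the height is calculated to keep proportions), the arithmetic can be sketched as follows. The function name resized_size and the rounding to the nearest pixel are assumptions; the program may round differently.

```python
def resized_size(original_width, original_height, target_width):
    """Return (width, height) scaled to target_width, keeping proportions."""
    scale = target_width / original_width
    # Round to the nearest pixel and never return a height of zero.
    return target_width, max(1, round(original_height * scale))
```

For instance, a full page capture of 800x2400 pixels resized to a width of 200 pixels gives a 200x600 thumbnail.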


• • • • • • •

• Version history and updates

WebSwoon 1.0 Beta 7 (July-2005)

- Fixed a bug that prevented saving the URL list in the editor.
- Added the %z parameter for file names; it is replaced with the MD5 hash of the URL.
- Added a menu option to capture only one web page and to save its capture in a specific folder.
- Added an option to capture the whole page and not only what is visible in the browser. Content limits will be discovered automatically. If resize is turned on, the height will be ignored and calculated from the specified width to keep proportions. The Full page mode is slower as it requires two passes for each web page to find the limits of the content correctly.
- Added an option to set the canvas size for full page mode. The web page will be loaded in this large blank area and the limits of the content will be detected automatically.
- Added an option to specify the margin size around content in full page mode.
- Added a switch option for image resizing. Turning it off will save image in full original size.
- Added BMP format for image output.
- Updated documentation with latest changes in configuration window.
- German language is now available (thanks to Olaf Noehring).


WebSwoon 1.0 Beta 6 (May-2005)

- Program interface can now be translated into any language using the gettext method.
- French and English languages are now available.
- New method to determine if a page is fully loaded, improving the speed of captures and fixing some problems with pages being captured before being fully loaded.
- Configuration window has been modified to reorder available options.
- Added description instead of numbers for some errors returned by browser.
- Added an option to set JPEG compression level.
- Added an option to set browser auto-capture mode checkbox on/off by default.
- Added an option to display browser errors (JavaScript warnings). Hiding them may prevent some web sites that require a user action from loading, for example the confirmation of an invalid HTTPS certificate.
- Added an option to disable JavaScript (ActiveScripting) in the browser to avoid some blocking JavaScript alerts during captures (yes/no choices, etc.). Using this option may prevent some web sites from being displayed correctly.
- A blank page is now loaded between sites so that the previous capture is not duplicated when a web site is broken.
- Fixed a bug with file names containing .pl/.php, which generated errors on the Apache web server.
- When the output format is PNG, pictures now use 256 colors instead of 24 bits, so file sizes are smaller.
- Improved GIF pictures quality.
- Documentation has been converted to HTML for better readability.


WebSwoon 1.0 Beta 5 (August-2004)

- Fixed a bug blocking the program when saving large JPEG pictures.


WebSwoon 1.0 Beta 4 (August-2004)

- Added hours and minutes in the refresh delay to allow shorter delays.
- Added an option to activate loop-mode. When all captures are finished, the program restarts them, and keeps doing so indefinitely. It must be used with a suitable capture refresh delay in the options.
- Time-out delay is now activated when a web site starts to load. The capture will be taken even if the page failed to load completely before this delay. It should prevent some web sites from blocking the captures.
- Added a console version of the program. The saved configuration will be used, but you can override saved parameters by specifying some options on the command line. Use the --help parameter for help. Two versions of the executable are required because the command line is ignored when a Python program is started with the console window hidden, like webswoon.exe for example.
- Switched the Auto-capture mode two-states button to a checkbox for more visibility under some Windows themes.
- Updated documentation with a description for all configuration options.
- Added HTTP error return code in error message (like 404, etc.)


WebSwoon 1.0 Beta 3 (June-2004)

- Fixed height of browser window, some pixels were missing at the bottom.
- Added "Auto-capture mode" button in browser window to switch between auto/manual capture mode. In manual mode, all configured delays are ignored but the button can be pushed again to reactivate them.
- Added "Capture now" button to immediately capture the current web site and jump to the next one.
- Added "Skip" button to ignore current web site and jump to next one.
- Added "Abort captures" button to cancel all captures, equivalent to closing the browser window.
- Captures can now be resumed after being aborted, without restarting the program.
- URL of current web site is now displayed in browser window.
- Added status bar in browser window displaying IE messages and a progress bar.
- Added %i parameter in file name format, it will be replaced with the number of the current web site (first web site is 1, second is 2,...)


WebSwoon 1.0 Beta 2 (June-2004)

- Added a manager with list of URLs to browse and modify web sites addresses directly from program.
- Added a menu option to import web sites list from a text file.
- Added ignored URLs field to prevent the browser from loading some ads and to prevent the SideBar feature on some web sites from replacing the main page content.
- Added an option to delete existing captures on startup.
- Added an option to remove border and scrollbar from captures.
- Added an expire time to refresh old existing captures.
- Added file name format field to customize the file name of captures.
- Added PNG and GIF image formats for captures.
- Folder in which to store captures can now be selected.
- Added a configurable timeout delay when it's impossible to determine if a page is complete or not.
- Reviewed placement of options in configuration window.
- Greatly improved JPEG quality in captures.
- Fixed captures width, some pixels were missing on the right side.
- Fixed a memory bug blocking program after several hundreds of captures.
- Improved wait time when browsing a web site before the capture is taken.
- Added counter in statusbar to give some information about progress.
- Fixed timers; the delay selected in the configuration was extended by one second.


WebSwoon 1.0 Beta 1 (April-2004)

- First public beta release, let's try it !


• • • • • • •

• Program translation information

If you are interested in translating WebSwoon into another language, you can use the nice POEdit program to build your own catalog. You can find the reference English catalog file in this folder :
[webswoon installation folder]\locale\en\LC_MESSAGES\webswoon\webswoon.po

You can test your own webswoon.mo file by replacing the existing webswoon.mo file in a language folder and selecting that language in the program options. Once it works, you can send your catalog by email so that it can be added to the next release. Remember that the original English texts must remain alongside your translations in the catalog, or it will not work.

To update an existing catalog, open it and update it using the English reference webswoon.pot file.


End of documentation.