(Short) Documentation
Just follow the steps:
Launch WinHTTrack and choose an option: Mirror sites, Mirror with wizard (i.e. semi-automatic mode), or Get separated files.
Enter URLs (i.e. Internet addresses, such as
www.test.fr/~bob/) in the URL list.
Optionally, click the Filters.. button to
define filters for links.
Optionally, specify a limited link depth (if you do not, the entire site will be mirrored; e.g. www.test.abs/~mike/ will mirror all of Mike's site). You can also specify a proxy (ask your administrator). Do not forget the paths for mirror files (the files retrieved) and log files (files recording errors or actions performed).
Click the NEXT-> button. You can then start the mirror by clicking START, or first define further options.
Tip: You can enter more than one URL, by pressing Control-Enter after each line.
This will mirror several sites together.
Options: Many options can be defined (maximum file size, site size,
build options, timeout, etc.)
Proxy: Set the proxy field if you want to use a proxy (ask
your Internet provider if you do not know the proxy name or the proxy port)
Filters: By clicking this button, you will be able to define
filters. You can use the "Exclude links" and "Accept links" buttons
under Windows.
Note: Filters are analyzed in the order in which you define them. E.g.
if you accept all files from a domain and then forbid all gif files, gif files from
that domain will be forbidden. If, after these two filters, you define a third
filter accepting all files named 'mydraw.gif', gif files from the domain will be
forbidden except 'mydraw.gif' files. Remember that the order in which you define filters is
important. Besides, the filters you define override several options, such as the travel options.
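The ordering rule in the note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the last matching filter is taken to decide, Python's `fnmatch` wildcards stand in for HTTrack's own (richer) pattern syntax, and the helper name `is_accepted` and the `www.example.com` addresses are hypothetical.

```python
from fnmatch import fnmatchcase


def is_accepted(link, filters, default=False):
    """Return True if `link` is accepted by an ordered list of +/- filters.

    Assumption: filters are scanned in the order given and the last one
    that matches decides. fnmatch wildcards are a stand-in for HTTrack's
    own pattern syntax.
    """
    verdict = default
    for rule in filters:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(link, pattern):
            verdict = (sign == "+")
    return verdict


filters = [
    "+www.example.com/*",  # accept everything on the domain
    "-*.gif",              # then forbid all gif files
    "+*/mydraw.gif",       # then re-accept files named mydraw.gif
]

print(is_accepted("www.example.com/index.html", filters))      # True
print(is_accepted("www.example.com/img/logo.gif", filters))    # False
print(is_accepted("www.example.com/img/mydraw.gif", filters))  # True
```

Swapping the second and third filters would forbid 'mydraw.gif' as well, which is why the order matters.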
More details about filters are given below for those who want precise
control over what filters can do (if not, skip this section):
Note that once you have defined
start links, the default mode will mirror these links - i.e. if one of your start pages is
www.myweb.com/test/index.html, all links starting with www.myweb.com/test/ will be
accepted. Links directly in www.myweb.com/.. will not be accepted, however, because
they are in a higher structure. This prevents HTTrack from mirroring the whole site. (All
files in structure levels equal to or lower than the primary links will be retrieved.)
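As a rough illustration of this default scope, the check below accepts a link only when it lies in the directory of the start page or below it. It is a simplified sketch, not HTTrack's actual logic; the function names are hypothetical and the rule is reduced to a plain prefix comparison.

```python
def _strip_scheme(url):
    """Drop a leading 'http://' or similar, if present."""
    return url.split("://", 1)[-1]


def in_default_scope(start_url, link):
    """Assumed rule: accept links in the start page's directory or below."""
    start, link = _strip_scheme(start_url), _strip_scheme(link)
    if "/" not in start:
        start += "/"  # a bare host name means the site root
    # Directory of the start page: everything up to the last '/'.
    base = start[: start.rfind("/") + 1]
    return link.startswith(base)


start = "www.myweb.com/test/index.html"
print(in_default_scope(start, "www.myweb.com/test/docs/page.html"))  # True
print(in_default_scope(start, "www.myweb.com/other.html"))           # False
```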
Tip: To use WinHTTrack as a spider (for checking links), just set the scan mode to
"Just scan", check the boxes "Log files" and "Test all links"
and uncheck the "Cache" box.
Combine these options to obtain different results.
Tip: If problems occur during a transfer, have a look at
the hts-err.txt (and hts-log.txt) files to see what happened. These log files report all
events that may be useful for detecting a problem.
Troubleshooting:
When I use HTTrack, nothing is mirrored (no files). What's happening?
Some pages can't be seen, or are displayed with errors!
HTTrack stays idle for a long time without transferring. What's
happening?
I am behind a firewall. What can I do?
HTTrack has crashed during a mirror, what's happening?
Questions concerning a mirror:
I want to mirror a Web site, but there are some files outside the
domain, too. How can I retrieve them?
I have forgotten some URLs of files during a long mirror.. Should I redo
everything?
I just want to retrieve all ZIP files or other files in a web site/in a
page. How do I do it?
There are ZIP files in a page, but I don't want to transfer them. How do I do that?
I don't want to load gif files.. but what may happen if I view the page?
I get all types of files on a web site, but I didn't select them on
filters!
When I use filters, I get too many files!
When I use filters, I can't access another domain, but I have filtered it!
Must I add a '+' or '-' in the filter list when I want to use
filters?
I want to find file(s) in a web site. How do I do that?
Troubleshooting:
Q: When I use HTTrack, nothing is mirrored (no files). What's
happening?
A: First, be sure that the URL typed is correct. Then, check whether you need to use a
proxy server (see proxy options in WinHTTrack or the -P proxy:port option in the
command line program). The site you want to mirror may only accept certain browsers. You
can change your "browser identity" with the Browser ID option in the OPTION box.
Finally, you can have a look at the hts-err.txt (and hts-log.txt) files to see what
happened.
Q: Some pages can't be seen, or are displayed with errors!
A: Some pages may include javascript or java files that are not recognized, for
example those using generated filenames. There may be transfer problems, too (broken pipe, etc.). But
most mirrors do work. We are still working to improve the mirror quality of HTTrack.
Q: HTTrack stays idle for a long time without
transferring. What's happening?
A: You may be trying to reach some very slow sites. Try a lower TimeOut value (see
options, or -Txx option in the command line program). Note that you will abandon
the entire site (except if the option is unchecked) if a timeout happens. You can, with the
Shell version, skip some slow files, too.
Q: I am behind a firewall. What can I do?
A: You need to use a proxy, too. Ask your administrator for the proxy server's
name and port. Then, use the proxy field in HTTrack or use the -P proxy:port option
in the command line program.
Q: HTTrack has crashed during a mirror, what's happening?
A: We are trying to avoid bugs and problems so that the program can be as reliable as
possible. But we cannot be infallible. If you encounter a bug, please check that you have the
latest release of HTTrack, and send us an email with a detailed description of your
problem (OS type, addresses concerned, crash description, and everything you deem
necessary). This may help other users too.
Retrieve options:
Q: I want to mirror a Web site, but there are some files outside
the domain, too. How can I retrieve them?
A: If you just want to retrieve files that can be reached through links, just activate
the 'get file near links' option. But if you want to retrieve html pages too, you can
use either wildcards or explicit addresses; e.g. add www.myweb.com/* to accept all
files and pages from www.myweb.com.
Q: I have forgotten some URLs of files during a long
mirror.. Should I redo everything?
A: No, if you have kept the 'cache' files (in hts-cache), cached files will not be
transferred again.
Q: I just want to retrieve all ZIP files or other files in a web
site/in a page. How do I do it?
A: There are several methods. You can use the 'get files near a link' option if
files are in an outside domain. You can also use a filter address: adding -* +*.zip
in the URL list (or in the filter list) will accept all ZIP files, even if these files are
outside the address. Example: www.myweb.com/myaddress.html -* +*.zip will allow
you to retrieve all zip files on the site.
Q: There are ZIP files in a page, but I don't want to transfer
them. How do I do that?
A: Just filter them: add -*.zip in the filter list.
Q: I don't want to load gif files.. but what may happen if I
view the page?
A: If you have filtered gif files (-*.gif), links to gif files will be
rebuilt so that your browser can find them on the server.
Q: I get all types of files on a web site, but I didn't select
them in the filters!
A: By default, HTTrack retrieves all types of files on authorized links. To avoid
that, define filters like -* +<website>/*.html
+<website>/*.htm +<website>/ +*.<type wanted>
Q: When I use filters, I get too many files!
A: Your filters are too broad; for example, *.html will get ALL html
files identified. If you want to get all html files on a single address, use www.<address>/*.html.
There are lots of possibilities using filters.
Q: When I use filters, I can't access another domain, but I
have filtered it!
A: You may have made a mistake when declaring filters; for example, +www.myweb.com/*
-*myweb* will not work, because -*myweb* has a higher priority (because it has
been declared after +www.myweb.com/*).
Q: Must I add a '+' or '-' in the filter list when I want
to use filters?
A: YES. '+' is for accepting links and '-' is for rejecting them. If you forget it, HTTrack
will assume that you want to accept a filter if there is a wildcard in the syntax - e.g.
+<filter> is identical to <filter> if <filter> contains a wildcard (*)
(otherwise it will be considered a normal link to mirror).
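The rule for entries written without a sign can be sketched as follows. This is a hypothetical helper (`classify_entry` is not part of HTTrack) that only mirrors the behaviour described in the answer above: an explicit '+' or '-' wins, an unsigned entry containing '*' is treated as an accept filter, and anything else is taken as an address to mirror.

```python
def classify_entry(entry):
    """Return (kind, value) for one entry of the URL/filter list.

    kind is "accept", "reject" or "url". Illustration only; it follows
    the rule described in the text, not HTTrack's real parser.
    """
    if entry[:1] == "+":
        return ("accept", entry[1:])
    if entry[:1] == "-":
        return ("reject", entry[1:])
    if "*" in entry:
        # No sign but a wildcard: treated like +<filter>.
        return ("accept", entry)
    return ("url", entry)


print(classify_entry("+*.zip"))                    # ('accept', '*.zip')
print(classify_entry("*.zip"))                     # ('accept', '*.zip')
print(classify_entry("-*.gif"))                    # ('reject', '*.gif')
print(classify_entry("www.myweb.com/index.html"))  # ('url', 'www.myweb.com/index.html')
```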
Q: I want to find file(s) in a web site. How do I do that?
A: You can use the filters: forbid all files (add a -* in the
filter list) and accept only html files and the file(s) you want to retrieve. (BUT do not
forget to add +<website>*.html in the filter list, or pages will not be
scanned!) Add the names of the files you want with a */ before them; i.e. if you want to
retrieve file.zip, add */file.zip.
The command-line program is available for many systems (PC, Linux PC, Sun Solaris, AIX) and allows you to control the robot through a command line. This can be useful for automatically mirroring a web site.
You are a webmaster, and you would like to make a mirror of a web site:
Every week (or every day), you can launch (e.g. from crontab):
httrack --update www.myweb.abc -O /public_html/,/home/root/
This will maintain an up-to-date copy of the web site on your host.
You are a simple user, and you would like to make a mirror of a web site for your
own use:
Just type:
httrack www.myweb.abc
When you want to update it, just launch: httrack --update and httrack will
automatically update it.
You want to check links in a site/web page:
Just type:
httrack www.myweb.abc --spider
And look at the file hts-err.txt: all errors will be reported there.
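For scripted use, the three command lines above can be assembled from a small helper. The sketch below only builds the argument list and prints it; it does not run HTTrack. The function name `httrack_command` and its parameters are hypothetical, and only the options already shown above (--update, --spider, -O) are used.

```python
def httrack_command(url=None, update=False, spider=False, output=None):
    """Build an httrack argument list from the options shown in the text."""
    args = ["httrack"]
    if update:
        args.append("--update")
    if url:
        args.append(url)
    if spider:
        args.append("--spider")
    if output:
        # Mirror path and log path, separated by a comma as in the example.
        args += ["-O", output]
    return args


weekly = httrack_command("www.myweb.abc", update=True,
                         output="/public_html/,/home/root/")
print(" ".join(weekly))
# httrack --update www.myweb.abc -O /public_html/,/home/root/

print(" ".join(httrack_command("www.myweb.abc", spider=True)))
# httrack www.myweb.abc --spider
```

On a machine where httrack is installed, such a list could be passed to `subprocess.run(args, check=True)`.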