HTTrack documentation

Once you have defined your start links, the default mode mirrors only these links and what lies beneath them. For example, if one of your start pages is www.myweb.com/test/index.html, all links starting with www.myweb.com/test/ will be accepted. Links directly in www.myweb.com/.. will not be accepted, however, because they are in a higher structure level. This prevents HTTrack from mirroring the whole site. (All files in structure levels equal to or lower than the primary links will be retrieved.)
You can refuse some files with filters, as we will see below.
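
The default rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not HTTrack's actual code: the function name `in_default_scope` is hypothetical, an `http://` scheme is added to the URLs so that they can be parsed, and the sketch assumes that "equal or lower structure level" means "same host, same directory as the start page or any directory beneath it".

```python
import posixpath
from urllib.parse import urlsplit


def in_default_scope(start_url: str, link: str) -> bool:
    """Return True if `link` is in the start page's directory or below it."""
    start = urlsplit(start_url)
    target = urlsplit(link)
    # A link on another host is never in the default scope of this sketch.
    if start.netloc.lower() != target.netloc.lower():
        return False
    # Directory that contains the start page, always ending with "/".
    base_dir = posixpath.dirname(start.path).rstrip("/") + "/"
    return (target.path or "/").startswith(base_dir)


start = "http://www.myweb.com/test/index.html"
print(in_default_scope(start, "http://www.myweb.com/test/sub/page.html"))
print(in_default_scope(start, "http://www.myweb.com/other.html"))
```

With the start page above, the first link is inside www.myweb.com/test/ and is kept, while the second one sits one level higher and is left out.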

Filters are analyzed by HTTrack in order, from the first filter to the last one. The complete URL name is compared to the filters defined by the user or added automatically by HTTrack.
For every filter, if the link is recognized and a '+' was typed before the filter, the link is considered "accepted". If a '-' was typed before the filter and the link is recognized, the link is considered "forbidden". Every new status overrides the previous one, so the order of the filters is important. If no status could be defined, HTTrack decides by itself what to do by analyzing the link (upper/lower structure, and so on).
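
The "last matching filter wins" rule can be illustrated with a small Python sketch. It is a simplified model and not HTTrack's implementation: the names `star_to_regex` and `filter_status` are hypothetical, only the plain `*` joker is handled, and the way HTTrack treats links with parameters is not modeled.

```python
import re
from typing import Optional, Sequence


def star_to_regex(pattern: str) -> re.Pattern:
    """Translate a filter that only uses '*' into a compiled regex."""
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(parts), re.DOTALL)


def filter_status(url: str, filters: Sequence[str]) -> Optional[bool]:
    """Return True (accepted), False (forbidden) or None (no filter matched)."""
    status = None
    for rule in filters:
        sign, pattern = rule[:1], rule[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"filter must start with '+' or '-': {rule!r}")
        # The complete URL is compared to the filter.
        if star_to_regex(pattern).fullmatch(url):
            status = (sign == "+")  # a later match overrides an earlier one
    return status


rules = ["-www.myweb.com/*", "+www.myweb.com/*.html"]
print(filter_status("www.myweb.com/a/index.html", rules))   # True
print(filter_status("www.myweb.com/a/pic.gif", rules))      # False
print(filter_status("www.other.com/page", rules))           # None
```

Swapping the two rules changes the result for the first link to False, because the '-' filter is then the last one that matches. A `None` result stands for the case where HTTrack has to decide by itself.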

Here are some examples of filters (these can be generated automatically using the interface):

`www.thisweb.com*`	This will refuse/accept this web site (all links located in it will be refused/accepted)
`*.com/*`	This will refuse/accept all links that contain .com in them
`*cgi-bin*`	This will refuse/accept all links that contain cgi-bin in them
`www.*.com/*[path].zip`	This will refuse/accept all zip files in .com addresses
`*myweb*/*.tar*`	This will refuse/accept all tar (or tar.gz etc.) files in hosts containing myweb
`*/*mypage*`	This will refuse/accept all links containing mypage (but not in the host address)
`*.html`	This will refuse/accept all html files. Warning! With this filter you will accept ALL html files, even those at other addresses (causing a global (!) web mirror). Use `www.myweb.com/*.html` to accept all html files from a single web site.
`*.html*[]`	Identical to `*.html`, but the link must not have any supplemental characters at the end (links with parameters, like `www.myweb.com/index.html?page=10`, will be refused)
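
To see why a bare `*.html` filter is so broad, here is a short Python sketch that compares it with `www.myweb.com/*.html`. Python's `fnmatch` module is used only as a stand-in for HTTrack's matcher (for patterns that contain nothing but `*` it behaves in a comparable way); the helper `matching_links` and the sample links are made up for the illustration.

```python
from fnmatch import fnmatchcase


def matching_links(pattern, links):
    """Return the links that the pattern matches completely."""
    return [link for link in links if fnmatchcase(link, pattern)]


links = [
    "www.myweb.com/index.html",
    "www.myweb.com/docs/page.html",
    "www.other.com/news/today.html",
]

# Matches all three links, including the one on www.other.com.
print(matching_links("*.html", links))

# Matches only the two links located on www.myweb.com.
print(matching_links("www.myweb.com/*.html", links))
```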

Special jokers (wildcards) of the form `*[..]` can be used to match specific characters, as you have seen:

`*`	any characters (the most commonly used)
`*[file]` or `*[name]`	any filename or name, i.e. any characters except /, ? and ;
`*[path]`	any path (and filename), i.e. any characters except ? and ;
`*[a,z,e,r,t,y]`	any letters among a,z,e,r,t,y
`*[a-z]`	any letters
`*[0-9,a,z,e,r,t,y]`	any characters among 0..9 and a,z,e,r,t,y
`*[]`	no characters may be present after this point
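
One way to read the jokers listed above is as a small pattern language that can be translated into regular expressions. The Python sketch below does this under a few assumptions: every joker is taken to match zero or more characters, `*[]` is treated as "end of the link", and the function name `joker_to_regex` as well as the sample patterns are hypothetical. It does not claim to reproduce HTTrack's matcher exactly.

```python
import re

# Named jokers and the characters they may NOT contain.
NAMED = {
    "file": r"[^/?;]*",
    "name": r"[^/?;]*",
    "path": r"[^?;]*",
}
# Either "*[...]" (group 1 holds the bracket content) or a bare "*".
TOKEN = re.compile(r"\*\[([^\]]*)\]|\*")


def joker_to_regex(pattern: str) -> re.Pattern:
    """Translate a filter pattern with jokers into a compiled regex."""
    out = []
    pos = 0
    for match in TOKEN.finditer(pattern):
        out.append(re.escape(pattern[pos:match.start()]))  # literal text
        pos = match.end()
        body = match.group(1)
        if body is None:            # bare "*": any characters
            out.append(".*")
        elif body == "":            # "*[]": nothing may follow
            out.append(r"\Z")
        elif body in NAMED:         # "*[file]", "*[name]", "*[path]"
            out.append(NAMED[body])
        else:                       # "*[a,z]", "*[a-z]", "*[0-9,a,z]"
            items = [item for item in body.split(",") if item]
            chars = "".join(
                re.escape(item[0]) + "-" + re.escape(item[2])
                if len(item) == 3 and item[1] == "-"
                else re.escape(item)
                for item in items
            )
            out.append("[" + chars + "]*")
    out.append(re.escape(pattern[pos:]))
    return re.compile("".join(out), re.DOTALL)


def matches(pattern: str, url: str) -> bool:
    """Compare the complete URL to the pattern."""
    return joker_to_regex(pattern).fullmatch(url) is not None


print(matches("www.myweb.com/*[name].html", "www.myweb.com/index.html"))
print(matches("www.myweb.com/*[name].html", "www.myweb.com/a/index.html"))
print(matches("*/img*[0-9].gif", "www.myweb.com/img042.gif"))
```

In this sketch the first and third calls match, while the second does not, because `*[name]` is not allowed to cross a `/`.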

I- Quick start (Windows release)

IIb- FAQ (WinHTTrack and HTTrack)

II- How to use HTTrack (the command-line version)

IIb- Example: Use of HTTrack (the command-line version)

© 1998 Xavier Roche & Yann Philippot
Comments, questions, problems and bug reports are welcome, for the shell and for the robot.
