The drawback of following only relative links is that people often mix them with absolute links to the very same host, and even to the very same page. In this mode (which is the default mode for following links) all URLs that refer to the same host will be retrieved.
The problem with this option is the aliasing of hosts and domains. From the names alone, Wget has no way to know that `regoc.srce.hr' and `www.srce.hr' are the same host, or that `fly.cc.fer.hr' is the same as `fly.cc.etf.hr'. Therefore, whenever an absolute link is encountered, the host is looked up in DNS with `gethostbyname' to check whether it is in fact the same host. Although the results of `gethostbyname' are hashed, so that repeated lookups of the same name are served from the table, this is still a great slowdown, e.g. when dealing with large indices of homepages on different hosts (because each of the hosts must be looked up and DNS-resolved to see whether it might be the starting host).
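The check described above can be sketched roughly as follows. This is only an illustration in Python, not Wget's own code (Wget is written in C): the names `resolve' and `same_host' are hypothetical, and treating two names as the same host when their address lists overlap is an assumption made for the example.

```python
import socket
from functools import lru_cache


@lru_cache(maxsize=None)
def resolve(host):
    """Look a host up in DNS once and remember the result (hypothetical helper).

    Returns a frozenset of address strings, or an empty frozenset
    if the name cannot be resolved.
    """
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(host)
    except OSError:
        return frozenset()
    return frozenset(addresses)


def same_host(a, b, resolver=resolve):
    """Guess whether two host names refer to the same machine.

    Assumption for this sketch: the names are aliases of one host
    if they are equal, or if they share at least one resolved address.
    """
    a, b = a.lower(), b.lower()
    if a == b:
        return True  # no lookup needed for identical names
    return bool(resolver(a) & resolver(b))
```

Even with the cache, every previously unseen host name costs one DNS lookup, which is the slowdown the paragraph above describes. The `resolver' parameter exists only so the comparison can be tried without network access.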
To avoid the overhead you may use `-nh', which turns off DNS resolving and makes Wget compare hosts literally. This makes things run much faster, but also much less reliably, since aliases of the same host are then treated as different hosts.
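For contrast, a literal comparison of the kind `-nh' selects might look like the minimal sketch below. Again this is Python for illustration only, `same_host_literal' is a hypothetical name, and the case-insensitive match (a side effect of `urlsplit' lowercasing the host) is an assumption rather than a statement about Wget's behavior.

```python
from urllib.parse import urlsplit


def same_host_literal(url_a, url_b):
    """Compare the host parts of two URLs as plain strings, with no DNS lookup."""
    host_a = urlsplit(url_a).hostname  # lowercased by urlsplit; None if absent
    host_b = urlsplit(url_b).hostname
    return host_a is not None and host_a == host_b
```

With this comparison, `http://regoc.srce.hr/' and `http://www.srce.hr/' count as different hosts even if they are aliases of one machine, which is the loss of reliability mentioned above.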