On May 21, 2005, at 6:45 AM, Gerhard Killesreiter wrote:

On Drupal.org there is a small hint about using wget for this.


This is from my upgrade tutorial.  http://dev.bryght.com/t/wiki/CivicSpaceUpgrade
I added info about crawling with three types of account, and about setting up images if that module is enabled. I also advise checking the web server logs, not only the watchdog logs.

Cheers,
Kieran

2.5 Run a link check on the site.

We now want to crawl your site to get a list of dead links. This is a simple command that will help you identify broken parts of your site. You will see errors in your administrator logs as well as your web server error logs. Perform the next steps through the administrator menu on your site.

Enable the image admin module. Set permissions so that images can be written by the userid running the link check. Enable the menu module, and then disable the logout menu item so that the crawler does not log itself out by following the logout link.

Run wget with the following options.

The cookie in the command below is the session cookie (PHPSESSID) that your browser received when you last logged into your CivicSpace site as an admin.

NOTE: You can see the full and original wget instructions here: http://drupal.org/node/11521

wget -r --delete-after --cookies=off --header='Cookie: PHPSESSID=xxx' http://yoursite.com

-r recursively crawls the site. --delete-after deletes each page after it has been downloaded, so nothing is kept on disk. --cookies=off turns off wget's own cookie handling, so the only cookie sent is the one given in the header. --header='Cookie: PHPSESSID=xxx' passes your session information in the HTTP header; replace xxx with the session ID from your browser's cookie.

Repeat the crawl two more times: once for a regular userid, and once as an anonymous user. After the link check is done, check your admin logs and your web server error logs.
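
The three crawls could be scripted roughly as follows. This is only a sketch, not part of the original drupal.org instructions. SITE, ADMIN_SESSION and USER_SESSION are placeholders to fill in from your own browser cookies, crawl_cmd is a hypothetical helper name, and leaving out the Cookie header for the anonymous crawl is my assumption. The script only prints the commands so you can review them first. Note that newer wget releases document --cookies=off as --no-cookies.

```shell
#!/bin/sh
# Sketch: build the three crawl commands (admin, regular user, anonymous).
# All three values below are placeholders, not real values.
SITE="http://yoursite.com"
ADMIN_SESSION="xxx"   # PHPSESSID from a browser logged in as admin
USER_SESSION="yyy"    # PHPSESSID from a browser logged in as a regular user

# Print the wget command for one crawl.
# An empty session ID means an anonymous crawl (assumption: no Cookie header).
crawl_cmd() {
    session="$1"
    if [ -n "$session" ]; then
        printf "wget -r --delete-after --cookies=off --header='Cookie: PHPSESSID=%s' %s\n" "$session" "$SITE"
    else
        printf "wget -r --delete-after --cookies=off %s\n" "$SITE"
    fi
}

# Dry run: print one command per account type.
for session in "$ADMIN_SESSION" "$USER_SESSION" ""; do
    crawl_cmd "$session"
done
```

Run each printed command in turn, and look at the admin logs and the web server error logs after each crawl, so you can tell which account type triggered which errors.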