Find dead links on your website

This post describes how to check for dead links on your website using the Linux command wget.

WORK IN PROGRESS
I created my website using docpad, and I have it running locally as described in this post. I want to check whether I have any dead links on the site. The wget command in Linux helps me do this.
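For reference, a minimal way to serve the site locally is docpad's built-in server (this assumes docpad is installed; it listens on port 9778 by default, which matches the URL used in the wget command below):

docpad run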

  1. The --spider option makes wget only check that pages exist instead of downloading them; the --recursive option enables recursive crawling of the links it finds.
  2. The --base option specifies a base URL for links on my website that are just file names, such as <a href="index.html">Home</a>, instead of fully qualified URLs.
wget --spider --recursive --level=1 --force-html --input-file="out/index.html" --base="http://localhost:9778/" -Dlocalhost --delete-after --no-cache

TODO: Right now this command gives me a lot of output. I do find the 404 messages and the broken links, but I need to scroll through the output. One solution is to provide the -o option to route the output to a file and then run a grep command to search for 404 errors.
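A sketch of that approach (assuming a log file named wget.log; the -B 2 context size is a guess at how many lines above the status line the requested URL appears in the log):

wget --spider --recursive --level=1 --force-html --input-file="out/index.html" --base="http://localhost:9778/" -Dlocalhost --delete-after --no-cache -o wget.log
grep -B 2 "404 Not Found" wget.log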

A related task is to find all references to a file in other files. The following gets only the names of the files that contain a link to 'index.html':

grep -H "index.html" out/* | cut -d: -f1
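As a simpler alternative, grep's -l option prints only the names of matching files, which makes the cut step unnecessary and lists each file once even if it links to 'index.html' several times:

grep -l "index.html" out/*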