Site Checker is a simple Ruby gem that helps you check the integrity of your website by recursively visiting the referenced pages and images. I use it in my test environments to make sure that my websites don't have any dead links.
    gem install site_checker
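If you manage your gems with Bundler, it can also go into your Gemfile (a sketch; the plain gem install above is all the setup the gem itself requires):

    # Gemfile - e.g. in the test group, since the checks run in the test environment
    group :test do
      gem 'site_checker'
    end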
In Test Code
First, you have to load site_checker by adding this line to the file where you would like to use it:

    require 'site_checker'

If you want to use it for testing, the require should go into your spec_helper.rb (or the equivalent helper file of your test framework).
The usage is quite simple:
    check_site("http://localhost:3000/app", "http://localhost:3000")
    puts collected_remote_pages.inspect
    puts collected_local_pages.inspect
    puts collected_remote_images.inspect
    puts collected_local_images.inspect
    puts collected_problems.inspect
The snippet above will open the http://localhost:3000/app link and look for links and images. If it finds a link to a local page, it will recursively check that page, too. The second argument - http://localhost:3000 - defines the root reference of your website, which is the basis for deciding whether a page or image counts as local or remote.
In case you don't want to use a DSL-like API, you can still do the following:
.("http://localhost:3000/app", "http://localhost:3000") puts ..inspect puts ..inspect puts ..inspect puts ..inspect puts ..inspect
Using on Generated Content
If you have a static website (e.g. generated by Octopress), you can tell site_checker to use folders from the file system. With this approach, you don't need a web server to verify your website:
check_site("./public", "./public") puts collected_problems.inspect
You can instruct site_checker to ignore certain links:
    SiteChecker.configure do |config|
      config.ignore_list = ["/", "/atom.xml"]
    end
By default it won't check the status of remote links and images (e.g. whether they return 404 or 500), but you can change that like this:
    SiteChecker.configure do |config|
      config.visit_references = true
    end
Deep recursion may be expensive, so you can configure the maximum depth of the recursion with the following attribute:
    SiteChecker.configure do |config|
      config.max_recursion_depth = 3
    end
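These options can also be combined in a single configure block; for example (the values are just the ones from the snippets above):

    SiteChecker.configure do |config|
      config.ignore_list         = ["/", "/atom.xml"]  # skip these paths
      config.visit_references    = true                # also check remote links and images
      config.max_recursion_depth = 3                   # don't follow links deeper than this
    end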
Make sure that there are no dead local links on the website (I'm using RSpec syntax):
    before(:each) do
      SiteChecker.configure do |config|
        config.ignore_list = ["/atom.xml", "/rss"]
      end
    end

    it "should not have dead local links" do
      check_site("http://localhost:3000", "http://localhost:3000")
      # on failure this prints out the difference, so I don't have to re-run with a print
      collected_problems.should be_empty
    end
Check that all the local pages can be reached in at most two steps:
    before(:each) do
      SiteChecker.configure do |config|
        config.ignore_list = ["/atom.xml", "/rss"]
        config.max_recursion_depth = 2
      end
      @number_of_local_pages = 100
    end

    it "all the local pages have to be visited" do
      check_site("http://localhost:3000", "http://localhost:3000")
      collected_local_pages.size.should eq @number_of_local_pages
    end
From the Command Line

From version 0.3.0 site_checker can be used from the command line as well. Here is the list of the available options:
    ~ % site_checker -h
    Visits the <site_url> and prints out the list of those URLs which cannot be found
    Usage: site_checker [options] <site_url>
        -e, --visit-external-references  Visit external references (may take a bit longer)
        -m, --max-recursion-depth N      Set the depth of the recursion
        -r, --root URL                   The root URL of the path
        -i, --ignore URL                 Ignore the provided URL (can be applied several times)
        -p, --print-local-pages          Prints the list of the URLs of the collected local pages
        -x, --print-remote-pages         Prints the list of the URLs of the collected remote pages
        -y, --print-local-images         Prints the list of the URLs of the collected local images
        -z, --print-remote-images        Prints the list of the URLs of the collected remote images
        -h, --help                       Show a short description and this message
        -v, --version                    Show version
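For example, checking the site from the earlier examples while ignoring the feed URLs could look like this (the URL and the ignore list are just the values used above):

    ~ % site_checker -r http://localhost:3000 -i /atom.xml -i /rss http://localhost:3000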
Problems

undefined method 'new' for SiteChecker:Module

This error occurs when the test code still calls the v0.1.1 API, but a newer version of the gem is installed. Update your test code following the examples above.
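A sketch of what this looks like in practice (the v0.1.1-style call is an assumption based on the error message, which shows new being called on the SiteChecker module):

    # old style - raises "undefined method 'new' for SiteChecker:Module"
    # on newer gem versions, because SiteChecker is a module, not a class:
    # checker = SiteChecker.new

    # current style, as in the examples above:
    check_site("http://localhost:3000", "http://localhost:3000")
    puts collected_problems.inspect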
Copyright (c) 2012 Zsolt Fabok and Contributors. See LICENSE for details.