Site Checker

Site Checker is a simple Ruby gem that helps you check the integrity of your website by recursively visiting the pages and images it references. I use it in my test environments to make sure that my websites don’t have any dead links.

Install

gem install site_checker

Usage

In Test Code

First, load site_checker by adding this line to the file where you want to use it:

require 'site_checker'

If you want to use it for testing, put this line in your test_helper.rb.

The usage is quite simple:

check_site("http://localhost:3000/app", "http://localhost:3000")
puts collected_remote_pages.inspect
puts collected_local_pages.inspect
puts collected_remote_images.inspect
puts collected_local_images.inspect
puts collected_problems.inspect

The snippet above opens http://localhost:3000/app and looks for links and images. If it finds a link to a local page, it recursively checks that page, too. The second argument (http://localhost:3000) defines the root URL of your website; references under it are treated as local pages and images.
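
To make the crawl described above more concrete, here is a minimal pure-Ruby sketch. It is not site_checker's implementation: the in-memory `SITE` hash stands in for fetched HTML, and `classify` and `crawl` are hypothetical helpers. It also assumes, as a simplification, that a reference is local when it is relative or starts with the root URL.

```ruby
require "set"

# Hypothetical in-memory site: each URL maps to the references found on that
# page. A real checker would fetch the page and parse its HTML instead.
SITE = {
  "http://localhost:3000/app" => {
    links: ["/about", "http://example.com/"],
    images: ["/logo.png"]
  },
  "http://localhost:3000/about" => {
    links: ["/app", "/missing"],
    images: ["http://example.com/badge.png"]
  }
}.freeze

# Turns a reference into an absolute URL and reports whether it is local.
# Simplification: "local" means relative, or starting with the root URL.
def classify(ref, root)
  url = ref.start_with?("/") ? root + ref : ref
  [url, url.start_with?(root)]
end

# Visits url, files every reference into a bucket, and recurses into
# local pages. visited prevents infinite loops on circular links.
def crawl(url, root, site, result = Hash.new { |h, k| h[k] = Set.new }, visited = Set.new)
  return result unless visited.add?(url)

  page = site[url]
  if page.nil?
    result[:problems] << url # the local page could not be found
    return result
  end

  result[:local_pages] << url
  page[:images].each do |img|
    img_url, local = classify(img, root)
    result[local ? :local_images : :remote_images] << img_url
  end
  page[:links].each do |link|
    link_url, local = classify(link, root)
    if local
      crawl(link_url, root, site, result, visited)
    else
      result[:remote_pages] << link_url
    end
  end
  result
end
```

Calling `crawl("http://localhost:3000/app", "http://localhost:3000", SITE)` collects both pages as local, reports `http://localhost:3000/missing` as a problem, and sorts the two `example.com` references into the remote buckets. This loosely mirrors the five collections printed in the usage example.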

If you prefer not to use the DSL-like API, you can call the module methods directly:

SiteChecker.check("http://localhost:3000/app", "http://localhost:3000")
puts SiteChecker.remote_pages.inspect
puts SiteChecker.local_pages.inspect
puts SiteChecker.remote_images.inspect
puts SiteChecker.local_images.inspect
puts SiteChecker.problems.inspect

Using on Generated Content

If you have a static website (e.g. one generated by Octopress), you can point site_checker at a folder on the file system instead of a URL. With this approach, you don’t need a web server to verify your website:

check_site("./public", "./public")
puts collected_problems.inspect
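
Checking generated content works without a web server because each local reference can be looked up on disk. The sketch below illustrates that idea in plain Ruby. `missing_files` is a hypothetical helper, not part of site_checker, and it assumes that a reference ending in `/` is served by an `index.html` in that directory, which is the usual convention for static site generators.

```ruby
# Hypothetical helper: returns the references (site-relative paths such as
# "/about/" or "/images/logo.png") that have no matching file in root_dir.
# Assumption: a path ending in "/" is backed by an index.html in that folder.
def missing_files(references, root_dir)
  references.reject do |ref|
    path = File.join(root_dir, ref)
    path = File.join(path, "index.html") if ref.end_with?("/")
    File.file?(path)
  end
end
```

For example, `missing_files(["/about/", "/atom.xml"], "./public")` would return whichever of the two references has no corresponding file under `./public`.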

Configuration

You can instruct site_checker to ignore certain links:

SiteChecker.configure do |config|
  config.ignore_list = ["/", "/atom.xml"]
end
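
One plausible reading of `ignore_list` is an exact-match filter applied to references before they are visited. The hypothetical `without_ignored` helper below sketches that behaviour. Whether site_checker also supports prefixes or patterns is not covered here, so treat exact matching as an assumption.

```ruby
require "set"

# Hypothetical filter: drops every reference that exactly matches an entry
# in ignore_list. Converting the list to a Set keeps each lookup cheap.
def without_ignored(references, ignore_list)
  ignored = ignore_list.to_set
  references.reject { |ref| ignored.include?(ref) }
end
```

With the configuration above, `/` and `/atom.xml` would be skipped, while a reference such as `/about` would still be checked.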

By default, site_checker does not check the status of remote links and images (e.g. 404 or 500 responses), but you can enable it like this:

SiteChecker.configure do |config|
  config.visit_references = true
end
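
The snippet below sketches what checking remote references involves: request each URL and treat 4xx and 5xx status codes as problems. It is an illustration, not site_checker's code. `broken_references` and `HTTP_STATUS` are hypothetical names. Redirects are not followed (a 301 counts as fine here), and the status fetcher is passed in as a parameter so the logic can be exercised without network access.

```ruby
require "net/http"
require "uri"

# Default fetcher: performs a real GET request and returns the status code.
# Net::HTTP.get_response does not follow redirects.
HTTP_STATUS = ->(url) { Net::HTTP.get_response(URI(url)).code.to_i }

# Hypothetical helper: maps each broken URL to its HTTP status (e.g. 404, 500)
# or, if the request itself failed, to the name of the raised error class.
def broken_references(urls, fetch_status = HTTP_STATUS)
  urls.each_with_object({}) do |url, problems|
    begin
      status = fetch_status.call(url)
      problems[url] = status if status >= 400
    rescue StandardError => e
      problems[url] = e.class.name # e.g. connection refused or DNS failure
    end
  end
end
```

Because every remote reference costs a network round trip, this also shows why the option is off by default: the run time grows with the number of external links.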

Deep recursion can be expensive, so you can limit the maximum recursion depth with the following attribute:

SiteChecker.configure do |config|
  config.max_recursion_depth = 3
end
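
To see how a depth limit bounds the work, here is a breadth-first sketch that only follows links up to `max_depth` steps from the start page. The `reachable_pages` helper and the `links_of` hash are hypothetical. Counting the start page as depth 0 (so a depth of 2 means "at most two clicks away") is an assumption about how `max_recursion_depth` is counted.

```ruby
require "set"

# Hypothetical helper: returns every page reachable from start in at most
# max_depth link steps. links_of maps a page to the local pages it links to.
def reachable_pages(start, links_of, max_depth)
  visited = Set[start]
  frontier = [start] # pages discovered at the current depth
  max_depth.times do
    frontier = frontier.flat_map { |page| links_of.fetch(page, []) }
                       .reject { |page| visited.include?(page) }
                       .uniq
    break if frontier.empty? # nothing new to explore
    visited.merge(frontier)
  end
  visited
end
```

With a depth of 2, a page that is three links away from the start page is never collected. This is what the page-count check in the Examples section relies on.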

Examples

Make sure that there are no dead local links on the website (I’m using RSpec syntax):

before(:each) do
  SiteChecker.configure do |config|
    config.ignore_list = ["/atom.xml", "/rss"]
  end
end

it "should not have dead local links" do
  check_site("http://localhost:3000", "http://localhost:3000")
  # if this fails, RSpec prints the collected problems, so I don't have to re-run with print statements
  collected_problems.should be_empty
end

Check that all the local pages can be reached in at most two steps:

before(:each) do
  SiteChecker.configure do |config|
    config.ignore_list = ["/atom.xml", "/rss"]
    config.max_recursion_depth = 2
  end

  @number_of_local_pages = 100
end

it "all the local pages have to be visited" do
  check_site("http://localhost:3000", "http://localhost:3000")
  collected_local_pages.size.should eq @number_of_local_pages
end

Command line

Since version 0.3.0, site_checker can also be used from the command line. Here is the list of available options:

~ % site_checker -h
Visits the <site_url> and prints out the list of those URLs which cannot be found

Usage: site_checker [options] <site_url>
-e, --visit-external-references  Visit external references (may take a bit longer)
-m, --max-recursion-depth N      Set the depth of the recursion
-r, --root URL                   The root URL of the path
-i, --ignore URL                 Ignore the provided URL (can be applied several times)
-p, --print-local-pages          Prints the list of the URLs of the collected local pages
-x, --print-remote-pages         Prints the list of the URLs of the collected remote pages
-y, --print-local-images         Prints the list of the URLs of the collected local images
-z, --print-remote-images        Prints the list of the URLs of the collected remote images
-h, --help                       Show a short description and this message
-v, --version                    Show version
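
Several of these options line up with the configuration attributes described earlier. The sketch below parses a subset of them with Ruby's standard `OptionParser`. `parse_cli` and its hash keys are hypothetical, and the mapping (for example `-e` to `visit_references`) is an assumption for illustration, not site_checker's actual option handling.

```ruby
require "optparse"

# Hypothetical sketch: parses a subset of the documented flags into a Hash
# whose keys mirror the configuration attributes (an assumed mapping).
def parse_cli(argv)
  args = argv.dup
  options = { visit_references: false, ignore_list: [] }
  OptionParser.new do |opts|
    opts.banner = "Usage: site_checker [options] <site_url>"
    opts.on("-e", "--visit-external-references", "Visit external references") do
      options[:visit_references] = true
    end
    opts.on("-m", "--max-recursion-depth N", Integer, "Set the depth of the recursion") do |n|
      options[:max_recursion_depth] = n
    end
    opts.on("-r", "--root URL", "The root URL of the path") { |url| options[:root] = url }
    # -i may be given several times; each URL is appended to the list
    opts.on("-i", "--ignore URL", "Ignore the provided URL") { |url| options[:ignore_list] << url }
  end.parse!(args)
  options[:site_url] = args.first # the remaining positional argument
  options
end
```

Under this sketch, `site_checker -e -m 2 -i /atom.xml -i /rss http://localhost:3000` would yield `visit_references: true`, `max_recursion_depth: 2`, both ignored URLs, and `http://localhost:3000` as the site URL.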

Troubleshooting

undefined method ‘new’ for SiteChecker:Module

This error occurs when the test code calls the v0.1.1 API but a newer version of the gem is installed. Update your test code following the examples above.

Copyright (c) 2012 Zsolt Fabok and Contributors. See LICENSE for details.