Blinkr

A broken page and link checker for websites. Optionally uses phantomjs to render pages to check resource loading, links created by JS, and report any JS page load errors.

typhoeus, which can execute up to 200 parallel requests, and cache the results is used to check links.

Installation

Add this line to your application's Gemfile:

gem 'blinkr'

And then execute:

$ bundle

Or install it yourself as:

$ gem install blinkr

If you wish to use phantomjs, install phantomjs for your platform http://phantomjs.org/download.html

Usage

Blinkr determines which pages to load from your sitemap.xml. To run blinkr against your site checking every a[href] link on all your pages:

blinkr -u http://www.jboss.org

If you want to customize blinkr, create a config file blinkr.yaml. For example:

# Links and pages not to check (may be a regexp or a string)
skips:
  - !ruby/regexp /^\/video\/((?!91710755).)*\/$/
  - !ruby/regexp /^\/quickstarts\/((?!eap\/kitchensink).)*\/*$/
  - !ruby/regexp /^\/boms\/((?!eap\/jboss-javaee-6_0).)*\/*$/
  - !ruby/regexp /^\/archetypes\/((?!eap\/jboss-javaee6-webapp-archetype).)*\/*$/

# Errors to ignore when generating the output. Each ignore should be a hash
# containing a url (may be regexp or a string), an error code (integer) and a
# error message (may be a regexp or a string)
ignores:
  - url: http://www.acme.com/foo
    message: Not Found
  - url: !ruby/regexp /^https?:\/\/(www\.)?acme\.com\/bar\/
    code: 500

# The output file to write the report to
report: _tmp/blinkr.html

# The URL to check (often specificed on the command line)
base_url: http://www.jboss.org

# Specify the URL to the sitemap to use, rather than the default <base_url>/sitemap.xml
sitemap: http://www.jboss.org/my_sitemap.xml

# Specify the 'browser' used to load each page from the sitemap. By default 
# typhoeus is used, which will fetch the sources of each page in parallel 
# (fast). 
# Alternatively, you can use phantomjs, which will process the javascript and
# CSS. This allows any links generated by javascript as well as any resources
# loaded by the page/javascript to be checked. Additionally, any JS errors are
# reported. To use phantomjs, you must make sure the native binary is available
# on your path.
browser:phantomjs

# The number of times to try reloading a link, if the server doesn't respond or
# refuses the connection. If the retry limit is exceeded, it will be reported as
# 'Server timed out' in the report. By default 3.
max_retrys: 3

# The number times to try reloading a page. You may want to increase this if you
# find errors in the console that a page cannot be loaded
max_page_retrys: 3

# Allows blinkr to ignore fragments (#foo) which can reduce the number of URLs
# to check. By default false.
ignore_fragments: true

# Control the number of threads used to run phantomjs. By default 8.
phantomjs_threads: 8

# Export the report to phantomjs

You can specify a custom config file on the command link:

blinkr -c my_blinkr.yaml

If you want to see more details about the URLs blinkr is checking, you can use the -v option:

blinkr -u http://www.jboss.org -v

If you need to debug why a particular URL is being reported as bad using blinkr, but works in your web browser, you can load a single URL using typhoeus:

blinkr -c my_blinkr.yaml -s http://www.acme.com/corp

Additionally, you can specify the -w option to tell libcurl to run in verbose mode (this is very verbose, so normally used with -s):

blinkr -c my_blinkr.yaml -s http://www.acme.com/corp -v

Extending Blinkr

Blinkr is based around a pipeline. Issues with the pages are collected, analysed, and then passed to the report for transformation and rendering. Additional sections may appended to the report.

To add extensions to blinkr, you need to define a custom pipeline. The pipeline is defined in a ruby file (e.g. blinkr.rb)

require 'acme/spellcheck'

Blinkr::Extensions::Pipeline.new do |config|
  # define the default extensions
  extension Blinkr::Extensions::Links.new config
  extension Blinkr::Extensions::JavaScript.new config
  extension Blinkr::Extensions::Resources.new config

  # define custom extensions
  extension ACME::Extensions::SpellCheck.new config
end

NOTE: You must add the default extensions to a custom pipeline, for them to be executed.

The pipeline is defined in blinkr.yaml:

# Use a custom pipeline
pipeline: blinkr.rb

An extension is just a standard Ruby class. It should declare an initialize(config) method, and may declare one or more of:

  • collect(page)
  • analyze(context, typhoeus)
  • transform(page, error, default_html)
  • append(context)

Each method is called as the pipeline progresses. Arguments passed are:

  • page - a object containing the tyhpoeus response, the page body (as a Nokogiri HTML document), an array of errors for the page, any resource_errors which ocurred when the page was loaded, and any javascript_errors which ocurred when the page was loaded
  • context - a map of url => pages which are being analysed. After the analyze phase, and before the transform phase, any pages with no errors are removed from the context
  • typhoeus - a wrapper around typhoeus, defining a process method and a process_all method, both of which take a url and a retry limit, and accept a block to execute when a response is returned.
  • error - an individual error, consisting of a type, a url, a title, a code, a message, a detail, a snippet and an fontawesome icon class
  • default_html - the default HTML used to display the error

transform should return the HTML used to display the error. append should return any HTML to be appended to the report. A templating language, such as slim or haml may be used to generate the HTML.

The build extensions, in lib/blinkr/extensions are good examples of how extensions can perform broken link analysis, or collect and format resource loading and javascript execution errors.

Contributing

  1. Fork it ( http://github.com/pmuir/blinkr/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request