HTML::Proofer

If you generate HTML files, then this tool might be for you.

HTML::Proofer is a set of tests to validate your HTML output. These tests check whether your image references are valid and have alt attributes, whether your internal links work, and so on. It’s intended to be an all-in-one checker for your output.


Installation

Add this line to your application’s Gemfile:

gem 'html-proofer'

And then execute:

$ bundle

Or install it yourself as:

$ gem install html-proofer

NOTE: If installation speed matters, set NOKOGIRI_USE_SYSTEM_LIBRARIES to true in your environment. Nokogiri then links against your system's libxml2 and libxslt instead of compiling its bundled copies, which can noticeably speed up Continuous Integration builds.

Usage

Using in a script

Require the gem, generate some HTML, create a new instance of HTML::Proofer pointed at your output folder, then run it. Here’s a simple example:

```ruby
require 'html/proofer'
require 'html/pipeline'
require 'find'

# make an out dir (File.exists? was removed in Ruby 3.2; use File.exist?)
Dir.mkdir("out") unless File.exist?("out")

pipeline = HTML::Pipeline.new [
  HTML::Pipeline::MarkdownFilter,
  HTML::Pipeline::TableOfContentsFilter
], :gfm => true

# iterate over files, and generate HTML from Markdown
Find.find("./docs") do |path|
  if File.extname(path) == ".md"
    contents = File.read(path)
    result = pipeline.call(contents)

    # docs/foo.md becomes out/foo.html
    html_name = File.basename(path, ".md") + ".html"
    File.write(File.join("out", html_name), result[:output].to_s)
  end
end

# test your out dir!
HTML::Proofer.new("./out").run
```

Using on the command-line

You’ll get a new program called htmlproof with this gem. Terrific!

Use it like you’d expect to:

```bash
htmlproof ./out --swap wow:cow,mow:doh --ext .html.erb --ignore www.github.com
```

Note: --swap is a bit special: you pass in comma-separated RegEx:String pairs, which correspond to the RegExp => String pairs of the href_swap option.
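To make the swap format concrete, here is a minimal sketch, assuming each comma-separated pair is split on its first colon into a regular expression and a replacement string, which are then applied to each link with gsub as the href_swap option describes. parse_swap and swap_href are hypothetical helpers for illustration, not htmlproof's actual internals.

```ruby
# Hypothetical helpers illustrating the RegEx:String swap format.
# This is NOT the code htmlproof runs; it only mirrors the documented idea.

# Turn "wow:cow,mow:doh" into { /wow/ => "cow", /mow/ => "doh" }.
# Assumption: each pair is split on its FIRST colon only.
def parse_swap(arg)
  arg.split(",").each_with_object({}) do |pair, swaps|
    pattern, replacement = pair.split(":", 2)
    swaps[Regexp.new(pattern)] = replacement.to_s
  end
end

# Apply every swap to an href with gsub, in insertion order.
def swap_href(href, swaps)
  swaps.reduce(href) { |current, (regex, replacement)| current.gsub(regex, replacement) }
end

swaps = parse_swap("wow:cow,mow:doh")
puts swap_href("http://example.com/wow/mow.html", swaps)
# prints http://example.com/cow/doh.html
```

Applying the swaps in order means a later pattern sees the output of an earlier one, which matters only if your patterns overlap.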

Using with Jekyll

Want to use HTML Proofer with your Jekyll site? Awesome. Simply add gem 'html-proofer' to your Gemfile as described above, then add the following to your Rakefile and run it with rake test:

```ruby
require 'html/proofer'

task :test do
  sh "bundle exec jekyll build"
  HTML::Proofer.new("./_site").run
end
```

Don’t have or want a Rakefile? You could also do something like the following:

```bash
htmlproof ./_site
```

Real-life examples

| Project | Repository |
| --- | --- |
| Raspberry Pi documentation | raspberrypi/documentation |
| Open Whisper Systems website | WhisperSystems/whispersystems.org |
| Jekyll website | jekyll/jekyll |

What’s Tested?

Images

img elements:

  • Whether all your images have alt attributes
  • Whether your internal image references resolve
  • Whether external images load

Links

a, link elements:

  • Whether your internal links resolve; this includes hash references (#linkToMe)
  • Whether external links work
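The hash-reference check can be pictured as: collect every id in a page, then flag any href="#..." whose target is missing. Below is a minimal sketch, assuming well-formed XHTML input so that Ruby's REXML can parse it (html-proofer itself works on Nokogiri documents) and counting only id attributes as targets. broken_hash_links is a hypothetical name, not part of the gem.

```ruby
require "rexml/document"

# Hypothetical sketch: report same-page links (href="#...") whose fragment
# matches no id in the document. Assumes well-formed XHTML, because REXML
# is an XML parser; only `id` attributes count as link targets here.
def broken_hash_links(xhtml)
  doc = REXML::Document.new(xhtml)

  ids = REXML::XPath.match(doc, "//*[@id]").map { |el| el.attributes["id"] }

  REXML::XPath.match(doc, "//a[@href]")
              .map { |a| a.attributes["href"] }
              .select { |href| href.start_with?("#") && href.length > 1 }
              .reject { |href| ids.include?(href[1..]) }
end

page = <<~XHTML
  <html><body>
    <h2 id="linkToMe">Section</h2>
    <a href="#linkToMe">ok</a>
    <a href="#missing">broken</a>
    <a href="other.html">not a hash link</a>
  </body></html>
XHTML

p broken_hash_links(page)
# prints ["#missing"]
```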

Scripts

script elements:

  • Whether your internal script references resolve
  • Whether external scripts load

Configuration

The HTML::Proofer constructor takes an optional hash of additional options:

| Option | Description | Default |
| --- | --- | --- |
| disable_external | If true, does not run the external link checker, which can take a lot of time. | false |
| ext | The extension of your HTML files, including the dot. | .html |
| favicon | Enables the favicon checker. | false |
| followlocation | Follows external redirections. Adds missing trailing slashes to links that point to internal directories. | true |
| directory_index_file | Sets the file to look for when a link refers to a directory. | index.html |
| href_ignore | An array of Strings or RegExps containing hrefs that are safe to ignore. Certain URIs, like mailto and tel, are always ignored. | [] |
| alt_ignore | An array of Strings or RegExps containing imgs whose missing alt attributes are safe to ignore. | [] |
| href_swap | A hash of RegExp => String pairs. Links that match a RegExp are transformed into the String via gsub. | {} |
| verbose | If true, outputs extra information as the checking happens. Useful for debugging. | false |
| only_4xx | Only reports errors for links that fall within the 4xx status code range. | false |
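The href_ignore and only_4xx options can be read as two filters on the results. Here is a minimal sketch, assuming a String entry in href_ignore must equal the href exactly while a RegExp entry only needs to match it, and that only_4xx keeps failures whose HTTP status lies between 400 and 499. ignored? and report? are hypothetical helpers, not html-proofer's own code.

```ruby
# Hypothetical sketch of two filters from the options table.
# Assumption: a String in href_ignore must equal the href exactly, while a
# Regexp only needs to match it. Case equality (===) gives exactly that:
# String#=== compares for equality, Regexp#=== tests for a match.
def ignored?(href, href_ignore)
  href_ignore.any? { |rule| rule === href }
end

# Assumption: with only_4xx enabled, only failures whose HTTP status lies
# in 400..499 are reported; otherwise every failure is reported.
def report?(status, only_4xx: false)
  return true unless only_4xx

  (400..499).cover?(status)
end

href_ignore = ["http://localhost:4000/", %r{\Ahttps?://example\.com/drafts/}]

puts ignored?("https://example.com/drafts/post.html", href_ignore) # true
puts ignored?("https://example.com/posts/post.html", href_ignore)  # false
puts report?(404, only_4xx: true)                                   # true
puts report?(503, only_4xx: true)                                   # false
```

Using === keeps the ignore list heterogeneous without any type checks, which is one natural way to support the "Strings or RegExps" shape the table describes.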

Configuring Typhoeus

You can also pass in any of Typhoeus’ options for the external link check. For example:

```ruby
HTML::Proofer.new("out/", {:ext => ".htm", :verbose => true, :ssl_verifyhost => 2 }).run
```

This sets HTML::Proofer's extension to .htm and gives Typhoeus a configuration that makes it verbose and uses specific SSL settings. Check the Typhoeus documentation for more information on the options it accepts.

Configuring Parallel

Parallel is used to speed things up a bit. You can pass in any of its options under the :parallel key of the options hash. For example:

```ruby
HTML::Proofer.new("out/", {:parallel => { :in_processes => 3 } }).run
```

:in_processes => 3 will be passed into Parallel as a configuration option.

Instead of a directory as the first argument, you can also pass in an array of links:

```ruby
HTML::Proofer.new(["http://github.com", "http://jekyllrb.com"]).run
```

This configures Proofer to just test those links to ensure they are valid. Note that for the command-line, you’ll need to pass a special --as-links argument:

```bash
bin/htmlproof www.google.com,www.github.com --as-links
```

Ignoring content

Add the data-proofer-ignore attribute to any tag to exclude it from the checks.
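To illustrate how an ignore marker interacts with a check, here is a minimal sketch of the alt-attribute check that skips any img carrying data-proofer-ignore. It assumes well-formed XHTML so Ruby's REXML can parse it (the gem itself uses Nokogiri), and it treats only a completely absent alt attribute as a failure. images_missing_alt is a hypothetical name, not part of html-proofer.

```ruby
require "rexml/document"

# Hypothetical sketch: list the src of every <img> that has no alt
# attribute, skipping elements marked with data-proofer-ignore.
# Assumes well-formed XHTML so that REXML (an XML parser) can read it.
def images_missing_alt(xhtml)
  doc = REXML::Document.new(xhtml)

  REXML::XPath.match(doc, "//img")
              .reject { |img| img.attributes["data-proofer-ignore"] } # "" is truthy, so presence is enough
              .select { |img| img.attributes["alt"].nil? }
              .map { |img| img.attributes["src"] }
end

page = <<~XHTML
  <div>
    <img src="logo.png" alt="Logo"/>
    <img src="chart.png"/>
    <img src="spacer.gif" data-proofer-ignore=""/>
  </div>
XHTML

p images_missing_alt(page)
# prints ["chart.png"]
```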

Custom tests

Want to write your own test? Sure! Just create two classes: one that inherits from HTML::Proofer::Checkable, and another that inherits from HTML::Proofer::Checks::Check. Checkable defines various helper methods for your test, while Checks::Check actually runs across your content. Checks::Check should call self.add_issue on failures to add them to the list.

Here’s an example custom test that flags mailto links addressed to the Octocat:

```ruby
class OctocatLink < ::HTML::Proofer::Checkable

  # true for a non-ignored, non-empty href that starts with mailto:
  def mailto?
    return false if @data_ignore_proofer || @href.nil? || @href.empty?
    @href.match(/\Amailto:/)
  end

  # true when the address is the Octocat's (dot escaped so it matches literally)
  def octocat?
    @href.match(/:octocat@github\.com\z/)
  end

end

class MailToOctocat < ::HTML::Proofer::Checks::Check

  def run
    @html.css('a').each do |l|
      link = OctocatLink.new l, "octocat_link", self

      # report the first offending link and stop scanning this file
      if link.mailto? && link.octocat?
        return self.add_issue("Don't email the Octocat directly!")
      end
    end
  end

end
```