
This is a spellchecker that recursively fetches HTML pages, converts them to plain text (using pandoc), and spellchecks them with hunspell. Unknown words will be printed to stdout, which makes the tool a good candidate for CI pipelines where you might want to take action when a spelling error is found on a web page.

Words that are not in the dictionary for the given language (inferred from the lang attribute of the HTML document's root element) can be added to a personal dictionary, which will mark the word as correctly spelled.


  • The following command will retrieve the HTML document at the given URL, spellcheck it, and print nothing because there are no errors:
  $ httpspell

The exit code is 0.

  • The following command will spellcheck the README of this project as rendered by GitHub, and print a list of unknown words. Note that we set the language to en_US because GitHub declares 'en' as the document language, but the installed dictionaries usually refer to a specific language variant like en_US:
  $ httpspell --language en_US

The exit code is 1.
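
The exit codes above are what make the tool usable as a CI gate. As a rough sketch of how a pipeline step might consume them, the following Python helper runs an arbitrary command and collects its stdout. The name run_spellcheck is hypothetical, and the assumption that each non-empty stdout line holds one unknown word is not confirmed by this README:

```python
import subprocess


def run_spellcheck(command):
    """Run a spellcheck command and return (exit_code, unknown_words).

    `command` is the full argument list, for example the httpspell
    invocation used in your pipeline.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    # Assumption: one unknown word per non-empty line of stdout.
    words = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return result.returncode, words
```

A CI step could call this helper with its own httpspell command line, report the returned words, and fail the build whenever the exit code is non-zero.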

What is not checked

  • When spidering a site, httpspell will skip all responses with a content-type header other than text/html (unless it is pointed at a file, in which case it accepts anything).
  • Before converting, httpspell removes the following nodes from the HTML DOM as they are not a good target for spellchecking:
    • code
    • pre
    • Elements with spellcheck='false' (this is how HTML5 allows tagging elements as being a target for spellchecking or not)
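
To illustrate the filtering rules listed above, here is a minimal sketch in Python using only the standard library. It is not httpspell's actual implementation. The names SpellcheckableText and extract_text are made up for this example, and the sketch assumes reasonably well-formed markup in which every non-void element has a closing tag:

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class SpellcheckableText(HTMLParser):
    """Collects text, skipping code, pre and spellcheck='false' subtrees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # > 0 while inside an excluded subtree
        self.chunks = []

    @staticmethod
    def _excluded(tag, attrs):
        value = dict(attrs).get("spellcheck") or ""
        return tag in ("code", "pre") or value.lower() == "false"

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # Once inside an excluded subtree, count every nested element too.
        if self.skip_depth or self._excluded(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def extract_text(html):
    """Return the whitespace-normalized text that remains after filtering."""
    parser = SpellcheckableText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())
```

Tracking a depth counter instead of a single flag keeps nested elements inside an excluded subtree from ending the exclusion too early.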


If you produce content with kramdown (e.g. using Jekyll), setting spellcheck='false' for an element is as simple as adding this line after the element (e.g. a heading):

{: spellcheck="false"}