Snapcrawl - crawl a website and take screenshots

Snapcrawl is a command line utility for crawling a website and saving screenshots.

Features

Crawls a website to any given depth and save screenshots
Can capture the full length of the page
Can use a specific resolution for screenshots
Skips capturing if the screenshot was already saved recently
Uses local caching to avoid expensive crawl operations if not needed
Reports broken links

Install

$ gem install snapcrawl

Usage

$ snapcrawl --help

Snapcrawl

Usage:
  snapcrawl go <url> [options]
  snapcrawl -h | --help 
  snapcrawl -v | --version

Options:
  -f --folder <path>     Where to save screenshots [default: snaps]
  -a --age <n>           Number of seconds to consider screenshots fresh
                         [default: 86400]
  -d --depth <n>         Number of levels to crawl [default: 1]
  -W --width <n>         Screen width in pixels [default: 1280]
  -H --height <n>        Screen height in pixels. Use 0 to capture the full 
                         page [default: 0]
  -s --selector <s>      CSS selector to capture
  -o --only <regex>      Include only URLs that match <regex>
  -h --help              Show this screen
  -v --version           Show version

Examples:
  snapcrawl go example.com
  snapcrawl go example.com -d2 -fscreens
  snapcrawl go example.com -d2 > out.txt 2> err.txt &
  snapcrawl go example.com -W360 -H480
  snapcrawl go example.com --selector "#main-content"
  snapcrawl go example.com --only "products|collections"

Notes

If a URL cannot be found, Snapcrawl will report to stderr. You can create a report by running snapcrawl go example.com 2> err.txt

Todo

[x] Tests (probably against some ad hoc sinatra)
[ ] Make the test server start/stop automatically when testing
[ ] Move ignored file extensions and mailto/tel links to config
[ ] Add screen size presets (also to user-overridable config)