Simple little command-line web crawler. Use it to find 404s or 500s on your site. I use it to warm caches. Multi-threading is supported for faster crawling.
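At its core, a crawler like this fetches each page, records the HTTP status (reporting 404s and 500s, warming caches as a side effect), and follows the links it finds. A minimal Ruby sketch of the link-extraction step — the `extract_links` helper is illustrative only, not part of krawler's API:

```ruby
require 'uri'

# Illustrative sketch: pull href targets out of an HTML page and resolve
# them against the base URL. A crawler would then fetch each one and
# check its HTTP status.
def extract_links(html, base)
  html.scan(/href=["']([^"']+)["']/).flatten
      .map { |href| URI.join(base, href).to_s rescue nil }
      .compact
end

html = '<a href="/about">About</a> <a href="/api/v1">API</a>'
extract_links(html, 'http://localhost:3000/')
# => ["http://localhost:3000/about", "http://localhost:3000/api/v1"]
```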
gem install krawler
From the command line:
$ krawl http://localhost:3000/
-e, --exclude regex       Exclude matching paths
-s, --sub-restrict        Restrict to sub paths of base url
-c, --concurrent count    Crawl with count number of concurrent connections
-r, --randomize           Randomize crawl path
Restrict crawling to sub-paths of /public
$ krawl http://localhost:3000/public -s
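The sub-path restriction can be thought of as a simple prefix check against the base URL's path — a sketch of the idea, not krawler's actual implementation:

```ruby
require 'uri'

# Illustrative sketch: with -s, only paths under the base URL's path
# are considered for crawling.
base = URI('http://localhost:3000/public')
candidates = ['/public/docs', '/public/img/logo.png', '/login']
candidates.select { |path| path.start_with?(base.path) }
# => ["/public/docs", "/public/img/logo.png"]
```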
Restrict crawling to paths that do not match a given regex
$ krawl http://localhost:3000/ -e "^\/api\/"
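Presumably the pattern given to -e is matched against each request path and matching paths are skipped; a hedged sketch of that filtering:

```ruby
# Illustrative sketch: paths matching the exclusion pattern are skipped.
exclude = Regexp.new('^\/api\/')
paths = ['/api/v1/users', '/about', '/api/status']
paths.reject { |path| path.match?(exclude) }
# => ["/about"]
```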
Crawl with 4 concurrent threaded crawlers. Make sure your server is capable of handling concurrent requests.
$ krawl http://production.server -c 4
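Conceptually, -c fans a shared queue of URLs out over a fixed pool of worker threads. A hedged sketch under that assumption — `crawl_concurrently` is a made-up name, not krawler's API:

```ruby
# Illustrative sketch only: drain a shared queue of URLs with `count` workers.
# A real crawler would fetch each URL here and record its HTTP status.
def crawl_concurrently(urls, count)
  queue = Queue.new
  urls.each { |u| queue << u }
  done = Queue.new
  workers = Array.new(count) do
    Thread.new do
      loop do
        url = begin
          queue.pop(true)   # non-blocking pop
        rescue ThreadError
          break             # queue drained, worker exits
        end
        done << url         # stand-in for: fetch(url); log status
      end
    end
  end
  workers.each(&:join)
  Array.new(done.size) { done.pop }
end

crawl_concurrently(%w[/a /b /c /d], 4).sort
# => ["/a", "/b", "/c", "/d"]
```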
Randomize the crawl path. Helpful when you have a lot of links and get bored watching the same crawl path over and over.
$ krawl http://localhost:3000/ -r
- Fork it
- Create your feature branch (`git checkout -b my-new-feature`)
- Commit your changes (`git commit -am 'Added some feature'`)
- Push to the branch (`git push origin my-new-feature`)
- Create a new Pull Request