A distributed web crawler written in Ruby, backed by Redis. This project was presented at RubyDay2013.


  • Easy to use
  • Distributed and scalable
  • Uses a fast, space-efficient probabilistic data structure to decide whether a URL should be visited
  • Doesn't exhaust your Redis server
  • Plays nicely with MongoDB, although MongoDB is not strictly required
  • Easy to write your own page storage strategy
  • Focused crawling made easy
  • Heavily inspired by Anemone
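
The feature list does not name the probabilistic data structure. A Bloom filter is the usual choice for "has this URL been seen?" checks, so the sketch below assumes one. It is a minimal in-memory illustration in plain Ruby, not Polipus's own code: the class name `TinyBloomFilter` and its parameters are hypothetical, and a real crawler would pack the bits compactly and keep them in a store shared by all workers.

```ruby
require "digest"

# Hypothetical illustration class; not part of the Polipus API.
class TinyBloomFilter
  def initialize(size_bits: 1 << 16, hashes: 4)
    @size_bits = size_bits
    @hashes = hashes
    # An Array of booleans keeps the sketch readable; it is not space-efficient.
    @bits = Array.new(size_bits, false)
  end

  # Mark a URL as seen by setting all of its bit positions.
  def add(url)
    positions(url).each { |i| @bits[i] = true }
    self
  end

  # false => the URL was definitely never added.
  # true  => the URL was probably added (false positives are possible).
  def include?(url)
    positions(url).all? { |i| @bits[i] }
  end

  private

  # Derive the bit positions from one SHA-256 digest using double hashing:
  # position_k = (h1 + k * h2) mod size_bits.
  def positions(url)
    h1, h2 = Digest::SHA256.digest(url).unpack("Q>2")
    (0...@hashes).map { |k| (h1 + k * h2) % @size_bits }
  end
end

seen = TinyBloomFilter.new
seen.add("http://example.com/")
puts seen.include?("http://example.com/") # true
```

The trade-off is that a Bloom filter never forgets a URL it has seen but may occasionally report an unseen URL as seen, so a crawler built this way can skip a small fraction of pages in exchange for a fixed, small memory footprint.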

Supported Ruby Interpreters

  • MRI 1.9.x >= 1.9.1
  • MRI 2.0.0
  • MRI 2.1.2
  • JRuby 1.9 mode
  • Rubinius

Survival code example

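require "polipus"

# Arguments: a job name and the start URL (left empty here)
Polipus.crawler("rubygems","") do |crawler|
  # In-place page processing: this block runs for every downloaded page
  crawler.on_page_downloaded do |page|
    # page.doc is a Nokogiri document
    puts "Page title: '#{page.doc.css('title').text}' Page url: #{page.url}"
  end
end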


Installation

$ gem install polipus


Testing

$ bundle install
$ rake

