grubby

Fail-fast web scraping. grubby adds a layer of utility and error-checking atop the marvelous Mechanize gem. See the API summary below, or browse the full documentation.

Examples

The following example scrapes stories from the Hacker News front page:

require "grubby"

class HackerNews < Grubby::PageScraper
  scrapes(:items) do
    page.search!(".athing").map{|el| Item.new(el) }
  end

  class Item < Grubby::Scraper
    scrapes(:story_link){ source.at!("a.storylink") }
    scrapes(:story_uri){ story_link.uri }
    scrapes(:title){ story_link.text }
  end
end

# The following line will raise an exception if anything goes wrong
# during the scraping process.  For example, if the structure of the
# HTML does not match expectations, either due to incorrect assumptions
# or a site change, the script will terminate immediately with a helpful
# error message.  This prevents bad data from propagating and causing
# hard-to-trace errors.
hn = HackerNews.scrape("https://news.ycombinator.com/news")

# Your processing logic goes here:
hn.items.take(10).each do |item|
  puts "* #{item.title}"
  puts "  #{item.story_uri}"
  puts
end
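
The fail-fast behavior described in the comments above can be illustrated without grubby or Mechanize. The following is a minimal sketch using only plain Ruby, and it is not grubby's implementation. The names TinyScraper, FieldError, and Story are hypothetical, and the source is a plain Hash instead of a parsed page. It assumes that "fail-fast" means every declared field is evaluated when the scraper is constructed and that a nil value is treated as an error.

```ruby
# Illustrative sketch only; this is NOT how grubby is implemented.
# TinyScraper, FieldError, and Story are hypothetical names.
class TinyScraper
  class FieldError < StandardError; end

  # Names of the fields declared on this class, in declaration order.
  def self.fields
    @fields ||= []
  end

  # Declares a field whose value is computed by the given block.
  def self.scrapes(field, &block)
    fields << field
    define_method(field) do
      # Return the memoized value, or compute and validate it once.
      @values.fetch(field) do
        value = instance_exec(&block)
        raise FieldError, "`#{field}` is nil for #{self.class}" if value.nil?
        @values[field] = value
      end
    end
  end

  attr_reader :source

  # Evaluates every field eagerly, so a bad source fails here
  # instead of later, deep inside the processing logic.
  def initialize(source)
    @source = source
    @values = {}
    self.class.fields.each { |field| public_send(field) }
  end

  # Returns a Hash of field names to their scraped values.
  def to_h
    self.class.fields.to_h { |field| [field, public_send(field)] }
  end
end

class Story < TinyScraper
  scrapes(:title) { source[:title] }
  scrapes(:uri) { source[:href] }
end

good = Story.new({ title: "Example", href: "https://example.com/" })
p good.to_h

begin
  # The :href key is missing, so construction itself raises.
  Story.new({ title: "No link" })
rescue TinyScraper::FieldError => error
  puts "scrape failed: #{error.message}"
end
```

Because the error is raised at construction time, a caller never receives a partially populated object, which is the property the example script above relies on.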

Core API

Grubby
Scraper
- .each
- .fields
- .scrape
- .scrapes
- #[]
- #source
- #to_h
PageScraper
- .scrape_file
- #page
JsonScraper
- .scrape_file
- #json
Mechanize::Download
- #save_to
- #save_to!
Mechanize::File
- #save_to
- #save_to!
Mechanize::Page
- #at!
- #search!
Mechanize::Page::Link
- #to_absolute_uri
URI
- #basename
- #query_param
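
The summary above lists the two URI helpers without describing them. As a rough orientation, the sketch below shows how the same kind of information can be extracted with Ruby's standard library alone. It assumes that the helpers do what their names suggest (the last path segment, and the value of a single query parameter); grubby's actual signatures and return values may differ, and the URL is a made-up example.

```ruby
require "uri"

# Hypothetical URL, used only for illustration.
uri = URI("https://example.com/files/report.pdf?page=2&sort=asc")

# Last segment of the path, comparable in spirit to a "basename" helper.
basename = File.basename(uri.path)

# Decode the query string into a Hash and look up one parameter by name.
params = URI.decode_www_form(uri.query.to_s).to_h
page = params["page"]

puts basename # prints "report.pdf"
puts page     # prints "2"
```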

Supplemental API

grubby includes several gems that extend core Ruby objects with convenience methods. Loading grubby makes these methods available automatically. The included gems are listed below, along with a few of the methods each provides. See each gem's documentation for a complete API listing.

Installation

Install from RubyGems:

$ gem install grubby

Then require in your Ruby script:

require "grubby"

Contributing

Run rake test to run the tests. You can also run rake irb for an interactive prompt that pre-loads the project code.

License

MIT License