grubby
Fail-fast web scraping. grubby adds a layer of utility and error-checking atop the marvelous Mechanize gem. See the API summary below, or browse the full documentation.
Examples
The following example scrapes stories from the Hacker News front page:
require "grubby"

class HackerNews < Grubby::PageScraper
  scrapes(:items) do
    page.search!(".athing").map{|el| Item.new(el) }
  end

  class Item < Grubby::Scraper
    scrapes(:story_link){ source.at!("a.storylink") }
    scrapes(:story_uri){ story_link.uri }
    scrapes(:title){ story_link.text }
  end
end

# The following line will raise an exception if anything goes wrong
# during the scraping process. For example, if the structure of the
# HTML does not match expectations, either due to incorrect assumptions
# or a site change, the script will terminate immediately with a helpful
# error message. This prevents bad data from propagating and causing
# hard-to-trace errors.
hn = HackerNews.scrape("https://news.ycombinator.com/news")

# Your processing logic goes here:
hn.items.take(10).each do |item|
  puts "* #{item.title}"
  puts " #{item.story_uri}"
  puts
end
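The fail-fast behavior described in the comments above can be illustrated without any network access. The following is a minimal sketch in plain Ruby, not grubby's actual implementation: TinyScraper, MissingField, and Story are hypothetical names invented for this illustration, and the source is a plain Hash rather than a parsed page. It assumes only that a declared field which evaluates to nil should abort construction.

```ruby
# Hypothetical stand-in for illustration only; this is NOT grubby's code.
class TinyScraper
  class MissingField < StandardError; end

  # Each subclass keeps its own ordered list of declared field names.
  def self.fields
    @fields ||= []
  end

  # Declares a field whose value is computed by the given block.
  # The value is computed once, cached, and must not be nil.
  def self.scrapes(name, &block)
    fields << name
    define_method(name) do
      @values.fetch(name) do
        value = instance_exec(&block)
        raise MissingField, "field `#{name}` returned nil" if value.nil?
        @values[name] = value
      end
    end
  end

  attr_reader :source

  # Evaluates every declared field up front, so bad input fails here
  # instead of later, when a field is first used.
  def initialize(source)
    @source = source
    @values = {}
    self.class.fields.each { |name| public_send(name) }
  end
end

# Hypothetical scraper over a Hash instead of an HTML element.
class Story < TinyScraper
  scrapes(:title) { source[:title] }
  scrapes(:uri)   { source[:uri] }
end

story = Story.new({ title: "Example story", uri: "https://example.com/" })
puts story.title

begin
  Story.new({ title: "No link here" })
rescue TinyScraper::MissingField => e
  puts "scrape failed: #{e.message}"
end
```

Because every field is evaluated in the constructor, the second Story never exists in a half-populated state. That is the property the HackerNews example relies on when it calls HackerNews.scrape.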
Core API
- Grubby
- Scraper
- PageScraper
- JsonScraper
- Mechanize::Download
- Mechanize::File
- Mechanize::Page
- Mechanize::Page::Link
- URI
Supplemental API
grubby includes several gems that extend Ruby objects with convenience methods. Loading grubby automatically makes these methods available. The included gems are listed below, along with a few of the methods each provides. See each gem's documentation for a complete API listing.
- Active Support (docs)
- casual_support (docs)
- gorge (docs)
- mini_sanity (docs)
- pleasant_path (docs)
- ryoba (docs)
Installation
Install from RubyGems:
$ gem install grubby
Then require in your Ruby script:
require "grubby"
Contributing
Run rake test to run the tests. You can also run rake irb for an interactive prompt that pre-loads the project code.