Nibbler is a cute HTML screen-scraping tool.
require 'nibbler' require 'open-uri' class < element :title elements 'div.hentry' => :articles do element 'h2' => :title element 'a/@href' => :url end end blog = . open('http://example.com') blog.title #=> "My blog title" blog.articles.first.title #=> "First article title" blog.articles.first.url #=> "http://example.com/article"
There are sample scripts in the "examples/" directory; run them with:
ruby -Ilib -rubygems examples/delicious.rb ruby -Ilib -rubygems examples/tweetburner.rb > output.csv
See the wiki for more on how to use Nibbler.
None. Well, Nokogiri is a requirement if you pass in HTML content that needs to be parsed, like in the example above. Otherwise you can initialize the scraper with an Hpricot document or anything else that implements