Scraping
A really simple HTML scraping DSL.
Installation
Add this line to your application's Gemfile:
gem 'scraping'
And then execute:
$ bundle
Usage
A simple example
class Person
include Scraping
element :name, 'h1'
end
person = Person.scrape('<h1>Millard Fillmore</h1>')
person.name #=> 'Millard Fillmore'
More complex data structures
You can also scrape arrays, objects, and arrays of objects. Blocks passed to elements and elements_of can themselves contain further element, elements, and elements_of declarations, nested to any depth.
class YouCan
include Scraping
elements :scrape, '.scrape'
elements :also_scrape, '.also-scrape li' do
element :name, 'a'
element :link, 'a/@href'
elements :numbers, 'span'
end
elements_of :nested_scrape do
element :data, '.data'
end
end
you_can = YouCan.scrape(<<~HTML)
  <p class="scrape">
    <span>Arrays</span>
    <span>Too</span>
  </p>

  <ul class="also-scrape">
    <li>
      <a href="example.com">Meek Mill</a>
      <span>1</span>
      <span>2</span>
    </li>
    <li><a href="test.com">Drake</a></li>
  </ul>

  <p class="data">Beef</p>
HTML
you_can.scrape #=> ['Arrays', 'Too']
you_can.also_scrape.first.name #=> 'Meek Mill'
you_can.also_scrape.first.link #=> 'example.com'
you_can.also_scrape.first.numbers #=> ['1', '2']
you_can.nested_scrape.data #=> 'Beef'
Customizing extraction
A block given to #element receives the matched node, and its return value becomes the extracted value.
Alternatively, passing as: :something calls an instance method named #extract_something with the matched node.
require 'date'
class Advanced
include Scraping
element :first_name, '.name' do |node|
# Take the first whitespace-separated word of the name
node.text.split(' ').first
end
element :birthday, '.birthday', as: :date
private
def extract_date(node)
Date.parse(node.text)
end
end
advanced = Advanced.scrape(<<~HTML)
  <h1 class="name">Millard Fillmore</h1>
  <h2 class="birthday">7-1-1800</h2>
HTML
advanced.first_name #=> 'Millard'
advanced.birthday #=> #<Date: 1800-01-07>
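The as: naming convention described above can be illustrated in plain Ruby. The following is a minimal sketch, not the gem's implementation: TinyExtract, apply_rules, and the Sketch class are hypothetical names, and "nodes" are plain strings looked up in a Hash by selector rather than parsed HTML. It also assumes that a block takes priority over as:, which the gem may or may not do. The date extractor uses Date.strptime with an explicit day-month-year format so the result does not depend on how Date.parse guesses the field order.

```ruby
require 'date'

# Hypothetical stand-in for the DSL, showing only how `as: :date` could be
# routed to `#extract_date`. Nodes are plain strings keyed by selector.
module TinyExtract
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def rules
      @rules ||= {}
    end

    # Record the selector plus an optional block or `as:` extractor name,
    # and define a public reader for the extracted value.
    def element(name, selector, as: nil, &block)
      rules[name] = { selector: selector, as: as, block: block }
      attr_reader name
    end

    def scrape(nodes)
      new.tap { |object| object.send(:apply_rules, nodes) }
    end
  end

  private

  def apply_rules(nodes)
    self.class.rules.each do |name, rule|
      node = nodes.fetch(rule[:selector])
      value =
        if rule[:block]
          rule[:block].call(node)            # assumption: block wins over as:
        elsif rule[:as]
          send("extract_#{rule[:as]}", node) # as: :date -> #extract_date
        else
          node
        end
      instance_variable_set("@#{name}", value)
    end
  end
end

class Sketch
  include TinyExtract

  element :first_name, '.name' do |text|
    text.split(' ').first
  end
  element :birthday, '.birthday', as: :date

  private

  # Explicit day-month-year format instead of letting Date.parse guess.
  def extract_date(text)
    Date.strptime(text, '%d-%m-%Y')
  end
end

sketch = Sketch.scrape('.name' => 'Millard Fillmore', '.birthday' => '7-1-1800')
sketch.first_name #=> 'Millard'
sketch.birthday   #=> Date.new(1800, 1, 7)
```

Because the extractor is looked up with send, it can stay private, as in the Advanced example above.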
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/scraping.
License
The gem is available as open source under the terms of the MIT License.