Pagedump

Installation

Add this line to your application's Gemfile:

gem 'pagedump'

And then execute:

$ bundle

Or install it yourself as:

$ gem install pagedump

Usage

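Create a page driver by subclassing Pagedump::Driver. Set URL to the page to fetch and implement headlines, which receives the page and registers each link with head WEIGHT, LINK: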

require "pagedump"

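class LeMonde < Pagedump::Driver
  # Page fetched when the driver scrapes.
  URL = "http://www.lemonde.fr/"

  def headlines(page)
    # Lead story: register its link with weight 3.
    head 3, page.css(".titre_une a")[0]["href"]

    # Other top stories: register each link with weight 1.
    page.css(".titres_hauts article").each do |e|
      head 1, e.css("a")[0]["href"]
    end
  end
end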
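One way to read the head WEIGHT, LINK calls above is as votes: each call credits a link with a weight, and the scrape result reports links together with their weights. The sketch below models only that bookkeeping in plain Ruby, so it runs without network access or pagedump itself. The WeightTally class, its method names, and the rule that a repeated link sums its weights are assumptions for illustration, not pagedump's actual implementation.

```ruby
# Hypothetical model of the weighting implied by `head WEIGHT, LINK`.
# Not pagedump's code: the class name, methods and summing rule are assumptions.
class WeightTally
  def initialize
    # Links not seen yet default to a weight of 0.
    @weights = Hash.new(0)
  end

  # Credit `link` with `weight`; a link registered twice accumulates
  # both weights (assumed rule). Returns self so calls can be chained.
  def head(weight, link)
    @weights[link] += weight
    self
  end

  # [link, weight] pairs, heaviest first; equal weights keep the order
  # in which links were first registered.
  def results
    @weights.each_with_index
            .sort_by { |(_, weight), index| [-weight, index] }
            .map(&:first)
  end
end
```

For example, after tally = WeightTally.new and tally.head(3, "/une").head(1, "/a").head(1, "/une"), calling tally.results gives [["/une", 4], ["/a", 1]].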

Then call scrap to collect its weighted links:

require "pagedump"

driver = LeMonde.new  # assumption: the driver needs no constructor arguments
headlines = driver.scrap
headlines.each do |headline, weight|
  puts "%3d\t%s" % [weight, headline]
end
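The loop prints one line per link: the format string right-aligns the weight in a three-character column, then a tab, then the link. The helper below is a hypothetical sketch (format_headlines is not part of pagedump) that assumes scrap yields [headline, weight] pairs, as the block parameters above suggest, and sorts them heaviest first before formatting. It works on plain arrays, so no page has to be fetched to try it.

```ruby
# Hypothetical helper; assumes results are [headline, weight] pairs
# like those yielded by `scrap` in the example above.
def format_headlines(pairs)
  # Heaviest headlines first.
  sorted = pairs.sort_by { |_, weight| -weight }
  # Weight right-aligned in 3 columns, a tab, then the headline link.
  sorted.map { |headline, weight| "%3d\t%s" % [weight, headline] }
end
```

For example, format_headlines([["/a", 1], ["/une", 3]]) returns ["  3\t/une", "  1\t/a"], and each string can be passed to puts.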

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/pompadour/pagedump.

License

The gem is available as open source under the terms of the MIT License.