ScrapedPageArchive

Add this gem to your Ruby scraper and it will automatically capture HTTP requests and cache the responses in a branch of your Git repository.

Installation

Add this line to your application's Gemfile:

gem 'scraped_page_archive'

And then execute:

$ bundle

Or install it yourself as:

$ gem install scraped_page_archive

Usage

First require the library:

require 'scraped_page_archive'

Then configure the GitHub URL to clone. The URL needs a GitHub token embedded in it; you can generate a new token in your GitHub account settings. The token needs the repo permission checked.

If you're using the excellent morph.io, you can set the MORPH_SCRAPER_CACHE_GITHUB_REPO_URL environment variable to your Git URL:

Name Value
MORPH_SCRAPER_CACHE_GITHUB_REPO_URL https://YOUR_GITHUB_TOKEN@github.com/tmtmtmtm/estonia-riigikogu

You can also set this to any value (including another environment variable of your choosing) with the following:

ScrapedPageArchive.github_repo_url = 'https://YOUR_GITHUB_TOKEN@github.com/tmtmtmtm/estonia-riigikogu'
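
To keep the token out of your source code, you could assemble the URL from environment variables before assigning it. The following is a minimal sketch, not part of the gem: the helper name archive_repo_url and the variable names GITHUB_TOKEN and ARCHIVE_REPO are assumptions made for illustration, and it assumes the usual https://TOKEN@github.com/OWNER/REPO form for a token-embedded URL.

```ruby
# Hypothetical helper, not part of the gem: builds a token-embedded clone URL.
# GITHUB_TOKEN and ARCHIVE_REPO are assumed variable names chosen for this sketch.
def archive_repo_url(env = ENV)
  token = env.fetch('GITHUB_TOKEN') # raises KeyError when the token is missing
  repo  = env.fetch('ARCHIVE_REPO', 'tmtmtmtm/estonia-riigikogu')
  "https://#{token}@github.com/#{repo}"
end

# With the gem loaded, the result would be assigned like this:
#   ScrapedPageArchive.github_repo_url = archive_repo_url
```

Using fetch without a default for the token means a missing variable fails loudly at startup instead of producing a URL with an empty token.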

Then you can record HTTP requests by performing them in a block passed to ScrapedPageArchive.record:

require 'open-uri'

ScrapedPageArchive.record do
  # On Ruby 3.0+, call URI.open here; bare open() no longer accepts URLs.
  response = open('http://example.com/')
  # Use the response...
end
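
If a scraper fetches several pages, one way to structure it is to perform all the requests inside a single recording block and collect the results in a variable declared outside the block. The sketch below is an illustration only: fetch_all is a hypothetical helper, not part of the gem, and it takes the recorder and the fetcher as arguments so that it can be exercised without network access. It assumes only what is described above, namely that record yields to the block it is given.

```ruby
# Hypothetical helper, not part of the gem: fetches every URL inside one
# recording block and returns the bodies keyed by URL.
# `recorder` is anything that responds to `record` and yields to its block;
# `fetcher` is any callable that takes a URL and returns the page body.
def fetch_all(urls, recorder:, fetcher:)
  pages = {}
  recorder.record do
    urls.each { |url| pages[url] = fetcher.call(url) }
  end
  pages
end

# Intended use with the gem (assumes Ruby 2.5+ for URI.open):
#   require 'open-uri'
#   require 'scraped_page_archive'
#   pages = fetch_all(['http://example.com/'],
#                     recorder: ScrapedPageArchive,
#                     fetcher: ->(url) { URI.open(url).read })
```

Declaring pages before the block keeps the results available afterwards without relying on what record itself returns.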

Use with open-uri

If you would like your HTTP requests to be recorded automatically when using open-uri, do the following:

require 'scraped_page_archive/open-uri'
response = open('http://example.com/')
# Use the response...

Use with the Capybara Poltergeist driver

If you would like your HTTP requests to be recorded automatically when using the Poltergeist driver in Capybara, do the following:

require 'scraped_page_archive/capybara'
visit('http://example.com/')
# Use the response...

It should be possible to adapt this to work with other Capybara drivers fairly easily.

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Note that none of this installs Capybara or any drivers, so if you want to work on the Capybara integration you will need to install them yourself.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/everypolitician/scraped_page_archive.

License

The gem is available as open source under the terms of the MIT License.