Spn2
Spn2 is a Ruby gem for interacting with the Wayback Machine's Save Page Now 2 (SPN2) REST API. The draft API specification is published by the Internet Archive.
Installation
Install the gem and add to the application's Gemfile by executing:
$ bundle add spn2
If bundler is not being used to manage dependencies, install the gem by executing:
$ gem install spn2
Usage
To load the Spn2 namespace, require the gem:
require 'spn2'
Authentication
The API requires authentication, so you will need an account at archive.org. There are two methods of authentication: cookies and API keys. Presently only API keys are supported. Keys may be generated at https://archive.org/account/s3.php. Set your access key and secret key in the environment variables SPN2_ACCESS_KEY and SPN2_SECRET_KEY respectively; you can confirm they have been picked up with:
> Spn2.access_key
=> <your access key>
> Spn2.secret_key
=> <your secret key>
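If either variable is unset, it is easier to find out at startup than from a failed request. A minimal sketch of such a check, assuming you want to raise immediately; `check_spn2_env!` and `REQUIRED_SPN2_VARS` are hypothetical names, not part of the gem, and the helper only reads the two environment variables named above.

```ruby
# Hypothetical helper (not part of the spn2 gem): raise early if either
# API key environment variable is missing or blank.
REQUIRED_SPN2_VARS = %w[SPN2_ACCESS_KEY SPN2_SECRET_KEY].freeze

def check_spn2_env!(env = ENV)
  # Collect the names of any variables that are unset or whitespace-only.
  missing = REQUIRED_SPN2_VARS.select { |name| env[name].to_s.strip.empty? }
  return true if missing.empty?

  raise KeyError, "Missing environment variable(s): #{missing.join(', ')}"
end
```

Calling `check_spn2_env!` once before the first `Spn2` request turns a confusing authentication failure into a clear message naming the missing variable.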
Save a page
Save (capture) a URL in the Wayback Machine. This method returns a hash containing the job_id.
> Spn2.save(url: 'example.com') # returns a job_id
=> {"url"=>"http://example.com","job_id"=>"spn2-9c17e047f58f9220a7008d4f18152fee4d111d14"} # may include a "message" key too
Various options are available, as detailed in the "Capture request" section of the specification. These may be passed in the opts hash:
> Spn2.save(url: 'example.com', opts: { capture_all: 1, capture_outlinks: 1 })
=> {"url"=>"http://example.com","job_id"=>"spn2-9c17e047f58f9220a7008d4f18152fee4d111d14"}
A failed capture raises Spn2::Spn2ErrorFailedCapture with a message like this:
=> {"status"=>"error", "status_ext"=>"error:too-many-daily-captures", "message"=>"This URL has been already captured 10 times today. Please try again tomorrow. Please email us at \"<email address>\" if you would like to discuss this more."} (Spn2::Spn2ErrorFailedCapture)
The "status_ext" value is a machine-readable error code, while "message" gives a human-readable explanation. See the API specification for the full list of codes.
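When saving several URLs, one failed capture need not abort the whole batch. A minimal sketch, assuming `client` is anything that responds to `save(url:)` and `save(url:, opts:)` the way `Spn2` does above; `save_all` and its `rescue_from:` keyword are hypothetical, not part of the gem.

```ruby
# Hypothetical helper (not part of the spn2 gem): request captures for several
# URLs, recording each job_id or, if the save raised, the error message.
def save_all(client, urls, opts: {}, rescue_from: StandardError)
  urls.each_with_object({}) do |url, results|
    begin
      # Only pass opts when some were given, matching the two call forms shown above.
      response = opts.empty? ? client.save(url: url) : client.save(url: url, opts: opts)
      results[url] = { job_id: response['job_id'] }
    rescue rescue_from => e
      results[url] = { error: e.message }
    end
  end
end
```

For example, `save_all(Spn2, %w[example.com example.org], rescue_from: Spn2::Spn2ErrorFailedCapture)` would keep going past a URL that hits the daily capture limit, while any other error class still propagates.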
View the status of a job
Pass the job_id returned by Spn2.save:
> Spn2.status(job_ids: 'spn2-9c17e047f58f9220a7008d4f18152fee4d111d14')
=> {"counters"=>{"outlinks"=>1, "embeds"=>2}, "job_id"=>"spn2-9c17e047f58f9220a7008d4f18152fee4d111d14",
"original_url"=>"http://example.com/", "resources"=>["http://example.com/", "http://example.com/favicon.ico"],
"duration_sec"=>6.732, "outlinks"=>["https://www.iana.org/domains/example"], "http_status"=>200,
"timestamp"=>"20220622224107", "status"=>"success"}
"status" => "success" is what you are looking for.
Care is advised for domains/URLs that are frequently saved to the Wayback Machine, because the job_id is simply "spn2-" followed by a hash of the URL*. A status request therefore shows the status of the most recent capture of that URL by anyone, not necessarily your own.
* Usually a SHA-1 hash of the URL in the form http://<domain>/<path>/, e.g.:
$ echo "http://example.com/"|tr -d "\n"|shasum
9c17e047f58f9220a7008d4f18152fee4d111d14 -
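The same hash can be computed in Ruby. This sketch mirrors the shell pipeline above (SHA-1 of the URL string with no trailing newline, prefixed with "spn2-"); `predicted_job_id` is a hypothetical helper, and since that scheme is only what the service usually uses, treat the result as a guess rather than a guaranteed job_id.

```ruby
require 'digest/sha1'

# Hypothetical helper (not part of the spn2 gem): "spn2-" followed by the
# hex SHA-1 of the exact URL string, as in the shasum example above.
def predicted_job_id(url)
  "spn2-#{Digest::SHA1.hexdigest(url)}"
end
```

For example, `Spn2.status(job_ids: predicted_job_id('http://example.com/'))` would look up the most recent capture of that URL, subject to the caveat above.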
The status of an array of job_ids can be obtained with:
> Spn2.status(job_ids: ['spn2-9c17e047f58f9220a7008d4f18152fee4d111d14', 'spn2-...'])
=> [.. # an array of status hashes
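A capture can take several seconds (the example above reports "duration_sec"=>6.732), so you may want to poll until a job finishes. A minimal sketch, assuming in-progress jobs report a "status" of "pending" (check the API specification) and that `client` responds to `status(job_ids:)` like `Spn2`; `wait_for_capture` and its `interval:`, `timeout:` and `sleeper:` keywords are hypothetical.

```ruby
# Hypothetical helper (not part of the spn2 gem): poll a single job until its
# status is no longer "pending", or raise after `timeout` seconds.
def wait_for_capture(client, job_id, interval: 5, timeout: 120, sleeper: ->(s) { sleep(s) })
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = client.status(job_ids: job_id)
    # Anything other than "pending" (e.g. "success" or "error") is final.
    return result unless result['status'] == 'pending'
    raise "Timed out waiting for #{job_id}" if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleeper.call(interval)
  end
end
```

Typical use would be `wait_for_capture(Spn2, Spn2.save(url: 'example.com')['job_id'])`. The injectable `sleeper` exists only so the loop can be exercised without real delays.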
Finally, the status of any outlinks captured using the save option capture_outlinks: 1 is available by supplying the parent job_id:
> Spn2.status(job_ids: 'spn2-cce034d987e1d72d8cbf1770bcf99024fe20dddf', outlinks: true)
=> [.. # an array of outlink job status hashes
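Both the multi-job form and the outlinks form return arrays of status hashes, so it can be handy to see which jobs ended in which state. A minimal sketch, assuming each hash carries "job_id" and "status" keys as the single-job example above does; `job_ids_by_status` is a hypothetical helper.

```ruby
# Hypothetical helper (not part of the spn2 gem): group the job_ids from an
# array of status hashes by their "status" value.
def job_ids_by_status(statuses)
  statuses.group_by { |s| s['status'] }
          .transform_values { |group| group.map { |s| s['job_id'] } }
end
```

For example, `job_ids_by_status(Spn2.status(job_ids: ids))['error']` would list the jobs worth retrying.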
User status
Information about the user is available via:
> Spn2.user_status
=> {"daily_captures_limit"=>100000, "available"=>8, "processing"=>0, "daily_captures"=>10}
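The user status can be used to decide whether to submit more work. A minimal sketch, assuming "available" is the number of capture slots currently free and that "daily_captures" counts today's captures against "daily_captures_limit"; these interpretations are assumptions to check against the API specification, and `capture_capacity` is a hypothetical helper.

```ruby
# Hypothetical helper (not part of the spn2 gem). Assumes "available" means
# free capture slots right now and "daily_captures" counts against
# "daily_captures_limit"; verify against the API specification.
def capture_capacity(user_status)
  remaining_today = user_status['daily_captures_limit'] - user_status['daily_captures']
  {
    remaining_today: remaining_today,
    can_capture_now: user_status['available'].positive? && remaining_today.positive?
  }
end
```

With the example output above, `capture_capacity(Spn2.user_status)` would give 99990 remaining captures and `can_capture_now: true`.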
System status
The status of the Wayback Machine itself is available via:
> Spn2.system_status
=> {"status"=>"ok"} # if not "ok" captures may be delayed
Error handling
To facilitate graceful error handling, the full list of error classes is available via:
> Spn2.error_classes
=> [Spn2::Spn2Error, Spn2::Spn2ErrorBadAuth,.. ..]
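Because this is a plain array of classes, it can be splatted into a `rescue` clause. A minimal sketch of a wrapper that returns either the result or the rescued error; `with_rescue` is a hypothetical name, not part of the gem.

```ruby
# Hypothetical wrapper (not part of the spn2 gem): run the block, rescuing any
# of the given error classes (e.g. Spn2.error_classes).
# Returns [result, nil] on success or [nil, error] on a rescued failure.
def with_rescue(error_classes)
  [yield, nil]
rescue *error_classes => e
  [nil, e]
end
```

For example, `result, error = with_rescue(Spn2.error_classes) { Spn2.save(url: 'example.com') }` handles any gem-specific error while letting unrelated exceptions propagate.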
Testing
Run the test suite with: bundle exec rake
Valid API keys must be set in SPN2_ACCESS_KEY and SPN2_SECRET_KEY for testing. Go to https://archive.org/account/s3.php to set up API keys if you need them. If your live keys are stored in these environment variables and you want to test with different keys, run:
export SPN2_ACCESS_KEY=<valid access test key> && export SPN2_SECRET_KEY=<valid secret test key>
immediately before the above command.
Development
~~After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.~~
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitLab at https://gitlab.com/matzfan/spn2. Please run rubocop and correct all errors before submitting PRs.
License
The gem is available as open source under the terms of the MIT License.