Spn2

Spn2 is a gem for interacting with the Wayback Machine's Save Page Now 2 (SPN2) REST API. The API (draft) specification is here.

Installation

Install the gem and add to the application's Gemfile by executing:

$ bundle add spn2

If bundler is not being used to manage dependencies, install the gem by executing:

$ gem install spn2

Usage

To load the Spn2 namespace:

require 'spn2'

Authentication

The API requires authentication, so you will need an account at archive.org. There are two methods of authentication: cookies and API keys. Presently only the latter is implemented. API keys may be generated at https://archive.org/account/s3.php. Ensure your access key and secret key are set in the environment variables SPN2_ACCESS_KEY and SPN2_SECRET_KEY respectively.

> Spn2.access_key
=> <your access key>
> Spn2.secret_key
=> <your secret key>
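Because the keys come from the environment, it can help to check for them before making any calls. The following is a minimal sketch only; `missing_spn2_vars` is a hypothetical helper and not part of the gem.

```ruby
# Names of the environment variables this README says must be set.
REQUIRED_SPN2_VARS = %w[SPN2_ACCESS_KEY SPN2_SECRET_KEY].freeze

# Hypothetical helper (not part of the gem): returns the names of any
# required variables that are unset or blank in the given environment.
def missing_spn2_vars(env = ENV)
  REQUIRED_SPN2_VARS.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_spn2_vars
warn "Set #{missing.join(' and ')} before calling Spn2" unless missing.empty?
```

Passing a plain hash instead of ENV makes the check easy to exercise in tests.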

Save a page

Save (capture) a URL in the Wayback Machine. This method returns a hash containing the job_id.

> Spn2.save(url: 'example.com') # returns a job_id

=> {"url"=>"http://example.com","job_id"=>"spn2-9c17e047f58f9220a7008d4f18152fee4d111d14"} # may include a "message" key too

Various options are available, as detailed in the specification in the section "Capture request". These may be passed like so:

> Spn2.save(url: 'example.com', opts: { capture_all: 1, capture_outlinks: 1 })

=> {"url"=>"http://example.com","job_id"=>"spn2-9c17e047f58f9220a7008d4f18152fee4d111d14"}

A failed page save raises an error, which looks like this:

=> {"status"=>"error", "status_ext"=>"error:too-many-daily-captures", "message"=>"This URL has been already captured 10 times today. 
    Please try again tomorrow. Please email us at \"[email protected]\" if you would like to discuss this more."} (Spn2::Spn2ErrorFailedCapture)

The key "status_ext" holds a short code identifying the cause of the failure, while "message" gives a longer explanation - see the API specification.
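An error response shaped like the one above can be inspected programmatically. This sketch assumes the "error:" prefix seen in the example is typical; `spn2_error_code` is a hypothetical helper, not part of the gem, and the hash is stand-in data copied from the example.

```ruby
# Hypothetical helper (not part of the gem): pull the short error code out of
# an error response hash shaped like the one shown above.
def spn2_error_code(response)
  status_ext = response["status_ext"]
  return nil unless status_ext.is_a?(String)

  # "error:too-many-daily-captures" becomes "too-many-daily-captures"
  status_ext.delete_prefix("error:")
end

# Stand-in data copied from the example error (message shortened).
response = {
  "status" => "error",
  "status_ext" => "error:too-many-daily-captures",
  "message" => "This URL has been already captured 10 times today."
}

code = spn2_error_code(response)
retry_tomorrow = (code == "too-many-daily-captures")
```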

View the status of a job

Pass the job_id returned by Spn2.save:

> Spn2.status(job_ids: 'spn2-9c17e047f58f9220a7008d4f18152fee4d111d14')

=> {"counters"=>{"outlinks"=>1, "embeds"=>2}, "job_id"=>"spn2-9c17e047f58f9220a7008d4f18152fee4d111d14",
    "original_url"=>"http://example.com/", "resources"=>["http://example.com/", "http://example.com/favicon.ico"],
    "duration_sec"=>6.732, "outlinks"=>["https://www.iana.org/domains/example"], "http_status"=>200,
    "timestamp"=>"20220622224107", "status"=>"success"}

"status" => "success" is what you are looking for.
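A capture may not have finished when the status is first requested, so some form of polling is usually needed. The sketch below assumes an unfinished job reports "status" => "pending" (not shown in this README); `wait_for_capture` and its parameters are hypothetical and not part of the gem. It takes a block so that it can be tried without network access.

```ruby
# Hypothetical helper (not part of the gem): call the given block until it
# returns a status hash whose "status" is no longer "pending", or give up.
# Assumes unfinished jobs report "pending".
def wait_for_capture(max_attempts: 10, delay: 5)
  max_attempts.times do |attempt|
    status = yield
    return status unless status["status"] == "pending"

    # Pause between checks, but not after the final one.
    sleep(delay) if attempt < max_attempts - 1
  end
  nil # still pending after max_attempts checks
end

# Intended use with the gem (needs network access and valid keys):
#   result = wait_for_capture { Spn2.status(job_ids: job_id) }
#   puts "captured" if result && result["status"] == "success"
```

A nil return means the job was still pending when the attempts ran out, which is distinct from a hash reporting an error.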

Take care with domains/URLs that are frequently saved into the Wayback Machine, as the job_id is merely "spn2-" followed by a hash of the URL*. A status request shows the status of the most recent capture of that URL by anyone, which is not necessarily your own.

* Usually a SHA-1 hash of the URL in the form http://<domain>/<path>/, e.g.:

$ echo "http://example.com/"|tr -d "\n"|shasum
9c17e047f58f9220a7008d4f18152fee4d111d14  -
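The same calculation can be sketched in Ruby using the standard library. `predicted_job_id` is a hypothetical helper, not part of the gem, and it only holds where the "spn2-" plus SHA-1 scheme described above applies.

```ruby
require "digest/sha1"

# Hypothetical helper (not part of the gem): predict the job_id for a URL,
# assuming the "spn2-" + SHA-1 scheme described above holds for it.
def predicted_job_id(url)
  "spn2-#{Digest::SHA1.hexdigest(url)}"
end

puts predicted_job_id("http://example.com/")
```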

The status of an array of job_ids can be obtained with:

> Spn2.status(job_ids: ['spn2-9c17e047f58f9220a7008d4f18152fee4d111d14', 'spn2-...'])

=> [.. # an array of status hashes

Finally, the status of any outlinks captured with the save option capture_outlinks: 1 is available by passing the parent job_id together with outlinks: true:

> Spn2.status(job_ids: 'spn2-cce034d987e1d72d8cbf1770bcf99024fe20dddf', outlinks: true)

=> [.. # an array of outlink job status hashes
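When many status hashes come back at once, a quick count by status can show how far the captures have progressed. This is a sketch over stand-in data shaped like the hashes shown earlier; `tally_statuses` and the sample job_ids are hypothetical, and "pending" is assumed to be the status of an unfinished job.

```ruby
# Hypothetical helper (not part of the gem): count status hashes by their
# "status" value, e.g. to see how many outlink captures have finished.
def tally_statuses(statuses)
  statuses.map { |status| status["status"] }.tally
end

# Stand-in data; the job_ids are placeholders, not real jobs.
sample = [
  { "job_id" => "spn2-aaa", "status" => "success" },
  { "job_id" => "spn2-bbb", "status" => "pending" },
  { "job_id" => "spn2-ccc", "status" => "success" }
]

p tally_statuses(sample)
```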

User status

Information about the user is available via:

> Spn2.user_status
=> {"daily_captures_limit"=>100000, "available"=>8, "processing"=>0, "daily_captures"=>10}
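The returned hash can be used to estimate how many captures remain for the day. The sketch below assumes "daily_captures" counts captures already made against "daily_captures_limit"; that reading of the fields is an assumption, and `remaining_daily_captures` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): captures left today, assuming
# "daily_captures" counts captures already made against "daily_captures_limit".
def remaining_daily_captures(user_status)
  user_status.fetch("daily_captures_limit") - user_status.fetch("daily_captures")
end

# Stand-in data copied from the example response above.
user_status = { "daily_captures_limit" => 100000, "available" => 8,
                "processing" => 0, "daily_captures" => 10 }

remaining = remaining_daily_captures(user_status) # 100000 - 10 = 99990
```

Using fetch rather than [] means a response missing either key raises KeyError instead of failing later on nil arithmetic.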

System status

The status of the Wayback Machine itself is available via:

> Spn2.system_status
=> {"status"=>"ok"} # if not "ok" captures may be delayed

Error handling

To facilitate graceful error handling, a full list of all error classes is provided by:

> Spn2.error_classes
=> [Spn2::Spn2Error, Spn2::Spn2ErrorBadAuth,.. ..]
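One way to use such a list is to splat it into a rescue clause. The sketch below uses stand-in error classes so that it runs without the gem; `attempt`, `DemoError` and `DemoBadAuth` are hypothetical names, and in real code the list would presumably come from Spn2.error_classes.

```ruby
# Stand-ins so the sketch runs without the gem; in real code the list would
# come from Spn2.error_classes.
class DemoError < StandardError; end
class DemoBadAuth < DemoError; end
DEMO_ERROR_CLASSES = [DemoError, DemoBadAuth].freeze

# Hypothetical helper: run the block and return [:ok, value], or
# [:failed, error] if one of the listed error classes is raised.
# Any other exception propagates to the caller.
def attempt(error_classes)
  [:ok, yield]
rescue *error_classes => e
  [:failed, e]
end

outcome, detail = attempt(DEMO_ERROR_CLASSES) { raise DemoBadAuth, "bad keys" }
puts "#{outcome}: #{detail.class} (#{detail.message})"
```

Rescuing only the listed classes keeps unrelated failures, such as a typo raising NameError, visible rather than silently swallowed.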

Testing

Run bundle exec rake to run the test suite.

Valid API keys must be held in SPN2_ACCESS_KEY and SPN2_SECRET_KEY for testing. Go to https://archive.org/account/s3.php to set up API keys if you need them. If your live keys are stored in these environment variables, override them with test keys immediately before running the above command:

export SPN2_ACCESS_KEY=<valid access test key> && export SPN2_SECRET_KEY=<valid secret test key>

Development

~~After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.~~

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitLab at https://gitlab.com/matzfan/spn2. Please run rubocop and correct all errors before submitting PRs.

License

The gem is available as open source under the terms of the MIT License.