WebFetch

Overview

WebFetch is an asynchronous HTTP proxy server that accepts multiple requests for HTTP retrieval, immediately returning a token for each request, and then allowing that token to be redeemed later when the entity has fully responded.

This permits issuing multiple HTTP requests in parallel, in a fully encapsulated and external process, without having to resort to multi-threading, multi-processing, or complex non-blocking IO implementations. EventMachine is used to handle the heavy lifting.

[Figure: WebFetch architecture]

Getting Started

Although WebFetch runs as a web server and provides all functionality over a RESTful API (see below), the simplest way to use it is with its Ruby client implementation, which wraps the HTTP API for you, using Faraday. This also serves as a reference for writing WebFetch clients in other languages.

In your Gemfile, add:

gem 'web_fetch', git: 'https://github.com/bobf/web_fetch.git'

and update your bundle:

bundle install

Spawn a new WebFetch server instance listening on localhost, port 8077, and connect a Ruby client to it:

require 'web_fetch'
client = WebFetch::Client.create('localhost', 8077)

Issue some requests [asynchronously]:

requests = [{ url: 'http://foobar.baz/' },
            { url: 'http://barfoo.baz/foobar',
              headers: { 'User-Agent' => 'Foo Browser' },
              query: { foo: 'what is foo', bar: 'what is baz' } }]
jobs = client.gather(requests)

Retrieve the responses synchronously. Retrieving a result that has not yet arrived blocks until it arrives, while the other requests continue to run in parallel:

responses = []
jobs.each do |job|
  response = client.retrieve_by_uid(job[:uid])
  responses.push(response)
end
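
The gather-then-retrieve pattern above can be wrapped in a small helper. This is a minimal sketch, not part of WebFetch itself: fetch_all is a hypothetical name, and it assumes only that the client responds to gather and retrieve_by_uid as shown above, and that each job hash carries :request and :uid keys (as in the #gather output shown under the client request options).

```ruby
# Hypothetical helper (not part of WebFetch): issues all requests up front,
# then redeems each token in order.
def fetch_all(client, requests)
  jobs = client.gather(requests) # returns one job (token) per request
  jobs.map do |job|
    # Blocks only until this particular response is ready; the remaining
    # requests keep running inside the WebFetch server in the meantime.
    { request: job[:request], response: client.retrieve_by_uid(job[:uid]) }
  end
end
```

Each element of the result pairs the original request with whatever retrieve_by_uid returned for it, so the caller does not need to track uids separately.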

See a working example

HTTP API

If you need to use the WebFetch server's HTTP API directly, refer to the Swagger API Reference.

Managing the WebFetch process yourself

You may want to run the WebFetch server yourself rather than instantiate it via the client. For this case, the executable bin/web_fetch_control is provided.

WebFetch can be started in the terminal with output going to STDOUT or as a daemon.

Run the server as a daemon:

$ bundle exec bin/web_fetch_control start -- --log /tmp/web_fetch.log

Note that you should always pass --log when running as a daemon; otherwise, all output will go to the null device.

Run the server in the terminal:

$ bundle exec bin/web_fetch_control run -- --port 8080

It is further recommended to use a process management tool to monitor the pidfile (pass --pidfile /path/to/file.pid to specify an explicit location).
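
If no dedicated process manager is available, a liveness check against the pidfile can be sketched in plain Ruby. This is only an illustration: pid_alive? is a hypothetical helper, and it assumes the pidfile contains a single integer process ID, which this document does not confirm.

```ruby
# Hypothetical helper: returns true if the process recorded in the pidfile
# is still running.
# Assumption: the pidfile holds a single integer process ID.
def pid_alive?(pidfile_path)
  pid = Integer(File.read(pidfile_path).strip)
  Process.kill(0, pid) # signal 0 tests for existence without sending a signal
  true
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  false # missing pidfile, no such process, or unparseable contents
rescue Errno::EPERM
  true # the process exists but belongs to another user
end
```

A supervising script could call pid_alive?('/path/to/file.pid') periodically and restart the server when it returns false.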

To connect to an existing process, use WebFetch::Client.new rather than WebFetch::Client.create. For example:

WebFetch::Client.new('localhost', 8087)

WebFetch Client request options

WebFetch::Client#gather accepts an array of hashes which may contain the following parameters:

  • url: The target URL [string]
  • headers: HTTP headers [hash]
  • query: Query parameters [hash]
  • method: HTTP method (default: "GET") [string]
  • body: HTTP body [string]

These parameters will all be used (where provided) when initiating the HTTP request on the target.
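
As an illustration of the parameters listed above, the following sketch builds a request hash for a POST. build_request is a hypothetical helper (not part of WebFetch), and the URL, header, and body values are made up for the example.

```ruby
# Hypothetical helper (not part of WebFetch): builds a request hash from the
# parameters listed above, omitting any that were not supplied.
def build_request(url:, method: 'GET', headers: nil, query: nil, body: nil)
  { url: url, method: method, headers: headers, query: query, body: body }.compact
end

post = build_request(
  url: 'http://foobar.baz/submit',
  method: 'POST',
  headers: { 'Content-Type' => 'application/json' },
  body: '{"foo":"bar"}'
)
```

The resulting hash can be placed in the array passed to #gather alongside any other requests.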

Arbitrary parameters can also be passed and will be returned by #gather (though they will not be used to construct the HTTP request). This allows tagging requests with arbitrary information if you need to identify them in a particular way. For example, you may want to generate your own unique identifier for a request, in which case you could do:

client.gather([{ url: 'http://foobar.baz', my_unique_id: '123-456-789' }])
# [{:request=>{:url=>"http://foobar.baz", :my_unique_id=>"123-456-789"}, :hash=>"7c511911d16e1072363fa1653bdd93df65208901", :uid=>"1fb4ee7a-9fc0-4896-9af2-7cbdf234a468"}]
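
Because the tag is echoed back inside :request, the output of #gather can be indexed by it. The sketch below assumes the job shape shown above; uids_by_tag is a hypothetical helper name, not part of WebFetch.

```ruby
# Hypothetical helper: maps each caller-supplied tag to the uid assigned by
# WebFetch, skipping jobs whose request carries no such tag.
def uids_by_tag(jobs, tag_key)
  jobs.each_with_object({}) do |job, index|
    tag = job[:request][tag_key]
    index[tag] = job[:uid] unless tag.nil?
  end
end

# Sample data copied from the #gather output above.
jobs = [{ request: { url: 'http://foobar.baz', my_unique_id: '123-456-789' },
          hash: '7c511911d16e1072363fa1653bdd93df65208901',
          uid: '1fb4ee7a-9fc0-4896-9af2-7cbdf234a468' }]

uids_by_tag(jobs, :my_unique_id)
# => {"123-456-789"=>"1fb4ee7a-9fc0-4896-9af2-7cbdf234a468"}
```

The uid looked up this way can then be passed to retrieve_by_uid.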

Logging

WebFetch logs to STDOUT by default. An alternative log file can be set either by passing --log /path/to/logfile to the command line server, or by passing log: '/path/to/logfile' to WebFetch::Client.create:

$ bundle exec bin/web_fetch_server --log /tmp/web_fetch.log

client = WebFetch::Client.create('localhost', 8077, log: '/tmp/web_fetch.log')

Contributing

WebFetch uses rspec for testing:

bin/rspec

Rubocop is used for code style governance:

bin/rubocop

Make sure that any new code you write has an appropriate test and that all Rubocop checks pass.

Feel free to fork and create a pull request if you would like to make any changes.

License

WebFetch is licensed under the MIT License. You are encouraged to re-use the code in any way you see fit as long as you give credit to the original author. If you do use the code for any other projects then feel free to let me know but, of course, this is not required.