WebFetch

Overview

WebFetch executes concurrent, asynchronous HTTP requests. It is itself an HTTP server implementing a RESTful API, wrapped by a Ruby client interface. Instead of returning a response, WebFetch immediately returns a promise which can be redeemed later when the response has been processed.

This permits issuing multiple HTTP requests in parallel, in a fully encapsulated and external process, without having to resort to multi-threading, multi-processing, or complex non-blocking IO implementations. EventMachine is used to handle the heavy lifting.

[Figure: WebFetch architecture]

Getting Started

In your Gemfile, add:

gem 'web_fetch'

and update your bundle:

bundle install

Require WebFetch in your application:

require 'web_fetch'

Launch or connect to a server

Launch the server from your application (recommended for familiarising yourself with WebFetch):

client = WebFetch::Client.create('localhost', 8077)

Or connect to an existing WebFetch server (recommended for production systems - see below for more details):

client = WebFetch::Client.new('localhost', 8077)

Create a request

Create a WebFetch request. Note that the request will not begin until the next step:

request = WebFetch::Request.new do |req|
  req.url = 'http://foobar.baz'
  req.headers = { 'User-Agent' => 'Foo Browser' }
  req.query = { foobar: 'baz' }
  req.method = :get
  req.body = 'foo bar baz'
  req.custom = { my_id: '123' }
end

Only url is required. The default HTTP method is GET.

Anything assigned to custom will be returned with the final result (available by calling #custom on the result). This may be useful if you need to tag each request with your own custom identifier, for example. Anything you assign here will have no bearing whatsoever on the HTTP request.

If you prefer to build a request from a hash, you can call WebFetch::Request.from_hash:

request = WebFetch::Request.from_hash(
  url: 'http://foobar.baz',
  headers: { 'User-Agent' => 'Foo Browser' },
  query: { foobar: 'baz' },
  method: :get,
  body: 'foo bar baz',
  custom: { my_id: '123' }
)
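When many requests are gathered at once, the custom field is a convenient place to carry an identifier for each one. The following is a minimal sketch of building one hash per URL in the shape accepted by WebFetch::Request.from_hash. The helper name tagged_request_hashes and the choice of the list index as my_id are illustrative assumptions, not part of WebFetch.

```ruby
# Hypothetical helper (not part of WebFetch): build one request hash per URL,
# tagging each with its position in the list via the :custom field.
def tagged_request_hashes(urls)
  urls.each_with_index.map do |url, index|
    { url: url, method: :get, custom: { my_id: index.to_s } }
  end
end

# With the gem loaded, each hash could then be turned into a request:
#   requests = tagged_request_hashes(urls).map do |hash|
#     WebFetch::Request.from_hash(hash)
#   end
```

The tag comes back through #custom on each result, so results can be matched to their originating URL regardless of the order in which they are fetched.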

Gather responses

Ask WebFetch to begin gathering your HTTP requests in the background:

promises = client.gather([request])

WebFetch::Client#gather accepts an array of WebFetch::Request objects and immediately returns an array of WebFetch::Promise objects. WebFetch will process all requests in the background concurrently.

To retrieve the result of a request, call WebFetch::Promise#fetch:

result = promises.first.fetch

# Available methods:
result.body
result.headers
result.status # HTTP status code
result.success? # False if a network error (not HTTP error) occurred
result.error # Underlying network error if applicable
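Because success? only reports network-level failures, an HTTP error such as a 404 still counts as a successful result and has to be checked through status. A minimal sketch of handling both cases follows. The helper name describe_result and the 400 threshold for treating a status as an error are assumptions made for illustration; the helper works with any object that responds to the methods listed above.

```ruby
# Hypothetical helper (not part of WebFetch): summarise a result object.
# Relies only on #success?, #error, #status and #body.
def describe_result(result)
  # A network error means no HTTP response was received at all.
  return "network error: #{result.error}" unless result.success?

  # Assumption: any status of 400 or above is treated as an HTTP error.
  return "HTTP error: #{result.status}" if result.status.to_i >= 400

  "ok: #{result.status} (#{result.body.to_s.bytesize} bytes)"
end
```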

Note that WebFetch::Promise#fetch will block until the result is complete by default. If you want to continue executing other code when the result is not ready (e.g. to see if any other results are ready), you can pass wait: false:

result = promises.first.fetch(wait: false)

A special value :pending will be returned if the result is still processing.

Alternatively, you can call WebFetch::Promise#complete? to check if a request has finished before waiting for the response:

result = promises.first.fetch if promises.first.complete?
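The non-blocking form can be combined into a simple polling loop that picks up each result as soon as it is ready. This is a sketch only: collect_results and its interval parameter are hypothetical names, and the loop assumes nothing beyond the documented behaviour that fetch(wait: false) returns :pending while a request is still processing.

```ruby
# Hypothetical helper (not part of WebFetch): poll promise-like objects until
# all of them have a result. Each object must respond to #fetch(wait: false).
def collect_results(promises, interval: 0.1)
  results = {}
  pending = promises.each_with_index.to_a

  until pending.empty?
    # Drop each promise from the pending list once its result has arrived.
    pending.reject! do |promise, index|
      result = promise.fetch(wait: false)
      next false if result == :pending

      results[index] = result
      true
    end
    sleep(interval) unless pending.empty?
  end

  # Return results in the same order as the promises were given.
  Array.new(promises.size) { |index| results[index] }
end
```

With the gem loaded this could be called as collect_results(client.gather(requests)). The loop has no timeout, so a production version would likely want an upper bound on the number of polls.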

Fetching results later

In some cases you may need to fetch the result of a request in a different context from the one in which you initiated it. Each Promise has a unique ID which can be used to fetch the result from a separate Client instance:

client = WebFetch::Client.new('localhost', 8077)
promises = client.gather([
  WebFetch::Request.new { |req| req.url = 'http://foobar.com' }
])
uid = promises.first.uid

# Later ...
client = WebFetch::Client.new('localhost', 8077)
result = client.fetch(uid)

This can be useful if your web application initiates requests in one controller action and fetches them in another; the uid can be stored in a database and used to fetch the request later on.
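The two-step flow described above can be sketched with a pair of helpers. The names start_request and finish_request are hypothetical, and a plain Hash stands in for the database; the only WebFetch calls assumed are Client#gather, Promise#uid and Client#fetch, as shown earlier.

```ruby
# Hypothetical helpers (not part of WebFetch). `store` is anything that
# behaves like a Hash; in a web application it would be a database record.

# Step 1 (e.g. first controller action): start the request, remember its uid.
def start_request(client, request, store, key)
  promise = client.gather([request]).first
  store[key] = promise.uid
end

# Step 2 (e.g. a later controller action): look up the uid, fetch the result.
def finish_request(client, store, key)
  client.fetch(store.fetch(key))
end
```

Only the uid needs to be persisted between the two steps, so the client used in the second step can be a completely new instance connected to the same server.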

Stopping the server

When you have finished using the WebFetch server, call WebFetch::Client#stop:

client.stop

The server will not automatically stop when your program exits.
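Since the server keeps running after your program exits, one way to avoid leaving it behind is to stop it in an ensure clause. The wrapper below is a sketch; with_client is a hypothetical name, and it is only appropriate when your program launched the server itself (stopping a shared server would affect its other users).

```ruby
# Hypothetical helper (not part of WebFetch): yield the client to a block and
# always call #stop afterwards, even if the block raises.
def with_client(client)
  yield client
ensure
  client.stop
end

# With the gem loaded, usage might look like:
#   with_client(WebFetch::Client.create('localhost', 8077)) do |client|
#     promises = client.gather(requests)
#     promises.each(&:fetch)
#   end
```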

Examples

Runnable examples are provided for more detailed usage.

HTTP API

If you need to use the WebFetch server's HTTP API directly, refer to the Swagger API Reference.

Managing the WebFetch process yourself

For production systems it is advised that you run the WebFetch server separately rather than instantiate it via the client. For this case, the executable bin/web_fetch_control is provided. Daemonisation is handled by the daemons gem.

WebFetch can be started in the terminal with output going to STDOUT or as a daemon.

Run the server as a daemon:

$ web_fetch_control start

Run the server in the terminal:

$ web_fetch_control run

Stop the server:

$ web_fetch_control stop

To pass options to WebFetch, pass -- to web_fetch_control and add all WebFetch options afterwards.

Available options:

--port 60087
--host localhost
--pidfile /tmp/web_fetch.pid
--log /var/log/web_fetch.log

e.g.:

web_fetch_control run -- --port 8000 --host 0.0.0.0

No pid file will be created unless the --pidfile parameter is passed. It is recommended to use a process monitoring tool (e.g. monit or systemd) to monitor the WebFetch process.

When running as a daemon, WebFetch will log to the null device so it is advised to always pass --log in this case.
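If the server is started from a Ruby deployment script, the command line can be assembled as an argument list so that the options always follow the -- separator. This is a sketch under the assumption that web_fetch_control is on the PATH; the helper name web_fetch_control_command is hypothetical, and only the options listed above are used.

```ruby
# Hypothetical helper (not part of WebFetch): build the argument list for
# web_fetch_control, placing all WebFetch options after the "--" separator.
def web_fetch_control_command(action, options = {})
  flags = options.flat_map { |name, value| ["--#{name}", value.to_s] }
  command = ['web_fetch_control', action.to_s]
  command += ['--', *flags] unless flags.empty?
  command
end

# Example: start a daemon that logs to a file and writes a pid file.
#   system(*web_fetch_control_command(:start,
#                                     port: 8000,
#                                     log: '/var/log/web_fetch.log',
#                                     pidfile: '/tmp/web_fetch.pid'))
```

Passing the command to system as separate arguments avoids shell interpolation of the option values.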

Docker

To use WebFetch in Docker, you can use either the provided Dockerfile or the public image web_fetch/web_fetch.

Contributing

WebFetch uses rspec for testing:

bin/rspec

Rubocop is used for code style governance:

bin/rubocop

Make sure that any new code you write has an appropriate test and that all Rubocop checks pass.

Feel free to fork and create a pull request if you would like to make any changes.

License

WebFetch is licensed under the MIT License. You are encouraged to re-use the code in any way you see fit as long as you give credit to the original author. If you do use the code for any other projects then feel free to let me know but, of course, this is not required.