WebFetch
Overview
WebFetch is an asynchronous HTTP proxy server that accepts multiple requests for HTTP retrieval, immediately returning a token for each request, and then allowing that token to be redeemed later when the entity has fully responded.
This permits issuing multiple HTTP requests in parallel, in a fully encapsulated and external process, without having to resort to multi-threading, multi-processing, or complex non-blocking IO implementations. EventMachine is used to handle the heavy lifting.
Getting Started
Although WebFetch runs as a web server and provides all functionality over a RESTful API (see below), the simplest way to use it is with its Ruby client implementation, which wraps the HTTP API for you, using Faraday. This also serves as a reference for writing WebFetch clients in other languages.
In your Gemfile, add:
gem 'web_fetch', git: 'https://github.com/bobf/web_fetch.git'
and update your bundle:
bundle install
Create, connect to, and wrap a Ruby client object around a new WebFetch server instance listening on localhost, port 8077:
require 'web_fetch'
client = WebFetch::Client.create('localhost', 8077)
Issue some requests [asynchronously]:
requests = [{ url: 'http://foobar.baz/' },
            { url: 'http://barfoo.baz/foobar',
              headers: { 'User-Agent' => 'Foo Browser' },
              query: { foo: 'what is foo', bar: 'what is baz' } }]
jobs = client.gather(requests)
Retrieve the responses [synchronously: retrieving a result that has not yet arrived blocks until it arrives, while the other requests continue to run in parallel]:
responses = []
jobs.each do |job|
  response = client.retrieve_by_uid(job[:uid])
  responses.push(response)
end
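Each job returned by #gather carries the original request alongside its uid, so the loop above can be folded into a small helper that keys every response by the URL that produced it. This is a minimal sketch, assuming only that the client responds to retrieve_by_uid as shown above; the helper name responses_by_url is hypothetical and not part of WebFetch.

```ruby
# Hypothetical helper (not part of WebFetch): redeem every job and key the
# result by the URL of the request that produced it.
# `client` is anything that responds to #retrieve_by_uid, e.g. a WebFetch::Client.
def responses_by_url(client, jobs)
  jobs.each_with_object({}) do |job, results|
    # Blocks until this particular response has arrived.
    results[job[:request][:url]] = client.retrieve_by_uid(job[:uid])
  end
end

# Usage, assuming `client` and `jobs` from the steps above:
#   responses = responses_by_url(client, jobs)
#   responses['http://foobar.baz/']
```

If two requests share the same URL, the later response overwrites the earlier one; key by job[:uid] instead when that matters.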
HTTP API
If you need to use the WebFetch server's HTTP API directly, refer to the Swagger API Reference.
Managing the WebFetch process yourself
You may want to run the WebFetch server yourself rather than instantiate it via the client. For this case, the executable bin/web_fetch_control is provided.
WebFetch can be started in the terminal with output going to STDOUT or as a daemon.
Run the server as a daemon:
$ bundle exec bin/web_fetch_control start -- --log /tmp/web_fetch.log
Note that you should always pass --log when running as a daemon; otherwise all output will go to the null device.
Run the server in the terminal:
$ bundle exec bin/web_fetch_control run -- --port 8080
It is further recommended to use a process management tool to monitor the pidfile (pass --pidfile /path/to/file.pid to specify an explicit location).
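For quick checks without a full process manager, a script can read the pidfile and probe the process directly. The following is only a sketch using Ruby's standard library; the helper name process_alive? is hypothetical, and it assumes the pidfile contains nothing but the numeric process ID.

```ruby
# Hypothetical helper (not part of WebFetch): report whether the process
# recorded in a pidfile is still alive.
# Assumes the pidfile holds only the numeric PID.
def process_alive?(pidfile)
  pid = Integer(File.read(pidfile).strip)
  Process.kill(0, pid) # signal 0 checks for existence without sending a signal
  true
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  false # missing pidfile, no such process, or unparseable contents
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

# Usage, with the same placeholder path as above:
#   process_alive?('/path/to/file.pid')
```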
To connect to an existing process, use WebFetch::Client.new rather than WebFetch::Client.create. For example:
WebFetch::Client.new('localhost', 8087)
WebFetch Client request options
WebFetch::Client#gather accepts an array of hashes which may contain the following parameters:
url: The target URL [string]
headers: HTTP headers [hash]
query: Query parameters [hash]
method: HTTP method (default: "GET") [string]
body: HTTP body [string]
These parameters will all be used (where provided) when initiating the HTTP request on the target.
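As an illustration of combining these parameters, the sketch below builds a request hash that posts a JSON payload. The helper name json_post_request is hypothetical, and the upper-case 'POST' and the Content-Type header are assumptions modelled on the "GET" default above rather than confirmed WebFetch requirements.

```ruby
require 'json'

# Hypothetical helper (not part of WebFetch): build a request hash for
# WebFetch::Client#gather that sends `payload` as a JSON body.
def json_post_request(url, payload, headers: {})
  {
    url: url,
    method: 'POST', # assumed spelling, mirroring the documented "GET" default
    headers: { 'Content-Type' => 'application/json' }.merge(headers),
    body: JSON.generate(payload) # body must be a string
  }
end

# Usage, assuming `client` from Getting Started:
#   client.gather([json_post_request('http://foobar.baz/items', { name: 'foo' })])
```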
Arbitrary parameters can also be passed and will be returned by #gather (though they will not be used to construct the HTTP request). This allows tagging requests with arbitrary information if you need to identify them in a particular way. For example, you may want to generate your own unique identifier for a request, in which case you could do:
client.gather([{ url: 'http://foobar.baz', my_unique_id: '123-456-789' }])
# [{:request=>{:url=>"http://foobar.baz", :my_unique_id=>"123-456-789"}, :hash=>"7c511911d16e1072363fa1653bdd93df65208901", :uid=>"1fb4ee7a-9fc0-4896-9af2-7cbdf234a468"}]
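Building on the example above, identifiers can be generated automatically and then used to look jobs up after #gather returns. This is a sketch only: tag_requests and jobs_by_tag are hypothetical helpers, and they rely on the job shape shown in the sample output (a :request hash that echoes the tag, plus a :uid).

```ruby
require 'securerandom'

# Hypothetical helpers (not part of WebFetch).

# Attach a generated :my_unique_id to each request hash.
def tag_requests(requests)
  requests.map { |request| request.merge(my_unique_id: SecureRandom.uuid) }
end

# Index the jobs returned by #gather by the tag on their original request.
def jobs_by_tag(jobs)
  jobs.map { |job| [job[:request][:my_unique_id], job] }.to_h
end

# Usage, assuming `client` from Getting Started:
#   requests = tag_requests([{ url: 'http://foobar.baz' }])
#   index = jobs_by_tag(client.gather(requests))
#   job = index[requests.first[:my_unique_id]]
#   client.retrieve_by_uid(job[:uid])
```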
Logging
WebFetch logs to STDOUT by default. An alternative log file can be set either by passing --log /path/to/logfile to the command line server, or by passing log: '/path/to/logfile' to WebFetch::Client.create:
$ bundle exec bin/web_fetch_server --log /tmp/web_fetch.log
client = WebFetch::Client.create('localhost', 8077, log: '/tmp/web_fetch.log')
Contributing
WebFetch uses rspec for testing:
bin/rspec
Rubocop is used for code style governance:
bin/rubocop
Make sure that any new code you write has an appropriate test and that all Rubocop checks pass.
Feel free to fork and create a pull request if you would like to make any changes.
License
WebFetch is licensed under the MIT License. You are encouraged to re-use the code in any way you see fit as long as you give credit to the original author. If you do use the code for any other projects then feel free to let me know but, of course, this is not required.