CrowdKit
CrowdKit is the official Ruby wrapper for the CrowdFlower API v2.
CrowdKit is heavily inspired by Peter Murach's fantastic GitHub API gem: https://github.com/peter-murach/github.
Sample Usage
require "crowdkit"
# A reusable client instance
@client = Crowdkit.new(access_token: "123abc")
@client.jobs(12345).units.list do |unit|
  puts unit.state
end
# Global configuration and a new client instance on every call to `Crowdkit`
Crowdkit.configure do |c|
  c.access_token = "123abc"
end
puts Crowdkit.units.get(unit_id: 54321)
Configuration
As demonstrated above, CrowdKit can be instantiated in one of two ways. The `Crowdkit` namespace will proxy to a newly created client instance every time a defined method is called. Alternatively, calling `Crowdkit.new` will return a reusable client instance. Configuration overrides can be passed into `Crowdkit.new` as a hash, or both `Crowdkit.new` and `Crowdkit.configure` accept a block that receives a configuration object, as demonstrated below.
Crowdkit.new(debug: true)
Crowdkit.new do |config|
  config.debug = true
end
Crowdkit.configure do |config|
  config.debug = true
end
Configuration Variables
Variable | Definition |
---|---|
access_token | Your CrowdFlower Access Token found on your account page, required for API access. |
per_page | The number of results to request per page. Default: 10. |
debug | When enabled, logs all API activity for debugging purposes. |
adapter | The HTTP adapter to use. Default: :net_http. Other options: :net_http_persistent, :typhoeus, :patron, :em_synchrony, :excon, :test. |
user_agent | The user agent. Default: "CrowdKit Ruby Gem version". |
auto_paginate | Whether to automatically paginate through collections. Default: false. |
api_endpoint | A custom API endpoint. Default: https://api.crowdflower.com/v2 |
ssl | By default the client is configured to use OpenSSL::SSL::VERIFY_PEER. To disable peer verification, set this to { verify: false }. |
Environment Variables
Lastly, Crowdkit looks for all configuration variables in the environment using the following format: CROWDKIT_{upcased_variable_name}
e.g. CROWDKIT_ACCESS_TOKEN=abc123
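The naming format above can be sketched in plain Ruby. This is only an illustration of the convention: `crowdkit_env_name` and `crowdkit_env_overrides` are hypothetical helpers, not part of Crowdkit, and the sketch says nothing about how the gem itself reads or coerces the values.

```ruby
# Hypothetical helper (not part of Crowdkit): builds the environment variable
# name for a configuration variable, following CROWDKIT_{upcased_variable_name}.
def crowdkit_env_name(variable)
  "CROWDKIT_#{variable.to_s.upcase}"
end

# Hypothetical helper: collects the configuration variables that are set in
# the given environment. Values stay strings, exactly as the shell provides them.
def crowdkit_env_overrides(variables, env = ENV)
  variables.each_with_object({}) do |variable, overrides|
    value = env[crowdkit_env_name(variable)]
    overrides[variable] = value unless value.nil?
  end
end

# A plain hash stands in for ENV so the example does not depend on your shell.
fake_env = { "CROWDKIT_ACCESS_TOKEN" => "abc123", "CROWDKIT_PER_PAGE" => "25" }
overrides = crowdkit_env_overrides(%i[access_token per_page debug], fake_env)

puts crowdkit_env_name(:access_token)
puts overrides.keys.inspect
```

Variables that are not present in the environment (here `debug`) are simply left out of the result.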
Advanced Configuration
Crowdkit uses Faraday and exposes a `stack` configuration parameter. `stack` can be freely modified with methods such as `insert`, `insert_after`, `delete` and `swap`. Additionally, if you're feeling adventurous you can override the default stack completely:
# Modify the default stack
Crowdkit.configure do |config|
  config.stack.insert CustomMiddleware
end
# Or override the default stack completely
Crowdkit.configure do |config|
  config.stack do |builder|
    builder.use CustomMiddleware
    builder.use Crowdkit::Middleware::RaiseError
    builder.adapter :excon
  end
end
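`CustomMiddleware` is not defined in the examples above. A minimal sketch of what such a class could look like follows, assuming the usual Faraday middleware convention of wrapping the next app and responding to `call(env)`. The header name and value are illustrative, and a plain hash and lambda stand in for Faraday's env and the rest of the stack so the sketch runs without Faraday installed.

```ruby
# Hypothetical middleware: receives the next app in the stack, may inspect or
# modify the request env, then passes the request along.
class CustomMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    # Tag the outgoing request; "X-Request-Source" is an illustrative header.
    env[:request_headers] ||= {}
    env[:request_headers]["X-Request-Source"] = "crowdkit-script"
    @app.call(env)
  end
end

# Stand-in for the remainder of the stack: it just returns the env it receives.
terminal_app = ->(env) { env }

result = CustomMiddleware.new(terminal_app).call({ request_headers: {} })
puts result[:request_headers]["X-Request-Source"] # prints "crowdkit-script"
```

In a real stack you would more likely inherit from `Faraday::Middleware`, which provides the `initialize(app)` boilerplate for you.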
Scopes & Parameters
Crowdkit operates within scopes appropriate to the CrowdFlower API. The most common scope is `jobs`. Parameters can be passed into any scope and will be used by the API method, e.g.
client.jobs(state: "finished").list
Of course, the API methods themselves also accept parameters:
client.jobs.list(state: "finished")
Most scopes accept an optional first parameter that will be translated to the primary key of the scope for convenience, e.g.
client.units(54321).get
client.jobs.search("whatever")
The above is equivalent to the following:
client.units.get(unit_id: 54321)
client.jobs.search(query: "whatever")
Lastly, Crowdkit also provides a `with` method to clearly denote your scopes.
client.with(unit_id: 54321).units.get
Errors
All errors raised by the client inherit from `Crowdkit::Error`. There are two primary types of errors: `Crowdkit::UserError` and `Crowdkit::ServiceError`. User errors are raised when local validations fail, while service errors are raised when the client receives an invalid response code from the CrowdFlower servers.
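Because both error types share a base class, they can be rescued separately or together. The sketch below shows one way to do that. The three class definitions are stand-ins that mirror the hierarchy described above so the example runs without the gem (omit them when crowdkit is loaded), and `classify_failure` is a hypothetical helper.

```ruby
# Stand-in definitions mirroring the documented hierarchy; the real classes
# come from the gem.
module Crowdkit
  class Error < StandardError; end
  class UserError < Error; end
  class ServiceError < Error; end
end

# Hypothetical helper: runs a block and reports which kind of failure occurred.
def classify_failure
  yield
  :ok
rescue Crowdkit::UserError
  :user_error      # a local validation failed; fix the call before retrying
rescue Crowdkit::ServiceError
  :service_error   # the server returned an error code; a retry may help
rescue Crowdkit::Error
  :other_crowdkit_error
end

puts classify_failure { raise Crowdkit::UserError, "job_id is required" }
puts classify_failure { raise Crowdkit::ServiceError, "server error" }
puts classify_failure { :no_error_raised }
```

Rescuing the more specific classes first matters: a leading `rescue Crowdkit::Error` would catch everything and the later clauses would never run.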
Example Usage
Get statistics for jobs with a given tag
jobs = client.jobs.search("url", fields: ["tags"])
jobs.each do |job|
  pp job.stats
end
Copying a job and ordering all units
new_job = client.jobs(101010).copy(all_units: true)
Crowdkit.wait_on_status(new_job)
order = client.jobs(new_job.id).order
Crowdkit.wait_on_status(order)
Sum unit states for all units in a job
client.jobs(101010).units.list(auto_pagination: true).inject(Hash.new(0)) do |memo, unit|
  memo[unit.state] += 1
  memo
end
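To try the counting logic without an API call, the same tally can be run on stand-in data. This is a sketch only: `StubUnit` and the state names are illustrative placeholders, not objects or values returned by Crowdkit.

```ruby
# Stand-in for the unit objects returned by the API; only `state` matters here.
StubUnit = Struct.new(:id, :state)

units = [
  StubUnit.new(1, "finalized"),
  StubUnit.new(2, "judgable"),
  StubUnit.new(3, "finalized")
]

# each_with_object passes the same hash to every iteration, so the block does
# not need to return memo as it does with inject.
counts = units.each_with_object(Hash.new(0)) do |unit, memo|
  memo[unit.state] += 1
end

puts counts["finalized"] # prints 2
puts counts["judgable"]  # prints 1
```

On Ruby 2.7 and later, `units.map(&:state).tally` produces the same counts in one call.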
Poll a job for completed units
while true
  units = client.jobs(101010).units.poll
  if units.any?
    units.each do |unit|
      # Do something with the resulting data
      result = unit.aggregate_result
      # We also support acknowledging individual units if you want more granularity i.e.
      # result.delete
    end
    # This acknowledges all units in one request but assumes the above code will
    # complete in less than 30 seconds if you have multiple pollers.
    units.delete
  else
    sleep 10
  end
end
Development
All scopes are defined in the `client` directory and inherit from `API`. The `API` class provides the following convenience methods for defining APIs:
`arguments`: parses arguments and lets you specify which, if any, are required: `arguments(args, required: [:job_id])`
`do_http_verb`: performs the specified http_verb.
`namespace`: attaches a scope and uses the class defined with `class_name` to instantiate it: `namespace :units, class_name: "Client::Units"`
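As a rough illustration of what a required-argument check involves, here is a standalone sketch. `check_required` is a hypothetical stand-in written for this example; Crowdkit's own `arguments` helper may parse, validate and report errors differently.

```ruby
# Hypothetical stand-in for a required-argument check: returns the params
# unchanged when every required key is present, otherwise raises.
def check_required(params, required: [])
  missing = required.reject { |key| params.key?(key) }
  unless missing.empty?
    raise ArgumentError, "missing required arguments: #{missing.join(', ')}"
  end
  params
end

# All required keys present: the params pass through.
puts check_required({ job_id: 101010 }, required: [:job_id])[:job_id]

# A required key is absent: the check fails before any request would be made.
begin
  check_required({ state: "finished" }, required: [:job_id])
rescue ArgumentError => e
  puts e.message
end
```

Failing locally like this corresponds to the user-error case described in the Errors section, where validation stops a call before it reaches the server.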