CrowdKit

CrowdKit is the official Ruby wrapper for the CrowdFlower API v2.

CrowdKit is heavily inspired by Peter Murach's fantastic GitHub API gem: https://github.com/peter-murach/github.

Sample Usage

require "crowdkit"

# A reusable client instance
@client = Crowdkit.new(access_token: "123abc")
@client.jobs(12345).units.list do |unit|
  puts unit.state
end

# Global configuration and a new client instance on every call to `Crowdkit`
Crowdkit.configure do |c|
  c.access_token = "123abc"
end
puts Crowdkit.units.get(unit_id: 54321)

Contents

  1. Configuration
  2. Scopes & Parameters
  3. Errors
  4. Example Usage
  5. Development

Configuration

As demonstrated above, CrowdKit can be instantiated in one of two ways. Calling a defined method directly on the Crowdkit namespace proxies to a newly created client instance on every call. Alternatively, calling Crowdkit.new returns a reusable client instance. Configuration overrides can be passed to Crowdkit.new as a hash, and both Crowdkit.new and Crowdkit.configure accept a block that receives a configuration object, as demonstrated below.

Crowdkit.new(debug: true)
Crowdkit.new do |config|
  config.debug = true
end
Crowdkit.configure do |config|
  config.debug = true
end
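
The difference between the proxying namespace and a reusable instance can be pictured with a small stand-alone sketch. This is not the gem's actual implementation: SketchKit, SketchKit::Client, and the ping method are hypothetical stand-ins, chosen only to show how global defaults, per-instance overrides, and a fresh client per proxied call might fit together.

```ruby
# Hypothetical stand-in for illustration only; not Crowdkit's real code.
module SketchKit
  # Global defaults set through SketchKit.configure.
  Config = Struct.new(:access_token, :debug)

  class Client
    attr_reader :config

    def initialize(overrides = {})
      # Start from a copy of the global defaults, then apply overrides.
      @config = SketchKit.config.dup
      overrides.each { |key, value| @config[key] = value }
      yield @config if block_given?
    end

    def ping
      "pong (debug=#{config.debug.inspect})"
    end
  end

  class << self
    def config
      @config ||= Config.new
    end

    def configure
      yield config
    end

    # Returns a reusable client instance.
    def new(overrides = {}, &block)
      Client.new(overrides, &block)
    end

    # Any other known method is forwarded to a brand-new client.
    def method_missing(name, *args, &block)
      client = Client.new
      return super unless client.respond_to?(name)
      client.public_send(name, *args, &block)
    end

    def respond_to_missing?(name, include_private = false)
      Client.method_defined?(name) || super
    end
  end
end
```

In this sketch, SketchKit.ping builds a throwaway client from the global configuration, while SketchKit.new(debug: false) returns an instance whose overrides leave the global configuration untouched.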

Configuration Variables

Variable       Definition
access_token   Your CrowdFlower access token, found on your account page. Required for API access.
per_page       The number of results to request per page. Default: 10.
debug          Enabling this will log all API activity for debugging purposes.
adapter        The HTTP adapter to use. Default: :net_http. Other options: :net_http_persistent, :typhoeus, :patron, :em_synchrony, :excon, :test.
user_agent     The user agent. Default: "CrowdKit Ruby Gem version".
auto_paginate  Whether or not to automatically paginate through collections. Default: false.
api_endpoint   A custom API endpoint. Default: https://api.crowdflower.com/v2
ssl            By default the client is configured to use OpenSSL::SSL::VERIFY_PEER. To disable peer verification, set this to { verify: false }.

Environment Variables

Lastly, Crowdkit looks for every configuration variable in the environment using the format CROWDKIT_{upcased_variable_name}, e.g. CROWDKIT_ACCESS_TOKEN=abc123.
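
The naming convention can be sketched in plain Ruby. The helpers below (crowdkit_env_name, crowdkit_env_overrides, and the SKETCH_CONFIG_KEYS constant) are hypothetical and not part of the gem; they only illustrate how a variable name from the table above maps to an environment variable. Note that values read from the environment are always strings, so how the gem converts them (for example for per_page or debug) is not shown here.

```ruby
# Hypothetical helpers for illustration; not part of the Crowdkit gem.
SKETCH_CONFIG_KEYS = %i[access_token per_page debug adapter user_agent
                        auto_paginate api_endpoint ssl].freeze

# Maps a configuration variable to its environment variable name.
def crowdkit_env_name(key)
  "CROWDKIT_#{key.to_s.upcase}"
end

# Collects every configuration variable that is present in the environment.
# Accepts any hash-like object so it can be exercised without touching ENV.
def crowdkit_env_overrides(env = ENV)
  SKETCH_CONFIG_KEYS.each_with_object({}) do |key, found|
    value = env[crowdkit_env_name(key)]
    found[key] = value unless value.nil?
  end
end
```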

Advanced Configuration

Crowdkit uses Faraday and exposes a stack configuration parameter. The stack can be freely modified with methods such as insert, insert_after, delete and swap, as in the first example below. If you're feeling adventurous, you can also override the default stack completely, as in the second:

Crowdkit.configure do |config|
  config.stack.insert CustomMiddleware
end

Crowdkit.configure do |config|
  config.stack do |builder|
    builder.use CustomMiddleware
    builder.use Crowdkit::Middleware::RaiseError
    builder.adapter :excon
  end
end
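
The examples above reference CustomMiddleware without defining it. A minimal sketch of what such a class could look like follows, using the Rack-style convention Faraday middleware follows: the constructor receives the next app in the stack and call(env) passes the request along. Everything specific here is an assumption made for illustration: the X-Request-Id header, the request_id option, and the use of a plain Hash in place of Faraday's env object so the snippet runs without Faraday installed. In a real application you would typically subclass Faraday::Middleware instead.

```ruby
# Illustrative Rack-style middleware; header name and option are hypothetical.
class CustomMiddleware
  def initialize(app, options = {})
    @app = app
    @request_id = options.fetch(:request_id, "sketch-request")
  end

  def call(env)
    # Decorate the outgoing request, then hand it to the next app in the stack.
    env[:request_headers] ||= {}
    env[:request_headers]["X-Request-Id"] = @request_id
    @app.call(env)
  end
end

# Stand-in for the rest of the stack (normally ending in the HTTP adapter).
fake_adapter = ->(env) { env.merge(status: 200) }

response = CustomMiddleware.new(fake_adapter).call({ url: "https://api.crowdflower.com/v2" })
puts response[:request_headers]["X-Request-Id"]
```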

Scopes & Parameters

Crowdkit operates within scopes appropriate to the CrowdFlower API. The most common scope is jobs. Parameters can be passed into any scope and will be used by the API method, e.g.:

client.jobs(state: "finished").list

Of course, the API methods themselves also accept parameters:

client.jobs.list(state: "finished")

Most scopes accept an optional first parameter that will be translated to the primary key of the scope for convenience, e.g.:

client.units(54321).get
client.jobs.search("whatever")

The above is equivalent to the following:

client.units.get(unit_id: 54321)
client.jobs.search(query: "whatever")

Lastly, Crowdkit also provides a with method to clearly denote your scopes:

client.with(unit_id: 54321).units.get
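
One way to picture why these forms are equivalent is that scope parameters and method parameters end up in a single merged hash. The sketch below is a hypothetical model, not the gem's internals: ScopeSketch and request_params are invented names, and it assumes that method-level parameters take precedence over scope-level ones.

```ruby
# Hypothetical model of scope parameter handling; not Crowdkit's real code.
class ScopeSketch
  attr_reader :params

  def initialize(primary_key, *args)
    options = args.last.is_a?(Hash) ? args.pop : {}
    # A leading positional argument is shorthand for the scope's primary key.
    options = { primary_key => args.first }.merge(options) unless args.empty?
    @params = options
  end

  # Parameters given to the API method are layered on top of the scope's.
  def request_params(overrides = {})
    params.merge(overrides)
  end
end

# Both spellings produce the same parameters in this model.
p ScopeSketch.new(:unit_id, 54321).request_params
p ScopeSketch.new(:unit_id).request_params(unit_id: 54321)
```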

Errors

All errors raised by the client inherit from Crowdkit::Error. There are two primary types of errors: Crowdkit::UserError and Crowdkit::ServiceError. User errors are raised when local validations fail, while service errors are raised when we receive invalid response codes from the CrowdFlower servers.
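
Because both error types share a base class, calling code can rescue them separately or together. The sketch below shows one possible pattern. The stand-in classes exist only so the snippet runs without the gem and mirror just the hierarchy described above; describe_failure and its messages are hypothetical.

```ruby
# Stand-ins so the snippet runs without the gem; they mirror only the
# hierarchy described above, not the gem's real error classes.
unless defined?(Crowdkit)
  module Crowdkit
    class Error < StandardError; end
    class UserError < Error; end
    class ServiceError < Error; end
  end
end

# Hypothetical helper: runs a block and reports which kind of error occurred.
def describe_failure
  yield
  "ok"
rescue Crowdkit::UserError => e
  # Local validation failed: the request itself needs fixing.
  "fix the request: #{e.message}"
rescue Crowdkit::ServiceError => e
  # The server answered with an error: retrying may help.
  "retry later: #{e.message}"
rescue Crowdkit::Error => e
  # Catch-all for any other client error.
  "unexpected client error: #{e.message}"
end
```

Ordering matters here: the more specific classes are rescued before the Crowdkit::Error catch-all.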

Example Usage

Get statistics for jobs with a given tag

jobs = client.jobs.search("url", fields: ["tags"])
jobs.each do |job|
  pp job.stats
end

Copying a job and ordering all units

new_job = client.jobs(101010).copy(all_units: true)
Crowdkit.wait_on_status(new_job)
order = client.jobs(new_job.id).order
Crowdkit.wait_on_status(order)

Sum unit states for all units in a job

client.jobs(101010).units.list(auto_pagination: true).inject(Hash.new(0)) do |memo, unit|
  memo[unit.state] += 1
  memo
end

Poll a job for completed units

loop do
  units = client.jobs(101010).units.poll
  if units.any?
    units.each do |unit|
      # Do something with the resulting data
      result = unit.aggregate_result
      # We also support acknowledging individual units if you want more granularity i.e.
      # result.delete
    end
    # This acknowledges all units in one request but assumes the above code will
    # complete in less than 30 seconds if you have multiple pollers.
    units.delete
  else
    sleep 10
  end
end

Development

All scopes are defined in the client directory and inherit from API. The API class provides the following convenience methods for defining APIs:

arguments: parses arguments and lets you specify which, if any, are required: arguments(args, required: [:job_id])

do_http_verb: performs the specified http_verb.

namespace: attaches a scope and uses the class defined with class_name to instantiate it: namespace :units, class_name: "Client::Units"
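
To make the idea behind required arguments concrete, here is a rough stand-alone sketch of that kind of check. It is not the gem's arguments helper, whose behavior may differ: parse_arguments_sketch and MissingArgumentsSketch are hypothetical names, and treating a trailing hash as the parameter set is an assumption.

```ruby
# Hypothetical sketch; the gem's own arguments helper may behave differently.
class MissingArgumentsSketch < ArgumentError; end

def parse_arguments_sketch(args, required: [])
  # Treat a trailing hash as the parameter set, as Ruby option hashes usually are.
  params = args.last.is_a?(Hash) ? args.last.dup : {}
  missing = required.reject { |key| params.key?(key) }
  unless missing.empty?
    raise MissingArgumentsSketch, "missing required arguments: #{missing.join(', ')}"
  end
  params
end
```

A check like this fails locally, before any HTTP request is made, which matches the role the Errors section gives to user errors.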