job_boss

Credits

The idea for this gem came from trying to use working/starling and not having much success with stability. I created job_boss after some discussions with my colleagues (Neil Cook and Justin Hahn). Thanks also to my employer, RBM Technologies for letting me take some of this work open-source.

Purpose

job_boss allows you to have a daemon much in the same way as workling which allows you to process a series of jobs asynchronously. job_boss, however, uses ActiveRecord to store queued job requests in a database thus simplifying dependencies. This allows for us to process chunks of work in parallel, across multiple servers, without needing much setup

Overview

  • job_boss uses ActiveRecord to store/poll it's queue. It's not dependent on Rails, but if it sees that it's being run in a Rails environment, it will automatically load the environment.rb file
  • Loading up the environment.rb file isn't a big deal because job_boss's model has a main "boss" process which is a deamon. The boss forks employees as needed to execute jobs.
  • Employees only exist for the span of one job, so there's less concern about a build up in memory from leaks (not that we shouldn't be addressing leaks...)
  • The boss is independent and always polling, so it can look for jobs which have been marked an cancelled and kill the employee during processing

Usage

Add the gem to your Gemfile

gem 'job_boss'

or install it

gem install job_boss

Create a directory to store classes which define code which can be executed by the job boss (the default directory is 'app/jobs') and create class files such as this:

# app/jobs/math_jobs.rb
class MathJobs
    def is_prime?(i)
        ('1' * i) !~ /^1?$|^(11+?)\1+$/
    end
end

If you're using Rails, much of the logic that you'll want to queue may already be in models or other application classes. You can queue class methods rather that needing to wrap them in a Job class:

# app/models/article.rb
class Article < ActiveRecord::Base
    class << self
        def refresh_cache(article_ids)
            # code to refresh article cache
        end
    end
end

Start up your boss:

job_boss start -- <options>

You can get command line options with the command:

job_boss run -- -h

But since you don't want to do that right now, it looks something like this:

Usage: job_boss [start|stop|restart|run|zap] [-- <options>]
    -r, --application-root PATH      Path for the application root upon which other paths depend (defaults to .)  Environment variable: JB_APPLICATION_ROOT
    -d, --database-yaml PATH         Path for database YAML (defaults to <application-root>/config/database.yml) Environment variable: JB_DATABASE_YAML_PATH
    -l, --log-path PATH              Path for log file (defaults to <application-root>/log/job_boss.log) Environment variable: JB_LOG_PATH
    -j, --jobs-path PATH             Path to folder with job classes (defaults to <application-root>/app/jobs) Environment variable: JB_JOBS_PATH
    -e, --environment ENV            Environment to use in database YAML file (defaults to 'development') Environment variable: JB_ENVIRONMENT
    -s, --sleep-interval INTERVAL    Number of seconds for the boss to sleep between checks of the queue (default 0.5) Environment variable: JB_SLEEP_INTERVAL
    -c, --employee-limit LIMIT       Maximum number of employees (default 4) Environment variable: JB_EMPLOYEE_LIMIT

From your Rails code or in a console:

require 'job_boss'
batch = Batch.new
jobs = (0..1000).collect do |i|
    batch.queue.math.is_prime?(i)
end

Or:

jobs = []
batch = Batch.new
Article.select('id').find_in_batches(:batch_size => 10) do |articles|
    jobs << batch.queue.article.refresh_cache(articles.collect(&:id))
end

job_boss also makes it easy to wait for the jobs to be done and to collect the results into a hash:

batch.wait_for_jobs # Will sleep until the jobs are all complete

batch.result_hash # => {[0]=>false, [1]=>false, [2]=>true, [3]=>true, [4]=>false, ... }

You can even define a block to provide updates on progress (the value which is passed into the block is a float between 0.0 and 1.0):

batch.wait_for_jobs do |progress|
    puts "We're now at #{progress * 100}%"
end

Prioritization of jobs is also supported. If a particular batch is more important than others, you can specify a higher priority

batch = Batch.new(:priority => 3)

In practical terms, the priority represents the number of jobs which are pulled from the queue to be processed each cycle, so by wary of increasing your priority beyond your maximum number of employees. No job queue will suffer from resource starvation, but you can greatly decrease the performance of other queues by over-prioritizing one.

Also note that job_boss uses a prioritized round-robin approach to scheduling jobs, the priority for jobs is increased throughout the run of the job queue, providing an approximation of a first-come first-serve approach to reduce latency.

For performance, it is recommended that you keep your jobs table clean scheduling execution of the delete_jobs_before command on the Job model, which will clean all jobs completed before the specified time:

Job.delete_jobs_before(2.days.ago)

Features:

  • Call the cancel method on a job to have the job boss cancel it
  • Call the mark_for_redo method on a job to have it processed again. This is automatically run for all currently running jobs in the event that the boss has been told to stop
  • If a job throws an exception, it will be caught and recorded. Call the error method on a job to find out what the error was
  • Find out how long the job took by calling the time_taken method on a job
  • The job boss dispatches "employees" to work on jobs. Viewing the processes, the process name is changed to reflect which jobs employees are working on for easy tracing (e.g. [job_boss employee] job #4 math#is_prime?(4))