Gush Build Status

proudly made by Chaps

Gush is a parallel workflow runner using only Redis as its message broker and Sidekiq for workers.

Theory

Gush relies on directed acyclic graphs to store dependencies, see Parallelizing Operations With Dependencies by Stephen Toub.

Installation

Add this line to your application's Gemfile:

gem 'gush'

And then execute:

$ bundle

Or install it yourself as:

$ gem install gush

Usage

Defining workflows

The DSL for defining jobs consists of a single run method. Here is a complete example of a workflow you can create:

# workflows/sample_workflow.rb
class SampleWorkflow < Gush::Workflow
  def configure(url_to_fetch_from)
    run FetchJob1, params: { url: url_to_fetch_from }
    run FetchJob2, params: {some_flag: true, url: 'http://url.com'}

    run PersistJob1, after: FetchJob1
    run PersistJob2, after: FetchJob2

    run Normalize,
        after: [PersistJob1, PersistJob2],
        before: Index

    run Index
  end
end

Hint: For debugging purposes you can vizualize the graph using viz command:

bundle exec gush viz SampleWorkflow

For the Workflow above, the graph will look like this:

SampleWorkflow

Passing parameters to jobs

You can pass any primitive arguments into jobs while defining your workflow:

# app/workflows/sample_workflow.rb
class SampleWorkflow < Gush::Workflow
  def configure
    run FetchJob1, params: { url: "http://some.com/url" }
  end
end

See below to learn how to access those params inside your job.

Defining jobs

Jobs are classes inheriting from Gush::Job:

# app/jobs/fetch_job.rb
class FetchJob < Gush::Job
  def work
    # do some fetching from remote APIs

    params #=> {url: "http://some.com/url"}
  end
end

params method is a hash containing your (optional) parameters passed to run method in the workflow.

Passing arguments to workflows

Workflows can accept any primitive arguments in their constructor, which then will be available in your configure method.

Here's an example of a workflow responsible for publishing a book:

# app/workflows/sample_workflow.rb
class PublishBookWorkflow < Gush::Workflow
  def configure(url, isbn)
    run FetchBook, params: { url: url }
    run PublishBook, params: { book_isbn: isbn }
  end
end

and then create your workflow with those arguments:

PublishBookWorkflow.new("http://url.com/book.pdf", "978-0470081204")

Running workflows

Now that we have defined our workflow we can use it:

1. Initialize and save it

flow = SampleWorkflow.new(optional, arguments)
flow.save # saves workflow and its jobs to Redis

or: you can also use a shortcut:

flow = SampleWorkflow.create(optional, arguments)

2. Start workflow

First you need to start Sidekiq workers:

bundle exec gush workers

and then start your workflow:

flow.start!

Now Gush will start processing jobs in background using Sidekiq in the order defined in configure method inside Workflow.

Pipelining

Gush offers a useful feature which lets you pass results of a job to its dependencies, so they can act accordingly.

Example:

Let's assume you have two jobs, DownloadVideo, EncodeVideo. The latter needs to know where the first one downloaded the file to be able to open it.

class DownloadVideo < Gush::Job
  def work
    downloader = VideoDownloader.fetch("http://youtube.com/?v=someytvideo")

    output(downloader.file_path)
  end
end

output method is Gush's way of saying: "I want to pass this down to my descendants".

Now, since DownloadVideo finished and its dependant job EncodeVideo started, we can access that payload down the (pipe)line:

class EncodeVideo < Gush::Job
  def work
    video_path = payloads["DownloadVideo"]
  end
end

payloads is a hash containing outputs from all parent jobs, where job class names are the keys.

Note: payloads will only contain outputs of the job's ancestors. So if job A depends on B and C, the payloads hash will look like this:

{
  "B" => (...),
  "C" => (...)
}

Checking status:

In Ruby:

flow.reload
flow.status
#=> :running|:finished|:failed

reload is needed to see the latest status, since workflows are updated asynchronously.

Via CLI:

  • of a specific workflow:
  bundle exec gush show <workflow_id>
  • of all created workflows:
  bundle exec gush list

Requiring workflows inside your projects

When using Gush and its CLI commands you need a Gushfile.rb in root directory. Gushfile should require all your Workflows and jobs, for example:

require_relative './lib/your_project'

Dir[Rails.root.join("app/workflows/**/*.rb")].each do |file|
  require file
end

Contributors

Contributing

  1. Fork it ( http://github.com/chaps-io/gush/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request