RedStorm v0.6.4 - JRuby on Storm

RedStorm provides a Ruby DSL using JRuby integration for the Storm distributed realtime computation system.

Documentation

New versions of RedStorm will likely introduce changes that break compatibility or change the development workflow. To prevent out-of-sync documentation, version-specific documentation is kept in the wiki when necessary.

Released gems

Dependencies

Tested on OSX 10.8.2 and Linux 12.04 using Storm 0.8.1, JRuby 1.6.8 and OpenJDK 7.

Notes about 1.8/1.9 JRuby compatibility

Up until the upcoming JRuby 1.7, JRuby runs in Ruby 1.8 compatibility mode by default. Unless you have a specific need to run topologies in 1.8 mode, you should use 1.9 mode, which will become the default in JRuby 1.7.

There are two ways to have JRuby 1.6.x run in 1.9 mode by default:

  • by setting the JRUBY_OPTS env variable
  $ export JRUBY_OPTS=--1.9
  • by installing JRuby using RVM with 1.9 mode by default
  $ rvm install jruby --1.9

Otherwise, to choose the JRuby compatibility mode manually, use this JRuby syntax:

$ jruby --1.9 -S redstorm ...

By default, a topology is executed in the same mode as the interpreter running the $ redstorm command. You can force RedStorm to use a specific JRuby compatibility mode by passing the [--1.8|--1.9] parameter when running the topology in local mode or on a remote cluster.

$ redstorm local|cluster [--1.8|--1.9] ...
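
To confirm which compatibility mode an interpreter is actually running in, one option is to inspect RUBY_VERSION, which JRuby reports according to the active mode. The sketch below maps a version string to the corresponding flag. Note that mode_flag is a hypothetical helper written for illustration; it is not part of RedStorm.

```ruby
# Hypothetical helper (not part of RedStorm): maps a Ruby version string,
# such as the RUBY_VERSION constant, to the matching compatibility flag.
def mode_flag(ruby_version)
  case ruby_version
  when /\A1\.8\./ then "--1.8"
  when /\A1\.9\./ then "--1.9"
  else nil # version outside the two modes discussed here
  end
end

# Prints the flag for the interpreter running this script.
puts "running in #{mode_flag(RUBY_VERSION).inspect} mode"
```

Running the script once with $ jruby --1.8 and once with $ jruby --1.9 should show the two flags, which is a quick way to check the effect of JRUBY_OPTS.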

Installation

  • RubyGems
  $ gem install redstorm
  • Bundler
  source :rubygems
  gem "redstorm", "~> 0.6.4"

Usage overview

  • create a project directory.
  • install the RedStorm gem.
  • create a subdirectory for your topology code.
  • perform the initial setup as described below to build and install dependencies.
  • run your topology in local mode and/or on a remote cluster as described below.

Initial setup

$ redstorm install

or if your default JRuby mode is 1.8 but you want to use 1.9 for your topology development, use

$ jruby --1.9 -S redstorm install

This installs the default Java jar dependencies in target/dependency, then generates and compiles the Java bindings in target/classes.

Create a topology

Create a subdirectory for your topology code and create your topology class using this naming convention: the underscored file name topology_class_file_name.rb MUST correspond to its CamelCase class name, TopologyClassFileName.
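
The convention can be checked with a few lines of Ruby. This is only a sketch of the underscore-to-CamelCase mapping described above: expected_class_name is a hypothetical helper, and RedStorm's own conversion may differ in its details.

```ruby
# Hypothetical helper illustrating the naming convention; RedStorm's own
# file-name-to-class-name conversion may differ in its details.
def expected_class_name(topology_file)
  File.basename(topology_file, ".rb").split("_").map { |part| part.capitalize }.join
end

puts expected_class_name("examples/simple/word_count_topology.rb")
# prints "WordCountTopology"
```

So a file named word_count_topology.rb is expected to define a class named WordCountTopology.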

Gems in your topology

RedStorm requires Bundler if gems are needed in your topology. Supply a Gemfile in the root of your project directory listing the gems required in your topology. If you are using Bundler for other gems, you should group the topology gems in a Bundler group of your choice.

  1. have Bundler install the gems locally
  $ bundle install

  or if your default JRuby mode is 1.8 but you want to use 1.9 for your topology development, use

  $ jruby --1.9 -S bundle install
  2. copy the topology gems into the target/gems directory
  $ redstorm bundle [BUNDLER_GROUP]

The redstorm bundle command copies the gems specified in the Gemfile (only those in the given group, if one is specified) into the target/gems directory. In order for the topology to run in a Storm cluster, the fully installed gems must be packaged, self-contained, into a single jar file. Note you should NOT require 'bundler/setup' in the topology.

This has an important consequence: the gems will not be installed on the cluster target machines, because they are already installed in the jar file. This could lead to problems if the machine used to install the gems is of a different architecture than the cluster target machines and some of these gems have C or FFI extensions.
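
One way to spot gems that could be affected is to list those that declare native extensions. The sketch below is an assumption-laden illustration: native_extension_gems is a hypothetical helper, it inspects the gems visible to the current interpreter rather than the contents of target/gems, and it will not detect gems that use FFI without compiling anything.

```ruby
require "rubygems"

# Hypothetical helper: returns the names of the gems that build native
# extensions, given objects responding to #name and #extensions
# (as Gem::Specification instances do).
def native_extension_gems(specs)
  specs.reject { |spec| spec.extensions.empty? }.map { |spec| spec.name }
end

# Usage against the gems visible to the current interpreter.
puts native_extension_gems(Gem::Specification.to_a).inspect
```

Any gem reported here deserves a closer look before the jar is built on a machine that differs from the cluster machines.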

Custom Jar dependencies in your topology

By default, RedStorm installs the Storm and JRuby jar dependencies. If you require custom dependencies, specify them by creating a Dependencies file in the root of your project. Note that this file overrides the default dependencies, so you must also include the Storm and JRuby dependencies. Here's an example of a Dependencies file which includes the jars required to run the KafkaTopology in the examples.

{
  :storm_artifacts => [
    "storm:storm:0.8.1, transitive=true",
  ],
  :topology_artifacts => [
    "org.jruby:jruby-complete:1.6.8, transitive=false",
    "org.scala-lang:scala-library:2.8.0, transitive=false",
    "storm:kafka:0.7.0-incubating, transitive=false",
    "storm:storm-kafka:0.8.0-wip4, transitive=false",
  ],
}

The dependencies are specified as Maven artifacts. There are two sections: :storm_artifacts contains the dependencies for running Storm in local mode, and :topology_artifacts contains the dependencies specific to your topology. The format is self-explanatory: each entry is a group:artifact:version coordinate followed by the transitive=[true|false] attribute, which controls recursive dependency resolution (true enables it).
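
To make the entry format concrete, here is a minimal sketch that splits one entry into its parts. parse_artifact is a hypothetical function written for this illustration; it is not the code RedStorm uses to read the Dependencies file.

```ruby
# Hypothetical parser for one entry of the Dependencies file; this is an
# illustration of the format, not the code RedStorm uses internally.
def parse_artifact(entry)
  coordinates, *options = entry.split(",").map { |part| part.strip }
  group, name, version = coordinates.split(":")
  {
    :group      => group,
    :name       => name,
    :version    => version,
    :transitive => options.include?("transitive=true")
  }
end

p parse_artifact("org.jruby:jruby-complete:1.6.8, transitive=false")
```

For the entry above, the group is org.jruby, the artifact name is jruby-complete, the version is 1.6.8, and transitive resolution is off.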

The jar repositories can be configured by adding the ivy/setting.xml file in the root of your project. For information on the Ivy settings format, see the Ivy Settings Documentation. I will try my best to eliminate all XML :) but for now I haven't figured out how to get rid of this one. As an example of an Ivy settings file, RedStorm uses the following settings by default:

<ivysettings>
  <settings defaultResolver="repositories"/>
  <resolvers>
    <chain name="repositories">
      <ibiblio name="ibiblio" m2compatible="true"/>
      <ibiblio name="maven2" root="http://repo.maven.apache.org/maven2/" m2compatible="true"/>
      <ibiblio name="sonatype" root="http://repo.maven.apache.org/maven2/" m2compatible="true"/>
      <ibiblio name="clojars" root="http://clojars.org/repo/" m2compatible="true"/>
    </chain>
  </resolvers>
</ivysettings>

Run in local mode

$ redstorm local [--1.8|--1.9] <path/to/topology_class_file_name.rb>

By default, a topology is executed in the same mode as the interpreter running the $ redstorm command. You can force RedStorm to use a specific JRuby compatibility mode by passing the [--1.8|--1.9] parameter when running the topology in local mode or on a remote cluster.

See the examples below for running topologies in local mode or on a production cluster.

Run on production cluster

  1. download and unpack the Storm 0.8.1 distribution locally and add the Storm bin/ directory to your $PATH.

  2. generate target/cluster-topology.jar. This jar file will include your sources directories plus the required dependencies.

  $ redstorm jar <sources_directory1> <sources_directory2> ...
  3. submit the cluster topology jar file to the cluster
  $ redstorm cluster [--1.8|--1.9] <path/to/topology_class_file_name.rb>

By default, a topology is executed in the same mode as the interpreter running the $ redstorm command. You can force RedStorm to use a specific JRuby compatibility mode by passing the [--1.8|--1.9] parameter when running the topology in local mode or on a remote cluster.

The Storm wiki has instructions on setting up a production cluster. You can also manually submit your topology.
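
The jar and submit steps above can be scripted. The following is a sketch only: cluster_commands is a hypothetical helper that assembles the two documented command lines as argument arrays, and the actual invocation is left commented out.

```ruby
# Hypothetical helper: builds the two command lines documented above as
# argument arrays. mode is "1.8", "1.9" or nil (keep the interpreter default).
def cluster_commands(topology_file, source_dirs, mode = nil)
  jar = ["redstorm", "jar"] + source_dirs
  submit = ["redstorm", "cluster"]
  submit << "--#{mode}" if mode
  submit << topology_file
  [jar, submit]
end

cluster_commands("examples/simple/word_count_topology.rb", ["examples"], "1.9").each do |command|
  puts command.join(" ")
  # system(*command) or abort("command failed")  # uncomment to actually run it
end
```

Keeping the commands as arrays avoids shell quoting issues when they are passed to system.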

Examples

Install the example files in your project. The examples/ dir will be created in your project root dir.

$ redstorm examples

All examples using the simple DSL are located in examples/simple. Examples using the standard Java interface are in examples/native.

Local mode

Example topologies without gems

$ redstorm local examples/simple/exclamation_topology.rb
$ redstorm local examples/simple/exclamation_topology2.rb
$ redstorm local examples/simple/word_count_topology.rb

Example topologies with gems

For examples/simple/redis_word_count_topology.rb, the redis gem is required and you need a Redis server running on localhost:6379.

  1. create a Gemfile
  source :rubygems

  group :word_count do
    gem "redis"
  end
  2. install the topology gems
  $ bundle install
  $ redstorm bundle word_count
  3. run the topology in local mode
  $ redstorm local examples/simple/redis_word_count_topology.rb

Using redis-cli, push words into the test list and watch Storm pick them up.
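
Words can also be pushed from Ruby instead of redis-cli. The sketch below makes several assumptions: push_words is a hypothetical helper, the list name "test" is taken from the sentence above, and the choice of rpush over lpush is a guess, since which end of the list the example topology reads from is not stated here.

```ruby
# Hypothetical helper: pushes each word onto a Redis list through any client
# object that responds to #rpush, such as an instance from the redis gem.
# Returns the number of words pushed.
def push_words(client, words, list = "test")
  words.each { |word| client.rpush(list, word) }
  words.size
end

# Usage with the redis gem (assumes a server on localhost:6379):
#   require "redis"
#   push_words(Redis.new(:host => "localhost", :port => 6379), %w[storm is fun])
```

Passing the client in keeps the helper independent of the redis gem, so it can be tried with a stand-in object before pointing it at a real server.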

Remote cluster

All examples using the simple DSL can run either in local mode or on a remote cluster. The only native example compatible with a remote cluster is examples/native/cluster_word_count_topology.rb.

Topologies without gems

  1. generate the target/cluster-topology.jar and include the examples/ directory.
  $ redstorm jar examples
  2. submit the cluster topology jar file to the cluster, assuming you have the Storm distribution installed and the Storm bin/ directory in your path:
  $ redstorm cluster examples/simple/exclamation_topology.rb
  $ redstorm cluster examples/simple/exclamation_topology2.rb
  $ redstorm cluster examples/simple/word_count_topology.rb

Topologies with gems

For examples/simple/redis_word_count_topology.rb, the redis gem is required and you need a Redis server running on localhost:6379.

  1. create a Gemfile
  source :rubygems

  group :word_count do
    gem "redis"
  end
  2. install the topology gems
  $ bundle install
  $ redstorm bundle word_count
  3. generate the target/cluster-topology.jar and include the examples/ directory.
  $ redstorm jar examples
  4. submit the cluster topology jar file to the cluster, assuming you have the Storm distribution installed and the Storm bin/ directory in your path:
  $ redstorm cluster examples/simple/redis_word_count_topology.rb

Using redis-cli, push words into the test list and watch Storm pick them up.

The Storm wiki has instructions on setting up a production cluster. You can also manually submit your topology.

Ruby DSL

Ruby DSL Documentation

Multilang ShellSpout & ShellBolt support

Please refer to Using non JVM languages with Storm for the complete information on Multilang & shelling in Storm.

In RedStorm ShellSpout and ShellBolt are supported using the following construct in the topology definition:

bolt JRubyShellBolt, ["python", "splitsentence.py"] do
  output_fields "word"
  source SimpleSpout, :shuffle
end
  • JRubyShellBolt must be used for a ShellBolt. The array argument ["python", "splitsentence.py"] is passed to the class constructor and gives the command that the ShellBolt runs.

  • The directory containing the topology class must contain a resources/ directory with all the shell files.

See the shell topology example
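
Since the shell files must live in a resources/ directory next to the topology class, a small pre-flight check can catch a missing script before the topology is run. This is a sketch under stated assumptions: missing_shell_files is a hypothetical helper, not part of RedStorm, and the my_topologies/ path is made up for the example.

```ruby
# Hypothetical check (not part of RedStorm): returns the scripts that are
# missing from the resources/ directory next to the topology file.
def missing_shell_files(topology_file, scripts)
  resources = File.join(File.dirname(topology_file), "resources")
  scripts.reject { |script| File.file?(File.join(resources, script)) }
end

# my_topologies/shell_topology.rb is an assumed path used only for illustration.
p missing_shell_files("my_topologies/shell_topology.rb", ["splitsentence.py"])
```

An empty array means every listed script was found in resources/.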

RedStorm Development

It is possible to fork the RedStorm project and run local and remote/cluster topologies directly from the project sources without installing the gem. This is a useful setup when contributing to the project.

Requirements

  • JRuby 1.6.8

Workflow

  • fork project and create branch

  • install RedStorm required gems

  $ bundle install
  • install dependencies in target/dependencies
  $ bin/redstorm deps
  • generate and build Java source into target/classes
  $ bin/redstorm build

If you modify any of the RedStorm Ruby code or Java binding code, run this again to refresh the code and rebuild the bindings.

  • follow the normal usage patterns to run the topology in local mode or on a remote cluster.

How to Contribute

Fork the project, create a branch and submit a pull request.

Some ways you can contribute:

  • by reporting bugs using the issue tracker
  • by suggesting new features using the issue tracker
  • by writing or editing documentation
  • by writing specs
  • by writing code
  • by refactoring code
  • ...

Projects using RedStorm

If you want to list your RedStorm project here, contact me.

  • Tweitgeist - realtime computation of the top trending hashtags on Twitter. See Live Demo.

Author

Colin Surprenant, @colinsurprenant, http://github.com/colinsurprenant/, http://colinsurprenant.com/

Contributors

Theo Hultberg, https://github.com/iconara

License

Apache License, Version 2.0. See the LICENSE.md file.