Module: Rubydoop

Defined in:
lib/rubydoop.rb,
lib/rubydoop/dsl.rb,
lib/rubydoop/package.rb,
lib/rubydoop/version.rb

Overview

See Rubydoop.configure for the job configuration DSL documentation, Package for the packaging documentation, or the README for a getting started guide.

Defined Under Namespace

Classes: ConfigurationDefinition, Context, JobDefinition, Package

Constant Summary

VERSION = '1.2.1'

Class Method Summary

Class Method Details

.configure(impl = ConfigurationDefinition) {|*arguments| ... } ⇒ Object

Note:

The tool runner sets the global variable `$rubydoop_context` to an object that holds references to the necessary Hadoop configuration. The configuration block runs only when this global variable is set. This is deliberate: it keeps the configuration block from running in mappers and reducers.

Main entrypoint into the configuration DSL.

Within a `configure` block you can specify one or more jobs. Each `job` block runs in the context of a JobDefinition instance, so see that class for documentation of the available properties. The `configure` block itself runs in the context of a ConfigurationDefinition instance. The arguments to the `configure` block are the command line arguments, minus those handled by Hadoop's `ToolRunner`.

Examples:

Configuring a job


Rubydoop.configure do |*args|
  job 'word_count' do
    input args[0]
    output args[1]

    mapper WordCount::Mapper
    reducer WordCount::Reducer

    output_key Hadoop::Io::Text
    output_value Hadoop::Io::IntWritable
  end
end
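
How the blocks change context

The way `configure` and `job` blocks run in the context of different instances can be tried without Hadoop. The following is a minimal sketch, not Rubydoop's implementation: `SketchConfiguration` and `SketchJob` are hypothetical stand-ins for ConfigurationDefinition and JobDefinition, only `input` and `output` are modelled, and the use of `instance_exec` to switch context is an assumption.

```ruby
# Hypothetical stand-in for JobDefinition: records the properties
# set inside a job block.
class SketchJob
  attr_reader :name, :settings

  def initialize(name)
    @name = name
    @settings = {}
  end

  # Each DSL method simply stores its argument.
  def input(path)
    @settings[:input] = path
  end

  def output(path)
    @settings[:output] = path
  end
end

# Hypothetical stand-in for ConfigurationDefinition.
class SketchConfiguration
  attr_reader :jobs

  def initialize(arguments, &block)
    @jobs = []
    # Run the block with self set to this instance and pass the
    # command line arguments as block parameters.
    instance_exec(*arguments, &block)
  end

  def job(name, &block)
    definition = SketchJob.new(name)
    # The job block runs with self set to the job definition, so
    # bare calls such as `input` resolve against it.
    definition.instance_exec(&block)
    @jobs << definition
    definition
  end
end

config = SketchConfiguration.new(['in/path', 'out/path']) do |*args|
  job 'word_count' do
    input args[0]
    output args[1]
  end
end

puts config.jobs.first.settings.inspect
```

Because blocks are closures, `args` stays visible inside the nested `job` block even though `self` has changed. This is why the example can pass `args[0]` to `input`.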

Yield Parameters:

  • *arguments (Array<String>)

    The command line arguments



# File 'lib/rubydoop/dsl.rb', line 36

def self.configure(impl=ConfigurationDefinition, &block)
  impl.new($rubydoop_context, &block) if $rubydoop_context
end
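
The guard in the method above can be exercised in isolation. This is a rough sketch under stated assumptions: `SketchDoop`, its `Configuration` class, and the `$sketch_context` global are hypothetical stand-ins, and the context is modelled as a plain Hash rather than the object the tool runner provides.

```ruby
# Hypothetical stand-in module that mirrors the guard shown above.
module SketchDoop
  # Stand-in for ConfigurationDefinition: evaluates the block on creation.
  class Configuration
    def initialize(context, &block)
      @context = context
      # Assumption: the context exposes the command line arguments.
      instance_exec(*context.fetch(:arguments), &block)
    end
  end

  def self.configure(impl = Configuration, &block)
    # Build the configuration only when a context has been set.
    impl.new($sketch_context, &block) if $sketch_context
  end
end

calls = []

# No context set, as in a mapper or reducer process: the block is skipped.
$sketch_context = nil
result_without = SketchDoop.configure { |*args| calls << args }

# Context set, as the tool runner would do: the block runs with the arguments.
$sketch_context = { arguments: ['in', 'out'] }
result_with = SketchDoop.configure { |*args| calls << args }

puts calls.inspect # prints [["in", "out"]]
```

When the global is unset, the method returns `nil` and never instantiates `impl`, so the same script can be loaded in task processes without rebuilding the job configuration.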