Class: Elasticrawl::JobStep

Inherits:
ActiveRecord::Base
  • Object
Defined in:
lib/elasticrawl/job_step.rb

Overview

Represents an Elastic MapReduce job flow step. For a parse job, each step processes a single Common Crawl segment. For a combine job, a single step aggregates the results of multiple parse jobs.

Instance Method Summary

Instance Method Details

#job_flow_step(job_config) ⇒ Object

Returns a custom jar step that is configured with the jar location, class name and input and output paths.

For parse jobs, optionally specifies the maximum number of Common Crawl data files to process before the job exits.



# File 'lib/elasticrawl/job_step.rb', line 14

def job_flow_step(job_config)
  jar = job_config['jar']
  max_files = job.max_files

  # Positional arguments: class name, input paths, output path.
  step_args = [job_config['class'], input_paths, output_path]
  # All arguments must be strings.
  step_args << max_files.to_s if max_files.present?

  step = Elasticity::CustomJarStep.new(jar)
  step.name = set_step_name
  step.arguments = step_args

  step
end
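
The argument ordering used by the method above can be illustrated with a minimal, self-contained sketch. It does not use the elasticity gem or ActiveRecord: FakeJarStep is a hypothetical stand-in for Elasticity::CustomJarStep, build_step_args is a hypothetical helper, and the S3 paths, jar location, class name and step name are placeholders rather than values from Elasticrawl.

```ruby
# Hypothetical stand-in for Elasticity::CustomJarStep so the sketch runs
# without the elasticity gem or AWS credentials.
FakeJarStep = Struct.new(:jar, :name, :arguments)

# Hypothetical helper that mirrors the argument order of job_flow_step:
# class name, input paths, output path, then an optional max_files value.
def build_step_args(class_name, input_paths, output_path, max_files = nil)
  args = [class_name, input_paths, output_path]
  # All arguments must be strings, so the integer limit is converted.
  # A plain nil check stands in for ActiveSupport's present? here.
  args << max_files.to_s unless max_files.nil?
  args
end

# Placeholder configuration values, assumed for illustration only.
job_config = {
  'jar'   => 's3://example-bucket/jar/example.jar',
  'class' => 'com.example.ExampleDriver'
}

# A parse-style step with a file limit gets a fourth argument.
parse_args = build_step_args(job_config['class'],
                             's3://example-bucket/input/*',
                             's3://example-bucket/parse-output/',
                             5)

# A combine-style step without a limit gets only three arguments.
combine_args = build_step_args(job_config['class'],
                               's3://example-bucket/parse-output/*',
                               's3://example-bucket/combine-output/')

step = FakeJarStep.new(job_config['jar'], 'Example parse step', parse_args)
```

When max_files is nil, the fourth argument is omitted entirely rather than passed as an empty string, so the jar's main class would need to treat it as optional.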