Module: Wukong::Hadoop::MapLogic

Included in: HadoopRunner
Defined in: lib/wukong-hadoop/runner/map_logic.rb

Overview

Implements logic for figuring out the correct mapper commandline given wu-hadoop's arguments.

Instance Method Summary

Instance Method Details

#explicit_map_command? ⇒ true, false

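Were we given an explicit map command (like 'cut -f 1'), or are we to introspect and construct the command? The method returns the value of settings[:map_command] itself, so the result is truthy or falsy rather than strictly true or false.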

Returns:

  • (true, false)


# File 'lib/wukong-hadoop/runner/map_logic.rb', line 28

def explicit_map_command?
  settings[:map_command]
end

#explicit_map_processor? ⇒ true, false

Were we given a processor to use as our mapper explicitly by name or are we to introspect to discover the correct processor?

Returns:

  • (true, false)


# File 'lib/wukong-hadoop/runner/map_logic.rb', line 37

def explicit_map_processor?
  settings[:mapper]
end

#explicit_mapper? ⇒ true, false

Were we given an explicit mapper (either as a command or as a processor) or should we introspect to find one?

Returns:

  • (true, false)


# File 'lib/wukong-hadoop/runner/map_logic.rb', line 45

def explicit_mapper?
  explicit_map_processor? || explicit_map_command?
end
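
The three predicates above can be tried in isolation with a minimal sketch. FakeRunner is a hypothetical stand-in that models only a settings hash; it is not part of wukong-hadoop. It illustrates that the predicates return the underlying setting (or nil), which callers treat as truthy or falsy.

```ruby
# Hypothetical stand-in for the runner; only `settings` is modeled.
class FakeRunner
  attr_reader :settings

  def initialize(settings)
    @settings = settings
  end

  # Same bodies as the documented methods.
  def explicit_map_command?
    settings[:map_command]
  end

  def explicit_map_processor?
    settings[:mapper]
  end

  def explicit_mapper?
    explicit_map_processor? || explicit_map_command?
  end
end

with_command = FakeRunner.new(map_command: 'cut -f 1')
with_nothing = FakeRunner.new({})

with_command.explicit_mapper?    # => "cut -f 1" (truthy, not literally true)
with_nothing.explicit_mapper?    # => nil (falsy)
!!with_command.explicit_mapper?  # => true, if a strict boolean is needed
```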

#mapper_arg ⇒ String

The argument to introspect on to determine the mapper: the first positional argument (args.first).

Returns:

  • (String)


# File 'lib/wukong-hadoop/runner/map_logic.rb', line 53

def mapper_arg
  args.first
end

#mapper_commandline ⇒ String

Return the actual commandline used by the mapper, whether running in local or Hadoop mode.

You should be able to copy, paste, and run this command unmodified to debug the mapper.

Returns:

  • (String)


# File 'lib/wukong-hadoop/runner/map_logic.rb', line 15

def mapper_commandline
  return settings[:map_command] if explicit_map_command?
  arg = (mode == :hadoop ? File.basename(mapper_arg) : mapper_arg)
  [command_prefix, 'wu-local',  arg].tap do |cmd|
    cmd << "--run=#{mapper_name}" if mapper_needs_run_arg?
    cmd << non_wukong_hadoop_params_string
  end.compact.map(&:to_s).reject(&:empty?).join(' ')
end
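
The way the command line is assembled can be sketched outside the runner. build_commandline is a hypothetical helper that reuses the same tap/compact/reject/join idiom; the prefix 'bundle exec' and the flag '--min_length=3' are made-up values standing in for command_prefix and non_wukong_hadoop_params_string.

```ruby
# Hypothetical helper mirroring the array-assembly idiom in mapper_commandline.
# nil entries and empty strings are dropped before joining with spaces.
def build_commandline(prefix, arg, run_name: nil, extra_params: '')
  [prefix, 'wu-local', arg].tap do |cmd|
    cmd << "--run=#{run_name}" if run_name
    cmd << extra_params
  end.compact.map(&:to_s).reject(&:empty?).join(' ')
end

# No prefix, no --run, no extra params: only the essentials remain.
build_commandline(nil, 'tokenizer.rb')
# => "wu-local tokenizer.rb"

# In Hadoop mode the documented method passes only the basename of the argument.
build_commandline('bundle exec', File.basename('/path/to/jobs.rb'),
                  run_name: 'tokenizer', extra_params: '--min_length=3')
# => "bundle exec wu-local jobs.rb --run=tokenizer --min_length=3"
```

Building an array and filtering it at the end avoids stray double spaces when the prefix or extra parameters are absent, which keeps the resulting string safe to copy and paste.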

#mapper_name ⇒ String

Return the name of the processor to use as the mapper.

Raises a Wukong::Error if an explicitly named mapper is not a registered processor, or if no mapper can be guessed.

Most of the logic that examines explicit command line arguments and checks for the existence of named processors or files is here.

Returns:

  • (String)


# File 'lib/wukong-hadoop/runner/map_logic.rb', line 80

def mapper_name
  case
  when explicit_mapper?
    if processor_registered?(settings[:mapper])
      settings[:mapper]
    else
      raise Error.new("No such processor: '#{settings[:mapper]}'")
    end
  when map_only? && processor_registered?(mapper_arg)
    mapper_arg
  when map_only? && file_is_processor?(mapper_arg)
    processor_name_from_file(mapper_arg)
  when single_job_arg? && explicit_reducer? && processor_registered?(mapper_arg)
    mapper_arg
  when separate_map_and_reduce_args? && processor_registered?(mapper_arg)
    mapper_arg
  when separate_map_and_reduce_args? && file_is_processor?(mapper_arg)
    processor_name_from_file(mapper_arg)
  when processor_registered?('mapper')
    'mapper'
  else
    raise Error.new("Could not find a processor to use as a mapper")
  end
end
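
The resolution order can be illustrated with a deliberately simplified sketch. resolve_mapper is hypothetical: it takes the list of registered processor names as a plain array, ignores the map_only?, single_job_arg?, and separate_map_and_reduce_args? distinctions and file introspection, and raises ArgumentError where the real method raises its own Error class.

```ruby
# Hypothetical, simplified model of the precedence used by mapper_name:
#   1. an explicitly named processor (must be registered, else an error)
#   2. the mapper argument, if it names a registered processor
#   3. a registered processor literally called 'mapper'
def resolve_mapper(registered, explicit: nil, arg: nil)
  if explicit
    return explicit if registered.include?(explicit)
    raise ArgumentError, "No such processor: '#{explicit}'"
  end
  return arg if registered.include?(arg)
  return 'mapper' if registered.include?('mapper')
  raise ArgumentError, 'Could not find a processor to use as a mapper'
end

registered = %w[tokenizer mapper]

resolve_mapper(registered, explicit: 'tokenizer')  # => "tokenizer"
resolve_mapper(registered, arg: 'tokenizer')       # => "tokenizer"
resolve_mapper(registered, arg: 'unknown.rb')      # => "mapper" (fallback)
```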

#mapper_needs_run_arg? ⇒ true, false

Does the mapper commandline need an explicit --run argument?

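The --run argument is omitted when an explicit map command was given, or when the processor name matches either the mapper argument or the script's basename without its .rb extension.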

Returns:

  • (true, false)


# File 'lib/wukong-hadoop/runner/map_logic.rb', line 63

def mapper_needs_run_arg?
  return false if settings[:map_command]
  return false if mapper_arg.to_s == mapper_name.to_s
  return false if File.basename(mapper_arg.to_s, '.rb') == mapper_name.to_s
  true
end
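
The name comparison can be checked on its own. needs_run_arg? is a hypothetical standalone version that takes the mapper argument and processor name as parameters and leaves out the settings[:map_command] check; the paths and names are made up.

```ruby
# Hypothetical standalone version of the comparison in mapper_needs_run_arg?.
def needs_run_arg?(mapper_arg, mapper_name)
  # The argument already names the processor.
  return false if mapper_arg.to_s == mapper_name.to_s
  # The script's basename, minus '.rb', matches the processor name.
  return false if File.basename(mapper_arg.to_s, '.rb') == mapper_name.to_s
  true
end

needs_run_arg?('tokenizer', 'tokenizer')              # => false
needs_run_arg?('/path/to/tokenizer.rb', 'tokenizer')  # => false
needs_run_arg?('jobs.rb', 'tokenizer')                # => true
```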