Module: Wukong

Defined in:
lib/wukong.rb,
lib/wukong/store.rb,
lib/wukong/logger.rb,
lib/wukong/schema.rb,
lib/wukong/script.rb,
lib/wukong/encoding.rb,
lib/wukong/streamer.rb,
lib/wukong/datatypes.rb,
lib/wukong/decorator.rb,
lib/wukong/store/base.rb,
lib/wukong/streamer/base.rb,
lib/wukong/datatypes/enum.rb,
lib/wukong/store/cassandra.rb,
lib/wukong/streamer/filter.rb,
lib/wukong/filename_pattern.rb,
lib/wukong/streamer/reducer.rb,
lib/wukong/script/emr_command.rb,
lib/wukong/datatypes/fake_types.rb,
lib/wukong/extensions/hash_like.rb,
lib/wukong/script/local_command.rb,
lib/wukong/streamer/set_reducer.rb,
lib/wukong/script/hadoop_command.rb,
lib/wukong/store/cassandra_model.rb,
lib/wukong/store/flat_file_store.rb,
lib/wukong/streamer/list_reducer.rb,
lib/wukong/streamer/line_streamer.rb,
lib/wukong/streamer/record_streamer.rb,
lib/wukong/streamer/struct_streamer.rb,
lib/wukong/streamer/summing_reducer.rb,
lib/wukong/extensions/hashlike_class.rb,
lib/wukong/streamer/counting_reducer.rb,
lib/wukong/store/chunked_flat_file_store.rb,
lib/wukong/streamer/accumulating_reducer.rb,
lib/wukong/streamer/rank_and_bin_reducer.rb,
lib/wukong/streamer/uniq_by_last_reducer.rb,
lib/wukong/script/cassandra_loader_script.rb,
lib/wukong/store/chh_chunked_flat_file_store.rb

Defined Under Namespace

Modules: Datatypes, EmrCommand, HadoopCommand, HashLike, HashlikeClass, LocalCommand, Schema, Store, Streamer

Classes: CassandraScript, Decorator, FilenamePattern, Script

Constant Summary

RESOURCE_CLASS_MAP =
{ }

Class Method Details

.class_from_resource(rsrc) ⇒ Object

Finds the class from its underscored name. Note that the class name is non-modularized: it is resolved without any namespace prefix. You can also pre-seed RESOURCE_CLASS_MAP with known name-to-class pairs.

# File 'lib/wukong/datatypes.rb', line 8

def self.class_from_resource rsrc
  # This method has been profiled, so don't go making it more elegant unless you're doing same.
  klass_name = rsrc.to_s
  return RESOURCE_CLASS_MAP[klass_name] if RESOURCE_CLASS_MAP.include?(klass_name)
  # kill off all but the non-modularized class name and camelize
  klass_name.gsub!(/(?:^|_)(.)/){ $1.upcase }
  begin
    # convert it to class name
    klass = klass_name.constantize
  rescue Exception => e
    warn "Bogus class name '#{klass_name}'? #{e}"
    klass = nil
  end
  RESOURCE_CLASS_MAP[klass_name] = klass
end
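
The lookup above can be sketched with only the standard library. This is an illustration under stated assumptions, not the library's code: Object.const_get stands in for ActiveSupport's constantize, and ResourceLookupSketch, CLASS_MAP and TwitterUser are hypothetical names. Unlike the listing, the sketch camelizes a copy of the name, caches under the original underscored key, and rescues only NameError.

```ruby
# Stand-in for Wukong.class_from_resource; every name here is illustrative.
module ResourceLookupSketch
  CLASS_MAP = {}

  def self.class_from_resource(rsrc)
    key = rsrc.to_s
    return CLASS_MAP[key] if CLASS_MAP.include?(key)
    # "twitter_user" -> "TwitterUser": upcase the first character
    # and each character that follows an underscore
    klass_name = key.gsub(/(?:^|_)(.)/) { $1.upcase }
    klass = begin
      Object.const_get(klass_name)
    rescue NameError
      nil # unknown names are cached as nil, as in the listing
    end
    CLASS_MAP[key] = klass
  end
end

class TwitterUser; end # hypothetical record class

# Pre-seeding: map a name that camelizing alone would not resolve
ResourceLookupSketch::CLASS_MAP["user"] = TwitterUser

ResourceLookupSketch.class_from_resource(:twitter_user)    # => TwitterUser
ResourceLookupSketch.class_from_resource(:user)            # => TwitterUser
ResourceLookupSketch.class_from_resource("no_such_thing")  # => nil
```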

.decode_str(str, strategy = :xml) ⇒ Object

Decode string from its encode_str representation. This can include dangerous things such as tabs, newlines, backslashes and cryptofascist propaganda.

# File 'lib/wukong/encoding.rb', line 69

def self.decode_str str, strategy=:xml
  case strategy
  when :xml        then self.html_encoder.decode(str)
  when :url        then Addressable::URI.unencode_component(str)
  else raise "Don't know how to decode with strategy #{strategy}"
  end
end

.encode_components(hsh, *fields) ⇒ Object

Replace each given field in the hash with its encoded value

# File 'lib/wukong/encoding.rb', line 81

def self.encode_components hsh, *fields
  fields.each do |field|
    hsh[field] = hsh[field].to_s.wukong_encode if hsh[field]
  end
end
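
A small sketch of the in-place behaviour shown above. It assumes CGI.escapeHTML as a stand-in for String#wukong_encode (which this sketch does not load); encode_components_sketch and the record contents are hypothetical.

```ruby
require 'cgi'

# Stand-in for Wukong.encode_components.
# CGI.escapeHTML substitutes for String#wukong_encode here.
def encode_components_sketch(hsh, *fields)
  fields.each do |field|
    # nil (or false) values are skipped; anything else is stringified, then encoded
    hsh[field] = CGI.escapeHTML(hsh[field].to_s) if hsh[field]
  end
end

record = { name: "Tom & Jerry", bio: nil, id: 7 }
encode_components_sketch(record, :name, :bio, :id)

record[:name]  # => "Tom &amp; Jerry"
record[:bio]   # => nil
record[:id]    # => "7"
```

The hash is modified in place, so callers should read the results from the hash they passed in rather than from the return value.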

.encode_str(str, strategy = :xml) ⇒ Object

By default (or explicitly with the :xml strategy), convert string to

  • XML-encoded ASCII,

  • with a guarantee that the characters " (quote), ' (apos), \ (backslash), carriage return \r, newline \n and tab \t (as well as all other control characters) are encoded.

  • Any XML-encoding in the original text is encoded with no introspection:

    encode_str("<a href=\"foo\">")
    # => "&lt;a href=&quot;foo&quot;&gt;"
    
  • Useful: rishida.net/scripts/uniview/conversion.php

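With the :url strategy, every character that does not match \w is percent-encoded through Addressable::URI.encode_component.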

Wukong.decode_str(Wukong.encode_str(str)) returns the original str

If you’re seeing bad_encoding errors, try

$KCODE='u' unless "1.9".respond_to?(:encoding)

at the start of your script.

# File 'lib/wukong/encoding.rb', line 48

def self.encode_str str, strategy=:xml
  begin
    case strategy
    when :xml        then self.html_encoder.encode(str, :basic, :named, :decimal).gsub(/\\/, '&#x5C;')
    when :url        then Addressable::URI.encode_component(str, /[^\w]/)
    else raise "Don't know how to encode with strategy #{strategy}"
    end
  rescue ArgumentError => e
    '!bad_encoding!! ' + str.gsub(/[^\w\s\.\-@#%]+/, '')
  end
end
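
The round-trip guarantee for the :xml strategy can be illustrated with the standard library alone. This is a sketch under assumptions, not the library's implementation: CGI.escapeHTML and CGI.unescapeHTML stand in for the HTMLEntities gem, and encode_sketch and decode_sketch are hypothetical names. Non-ASCII characters are left untouched here, whereas the real method converts them to ASCII entities.

```ruby
require 'cgi'

# Stand-ins for the :xml strategy using only the standard library.
def encode_sketch(str)
  # Escape & < > " ' first, then turn backslash and control characters
  # (tab, newline, carriage return, ...) into hexadecimal character references.
  CGI.escapeHTML(str).gsub(/[\\\x00-\x1f]/) { |ch| format('&#x%02X;', ch.ord) }
end

def decode_sketch(str)
  CGI.unescapeHTML(str)
end

raw     = "name\t\"Sun Wukong\"\nC:\\temp"
encoded = encode_sketch(raw)
# => "name&#x09;&quot;Sun Wukong&quot;&#x0A;C:&#x5C;temp"

decode_sketch(encoded) == raw  # => true
```

Because the encoded form contains no literal tab or newline, it can be written as one field of a tab-separated, one-record-per-line file and recovered unchanged.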

.html_encoder ⇒ Object

HTMLEntities encoder instance

# File 'lib/wukong/encoding.rb', line 60

def self.html_encoder
  @html_encoder ||= HTMLEntities.new
end

.logger ⇒ Object

Common logger

Set your own at any time with

Wukong.logger = YourAwesomeLogger.new(...)

If you have log4r installed you can use

Wukong.logger = Wukong.default_log4r_logger

If Wukong.logger is too much typing for you, use the Log constant.

Default format:

I, [2009-07-26T19:58:46-05:00 #12332]: Up to 2000 char message

# File 'lib/wukong/logger.rb', line 15

def self.logger
  return @logger if defined?(@logger)
  require 'logger'
  @logger = Logger.new STDERR
  @logger.instance_eval do
    def dump *args
      debug args.inspect
    end
  end
  @logger
end
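
The memoized accessor and the replaceable logger can be tried without loading Wukong. The following is a sketch that mirrors the listing in a hypothetical module, LoggerSketch, with the replacement logger writing to an in-memory buffer so the output can be inspected.

```ruby
require 'logger'
require 'stringio'

# Hypothetical module mirroring the reader/writer pair on Wukong.
module LoggerSketch
  def self.logger
    return @logger if defined?(@logger)
    @logger = Logger.new($stderr)
    # Singleton method: logs the inspected argument list at DEBUG level
    @logger.define_singleton_method(:dump) { |*args| debug(args.inspect) }
    @logger
  end

  def self.logger=(logger)
    @logger = logger
  end
end

# Swap in a logger that writes to an in-memory buffer
buffer = StringIO.new
LoggerSketch.logger = Logger.new(buffer)
LoggerSketch.logger.info("processed 2000 records")

buffer.string.include?("processed 2000 records")  # => true
```

Note that dump is defined only on the default logger; a replacement logger responds to it only if it defines the method itself.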

.logger=(logger) ⇒ Object

# File 'lib/wukong/logger.rb', line 27

def self.logger= logger
  @logger = logger
end

.run(mapper, reducer = nil, options = {}) ⇒ Object

# File 'lib/wukong.rb', line 15

def self.run mapper, reducer=nil, options={}
  Wukong::Script.new(mapper, reducer, options).run
end