Class: Wukong::Processor::Extract

Inherits:
Wukong::Processor
Includes:
DynamicGet
Defined in:
lib/wukong/widget/extract.rb

Overview

A widget that extracts parts of incoming records.

This widget can extract part of the following kinds of objects:

  • Hash
  • Array
  • JSON string
  • delimited string ("\t" or "," or other)
  • models

In each case it will attempt to appropriately parse its :part argument.
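As a rough illustration of the parsing described above, the sketch below handles each kind of record in plain Ruby. It is not Wukong's implementation: `extract_part` is a hypothetical helper, the tab default for the delimiter is an assumption, and the 1-based positions are inferred from the command-line examples, where `--part=2` selects the second column.

```ruby
require 'json'

# Hypothetical helper, not part of Wukong: picks one part out of a record.
# Assumes 1-based positions for Arrays and delimited strings.
def extract_part(part, record, delimiter = "\t")
  return record if part.nil?

  case record
  when Hash
    # Accept either String or Symbol keys.
    record.fetch(part.to_s) { record[part.to_sym] }
  when Array
    index = Integer(part.to_s, exception: false)
    index && index >= 1 ? record[index - 1] : nil
  when String
    # Try JSON first; fall back to splitting on the delimiter.
    parsed = begin
      JSON.parse(record)
    rescue JSON::ParserError
      nil
    end
    if parsed.is_a?(Hash) || parsed.is_a?(Array)
      extract_part(part, parsed, delimiter)
    else
      extract_part(part, record.split(delimiter), delimiter)
    end
  else
    # Models: assume the part names a public reader method.
    record.respond_to?(part.to_s) ? record.public_send(part.to_s) : nil
  end
end
```

With this sketch, `extract_part(2, "snap\tcrackle\tpop")` returns "crackle" and `extract_part("text", '{"id": 1, "text": "hi there"}')` returns "hi there".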

This also works on nested keys, using a dot ('.') to separate the keys (for example, 'data.text').
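A minimal sketch of how a dotted part such as 'data.text' could be resolved against nested Hashes. `dig_path` is a hypothetical name used only for illustration; Wukong's own lookup lives in the DynamicGet module and may differ.

```ruby
# Hypothetical helper: walks a nested Hash one key at a time.
def dig_path(path, record)
  path.to_s.split('.').reduce(record) do |current, key|
    # Stop early when the path runs past the available structure.
    break nil unless current.is_a?(Hash)

    # Accept either String or Symbol keys at each level.
    current.fetch(key) { current[key.to_sym] }
  end
end

record = { 'id' => 1, 'data' => { 'text' => 'hi there' } }
dig_path('data.text', record)    # => "hi there"
dig_path('data.missing', record) # => nil
dig_path('id.text', record)      # => nil
```

Returning nil for a missing or non-Hash segment is a design choice of this sketch, not documented behavior of the widget.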

Objects like Hashes, Arrays, and models, which would have to be serialized to pass through a command-line flow, can also be extracted from directly within a dataflow.

Examples:

Extracting a column from an input TSV record on the command line


$ cat input
snap	crackle	pop
1	2	3
$ cat input | wu-local extract --part=2
crackle
2

Extracting a column from delimited data with a different delimiter


$ cat input
snap,crackle,pop
1,2,3
$ cat input | wu-local extract --part=2 --delimiter=,
crackle
2

Extracting a field from within some JSON record on the command line


$ cat input
{"id": 1, "text": "hi there"}
{"id": 2, "text": "goodbye"}
$ cat input | wu-local extract --part="text"
hi there
goodbye

Extracting a nested field from within some JSON record on the command line


$ cat input
{"id": 1, "data": {"text": "hi there"}}
{"id": 2, "data": {"text": "goodbye"}}
$ cat input | wu-local extract --part="data.text"
hi there
goodbye

Extracting a field from within a Hash in a dataflow


Wukong.dataflow(:uses_extract) do
  ... | extract(part: 'data.text') | ...
end

Constant Summary

Constants inherited from Wukong::Processor

SerializerError

Instance Attribute Summary

Attributes included from Hanuman::StageInstanceMethods

#graph

Instance Method Summary

Methods included from DynamicGet

#get, #get_nested, included

Methods inherited from Wukong::Processor

configure, description, #finalize, #perform_action, #receive_action, #setup, #stop

Methods included from Logging

included

Methods inherited from Hanuman::Stage

#clone

Methods included from Hanuman::StageClassMethods

#builder, #label, #register, #set_builder

Methods included from Hanuman::StageInstanceMethods

#add_stage_link, #linkable_name, #root

Instance Method Details

#process(record) {|part| ... } ⇒ Object

Extract a part of a record.

Parameters:

  • record (Object)

Yields:

  • (part)

Yield Parameters:

  • part (Object)

    the part extracted from the record



# File 'lib/wukong/widget/extract.rb', line 116

def process record
  yield get(self.part, record)
end
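
To illustrate the block contract of #process, the sketch below uses a stand-in class rather than the real widget, so it runs without Wukong installed. `StubExtract` and its simplified `get` (flat Hash lookup only) are assumptions for illustration; the real `get` comes from DynamicGet.

```ruby
# Stand-in for illustration only; not the Wukong class.
class StubExtract
  attr_reader :part

  def initialize(part)
    @part = part
  end

  # Simplified lookup: flat Hash keys only.
  def get(field, record)
    field.nil? ? record : record[field]
  end

  # Same shape as the documented method: yields the extracted part.
  def process(record)
    yield get(part, record)
  end
end

extracted = []
stub = StubExtract.new('text')
[{ 'text' => 'hi there' }, { 'text' => 'goodbye' }].each do |record|
  stub.process(record) { |part| extracted << part }
end
extracted # => ["hi there", "goodbye"]
```

The caller supplies the block, so the same method can feed the next stage of a flow or, as here, simply collect results.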