Class: Wukong::Processor::Group

Inherits:
Count
Includes:
DynamicGet
Defined in:
lib/wukong/widget/reducers/group.rb

Overview

Groups sorted input records and emits each group with a count.

The key that defines each group can be extracted from a record in several ways.

Note: The input records must already be sorted by the same key used for grouping; otherwise a group may be split up and emitted more than once.

A group fits nicely at the end of a dataflow. Because it requires sorted input, it is a blocking step.
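
To illustrate why the sort matters, here is a minimal stand-alone sketch in plain Ruby. It does not use Wukong; `group_counts` is a hypothetical helper that counts runs of consecutive equal keys, which is roughly what a streaming grouper can do when it sees one record at a time.

```ruby
# Stand-alone sketch; NOT Wukong code. `group_counts` is a hypothetical helper.
# It counts runs of consecutive equal records, one record at a time.
def group_counts(records)
  groups = []
  records.each do |record|
    if groups.empty? || groups.last[0] != record
      groups << [record, 1]   # key changed: start a new group
    else
      groups.last[1] += 1     # same key: grow the current group
    end
  end
  groups
end

words = %w[apple cat banana apple]

# Unsorted input: "apple" shows up as two separate groups of size 1.
unsorted_result = group_counts(words)

# Sorted input: each key forms exactly one group.
sorted_result = group_counts(words.sort)
```

With the unsorted input the key `apple` is emitted twice, each time with a count of 1; after sorting it is emitted once with a count of 2.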

Examples:

Group simple string values on the command line.


$ cat input
apple
cat
banana
apple
...
$ cat input | wu-local sort | wu-local group --to=tsv
apple	4
banana	2
cat	5
...

Group using a nested key within a JSON string on the command line.


$ cat input
{"id": 1, "word": "apple" }
{"id": 2, "word": "cat"   }
{"id": 3, "word": "banana"}
...
$ cat input | wu-local sort --on=word | wu-local group --by=word --to=tsv
apple	4
banana	2
cat	5
...

Use a group at the end of a dataflow.


Wukong.dataflow(:makes_groups) do
  ... | sort(on: 'field') | group(by: 'field') | to_tsv
end

Direct Known Subclasses

GroupConcat, Moments

Constant Summary

Constants inherited from Wukong::Processor

SerializerError

Instance Attribute Summary

Attributes inherited from Count

#size

Attributes inherited from Accumulator

#group, #key

Instance Method Summary

Methods included from DynamicGet

#get, #get_nested, included

Methods inherited from Count

#accumulate, #setup

Methods inherited from Accumulator

#accumulate, #process, #setup

Methods inherited from Wukong::Processor

configure, consumes, description, #expected_record_type, #expected_serialization, #perform_action, #process, produces, #receive_action, #setup, #stop, valid_serializer?, validate_and_set_serialization

Methods included from Logging

included

Methods included from Hanuman::StageClassMethods

#builder, #label, #register, #set_builder

Instance Method Details

#finalize {|key, size| ... } ⇒ Object

Yields the key of the current group along with the group's size.

Yields:

Yield Parameters:

  • key (Object)

    the key defining the group

  • size (Integer)

    the size of the group



# File 'lib/wukong/widget/reducers/group.rb', line 121

def finalize
  yield [key, size]
end

#get_key(record) ⇒ Object

Get the key which defines the group for this record.

Parameters:

  • record (Object)

Returns:

  • (Object)


# File 'lib/wukong/widget/reducers/group.rb', line 105

def get_key(record)
  get(self.by, record)
end
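
As a rough illustration of key extraction, the sketch below shows one way a grouping key could be pulled from a record. It is not Wukong's `DynamicGet` implementation: `sketch_get` is a hypothetical helper, and the dotted-path convention (`'meta.lang'`) and the `meta` field are assumptions made for this example.

```ruby
require 'json'

# Hypothetical helper; NOT Wukong's DynamicGet#get.
# With no field, the whole record is the key. Otherwise the field is
# treated as a dotted path (an assumption) and walked through nested hashes.
def sketch_get(field, record)
  return record if field.nil?
  field.to_s.split('.').reduce(record) do |obj, part|
    obj.is_a?(Hash) ? obj[part] : nil
  end
end

record = JSON.parse('{"id": 1, "word": "apple", "meta": {"lang": "en"}}')

whole_key  = sketch_get(nil, 'apple')         # the record itself
word_key   = sketch_get('word', record)       # top-level field
nested_key = sketch_get('meta.lang', record)  # nested field
```

Returning the whole record when no field is given mirrors the first command-line example, where plain strings are grouped without a `--by` option.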

#start(record) ⇒ Object

Resets the size counter for a new group.

Parameters:

  • record (Object)


# File 'lib/wukong/widget/reducers/group.rb', line 112

def start record
  self.size = 0
end
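
The methods on this page, together with the inherited `#accumulate`, suggest a start / accumulate / finalize lifecycle. The sketch below imitates that lifecycle in plain Ruby to show how the pieces might fit together. `SketchGroup` and its `run` driver are hypothetical stand-ins, not Wukong classes, and the real `Accumulator#process` may be organized differently.

```ruby
# Hypothetical stand-in for the lifecycle; NOT Wukong's implementation.
class SketchGroup
  attr_accessor :key, :size

  # Here the record itself is the grouping key.
  def get_key(record)
    record
  end

  # Called when a new group begins: reset the counter.
  def start(_record)
    self.size = 0
  end

  # Called for every record in the current group.
  def accumulate(_record)
    self.size += 1
  end

  # Called when the group ends: emit [key, size].
  def finalize
    yield [key, size]
  end

  # Drives the lifecycle over records that are already sorted.
  def run(records)
    out = []
    records.each do |record|
      k = get_key(record)
      if size.nil? || k != key
        finalize { |pair| out << pair } unless size.nil?
        self.key = k
        start(record)
      end
      accumulate(record)
    end
    finalize { |pair| out << pair } unless size.nil?
    out
  end
end

lifecycle_result = SketchGroup.new.run(%w[apple apple banana cat cat cat])
```

In this sketch the final `finalize` call after the loop flushes the last group, which is why a grouper of this kind only emits a group once it has seen the first record of the next one or the end of the input.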