Class: Wukong::Processor::Bin
- Inherits: Accumulator
  Object > Hanuman::Stage > Wukong::Processor > Accumulator > Wukong::Processor::Bin
- Includes: DynamicGet
- Defined in: lib/wukong/widget/reducers/bin.rb
Overview
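A widget for binning input data. Will emit one record per bin containing the bin's lower edge, upper edge, and count of values, plus a normalized count if requested.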
This widget works nicely with the Extract widget at the end of a data flow:
Constant Summary
Constants inherited from Wukong::Processor
Instance Attribute Summary
- #bins ⇒ Object
  The bins (pairs of edges).
- #counts ⇒ Object
  The value counts within each bin.
- #total_count ⇒ Object
  The total number of accumulated values.
- #values ⇒ Object
  The accumulated values.
Attributes inherited from Accumulator
Attributes included from Hanuman::StageInstanceMethods
Instance Method Summary
- #accumulate(record) ⇒ Object
  Accumulates a single record.
- #bin! ⇒ Object
  Bins the accumulated values.
- #bins? ⇒ true, false
  Does this widget have a populated list of bins?
- #finalize {|lower, upper, count, normalized_count| ... } ⇒ Object
  Emits each bin with its edges and count.
- #format(n) ⇒ String
  Formats n so it's readable and compact.
- #get_key(record) ⇒ :__first__group__
  Keeps all records in the same "group", at least from the Accumulator's perspective.
- #log_count_if_necessary(val) ⇒ Float
  Returns val, taking a logarithm to the appropriate base if required.
- #log_if_possible(val) ⇒ Float
  Returns the logarithm of the given val if possible.
- #setup ⇒ Object
  Initializes all storage.
- #value_from(record) ⇒ Float?
  Gets a value from a given record.
Methods included from DynamicGet
Methods inherited from Accumulator
Methods inherited from Wukong::Processor
configure, description, #perform_action, #process, #receive_action, #stop
Methods included from Logging
Methods inherited from Hanuman::Stage
Methods included from Hanuman::StageClassMethods
#builder, #label, #register, #set_builder
Methods included from Hanuman::StageInstanceMethods
#add_link, #linkable_name, #root
Instance Attribute Details
#bins ⇒ Object
The bins (pairs of edges).

    # File 'lib/wukong/widget/reducers/bin.rb', line 129
    def bins
      @bins
    end
#counts ⇒ Object
The value counts within each bin.

    # File 'lib/wukong/widget/reducers/bin.rb', line 132
    def counts
      @counts
    end
#total_count ⇒ Object
The total number of accumulated values.

    # File 'lib/wukong/widget/reducers/bin.rb', line 135
    def total_count
      @total_count
    end
#values ⇒ Object
The accumulated values.

    # File 'lib/wukong/widget/reducers/bin.rb', line 126
    def values
      @values
    end
Instance Method Details
#accumulate(record) ⇒ Object
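Accumulates a single record.

First we extract the value from the record; records without a usable value are skipped entirely. Otherwise total_count is incremented and, if we already have bins, the value is added to the appropriate bin. If we don't, the value is stored in values, updating properties like min and max as necessary.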

    # File 'lib/wukong/widget/reducers/bin.rb', line 169
    def accumulate record
      value = (value_from(record) or return)
      self.total_count += 1
      if bins?
        add_to_some_bin(value)
      else
        self.min ||= value
        self.min = value if value < min
        self.max ||= value
        self.max = value if value > max
        self.values << value
      end
    end
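
To make the two modes of accumulate concrete, here is a minimal sketch under stated assumptions. MiniAccumulator is a hypothetical class, not part of Wukong, and its add_to_some_bin uses half-open bins with a closed last bin, which is an assumption since the widget's own add_to_some_bin is not shown on this page.

```ruby
# Hypothetical sketch of the two modes of Bin#accumulate. MiniAccumulator
# is illustrative only; it is not part of Wukong.
class MiniAccumulator
  attr_reader :values, :min, :max, :total_count, :counts

  # Pass edges to start with bins already known (binning "on the fly").
  def initialize(edges = nil)
    @edges       = edges
    @counts      = edges ? Array.new(edges.size - 1, 0) : []
    @values      = []
    @total_count = 0
  end

  def bins?
    !@counts.empty?
  end

  def accumulate(value)
    return if value.nil?                          # like `value_from(record) or return`
    @total_count += 1
    if bins?
      add_to_some_bin(value)                      # count immediately
    else
      @min = value if @min.nil? || value < @min   # track running extrema
      @max = value if @max.nil? || value > @max
      @values << value                            # buffer for later binning
    end
  end

  private

  # Assumption: bins are [lower, upper) except the last, which also
  # includes its upper edge; out-of-range values are ignored.
  def add_to_some_bin(value)
    last  = @counts.size - 1
    index = (0..last).find do |i|
      value >= @edges[i] && (value < @edges[i + 1] || (i == last && value <= @edges[i + 1]))
    end
    @counts[index] += 1 if index
  end
end

streaming = MiniAccumulator.new
[3.0, 1.0, 7.5].each { |v| streaming.accumulate(v) }
streaming.min     # => 1.0
streaming.values  # => [3.0, 1.0, 7.5]

fixed = MiniAccumulator.new([0.0, 5.0, 10.0])
[3.0, 1.0, 7.5, 10.0].each { |v| fixed.accumulate(v) }
fixed.counts      # => [2, 2]
fixed.values      # => []
```

When bins exist up front nothing is buffered, so memory stays constant; otherwise every value is held until binning happens at the end.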
#bin! ⇒ Object
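Bins the accumulated values. If num_bins is not set, it is first derived from total_count; edges are then computed from min, max, and num_bins, and the buffered values are drained into their bins.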

    # File 'lib/wukong/widget/reducers/bin.rb', line 233
    def bin!
      set_num_bins_from_total_count! unless self.num_bins
      set_edges_from_min_max_and_num_bins!
      until values.empty?
        value = values.shift
        add_to_some_bin(value.to_f) if value
      end
    end
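
The three steps of bin! can be sketched as plain functions. The bodies of set_num_bins_from_total_count!, set_edges_from_min_max_and_num_bins!, and add_to_some_bin are not shown on this page, so the square-root rule and equal-width edges below are assumptions chosen for illustration, and num_bins_for, equal_width_bins, and bin_values are hypothetical names.

```ruby
# Hypothetical sketch of Bin#bin!'s three steps. The square-root rule and
# equal-width edges are assumptions, not Wukong's confirmed behavior.

# Step 1 (assumed rule): choose a bin count from the number of values.
def num_bins_for(total_count)
  [Math.sqrt(total_count).ceil, 1].max
end

# Step 2: equal-width edges spanning [min, max], as [lower, upper] pairs.
def equal_width_bins(min, max, num_bins)
  width = (max - min).to_f / num_bins
  (0...num_bins).map { |i| [min + i * width, min + (i + 1) * width] }
end

# Step 3: drain the buffered values into the bins, as bin! does with
# values.shift, clamping so that max lands in the last bin.
def bin_values(values, min, max, num_bins = nil)
  num_bins ||= num_bins_for(values.size)
  bins   = equal_width_bins(min, max, num_bins)
  counts = Array.new(num_bins, 0)
  span   = (max - min).to_f
  until values.empty?
    value = values.shift
    next unless value
    index = span.zero? ? 0 : ((value.to_f - min) / span * num_bins).floor
    counts[index.clamp(0, num_bins - 1)] += 1
  end
  [bins, counts]
end

buffered = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
bins, counts = bin_values(buffered, 1.0, 9.0)
counts    # => [3, 3, 3]
buffered  # => []
```

Like bin!, the sketch empties the buffer as it goes, so the raw values are not kept once they have been counted.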
#bins? ⇒ true, false
Does this widget have a populated list of bins?

    # File 'lib/wukong/widget/reducers/bin.rb', line 245
    def bins?
      bins && (! bins.empty?)
    end
#finalize {|lower, upper, count, normalized_count| ... } ⇒ Object
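Emits each bin with its edges and count, adding the normalized count (count / total_count) if requested. Counts pass through log_count_if_necessary, and every emitted value passes through format.
Bins the values first if that hasn't already been done on the fly.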

    # File 'lib/wukong/widget/reducers/bin.rb', line 193
    def finalize
      bin! unless bins?
      counts.each_with_index do |count, index|
        bin = bins[index]
        bin << log_count_if_necessary(count)
        if normalize && total_count > 0
          bin << log_count_if_necessary((count.to_f / total_count.to_f))
        end
        yield bin.map { |n| format(n) }
      end
    end
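
The shape of each emitted row can be sketched as follows, leaving out the optional logarithm and the formatting step. each_bin_row is a hypothetical helper. One design difference is deliberate: as written, finalize appends onto each array stored in bins (`bin << ...`), so a second call would see the previously appended counts, whereas this sketch builds a fresh row per bin.

```ruby
# Hypothetical sketch of the rows Bin#finalize yields, before formatting
# and without the optional logarithm. Builds a fresh row per bin instead
# of appending onto the stored bin arrays.
def each_bin_row(bins, counts, total_count, normalize: false)
  counts.each_with_index do |count, index|
    lower, upper = bins[index]
    row = [lower, upper, count]
    # Normalized count is only added when requested and total_count > 0.
    row << count.to_f / total_count if normalize && total_count > 0
    yield row
  end
end

rows = []
each_bin_row([[0.0, 5.0], [5.0, 10.0]], [1, 3], 4, normalize: true) { |row| rows << row }
rows  # => [[0.0, 5.0, 1, 0.25], [5.0, 10.0, 3, 0.75]]
```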
#format(n) ⇒ String
Formats n so it's readable and compact.

If this widget is given an explicit format_string, then it will be used here (the value of format_string should have a slot for a float).

Otherwise, zero is printed as 0.0, large (or small) numbers (|n| > 1000 or |n| < 0.001) are formatted in scientific notation, and "medium" numbers (0.001 ≤ |n| ≤ 1000) are printed in fixed-point notation, all with the given precision (used as both the minimum field width and the number of digits after the decimal point).

    # File 'lib/wukong/widget/reducers/bin.rb', line 217
    def format n
      case
      when format_string
        format_string % n
      when n == 0.0
        '0.0'
      when n.abs > 1000 || n.abs < 0.001
        "%#{precision}.#{precision}E" % n
      else
        "%#{precision}.#{precision}f" % n
      end
    end
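
The formatting rule can be reproduced outside the widget to see which branch a number takes. This is a sketch: format_number is a hypothetical name (chosen to avoid shadowing Kernel#format), and precision: 3 is an assumed value because the widget's default is not shown on this page.

```ruby
# Hypothetical standalone version of Bin#format. precision: 3 is an
# assumption; in the widget, precision and format_string are its settings.
def format_number(n, precision: 3, format_string: nil)
  if format_string
    format_string % n
  elsif n == 0.0
    '0.0'
  elsif n.abs > 1000 || n.abs < 0.001
    "%#{precision}.#{precision}E" % n   # scientific notation
  else
    "%#{precision}.#{precision}f" % n   # fixed-point
  end
end

format_number(123456.0)                     # => "1.235E+05"
format_number(0.0001234)                    # => "1.234E-04"
format_number(3.14159)                      # => "3.142"
format_number(1000.0)                       # => "1000.000"
format_number(0.0)                          # => "0.0"
format_number(2.5, format_string: '%.1f')   # => "2.5"
```

Note that 1000.0 itself stays in fixed-point notation, since only values strictly above 1000 switch to scientific notation.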
#get_key(record) ⇒ :__first__group__
Keep all records in the same "group", at least from the Accumulator's perspective.

    # File 'lib/wukong/widget/reducers/bin.rb', line 157
    def get_key record
      :__first__group__
    end
#log_count_if_necessary(val) ⇒ Float
Returns val, taking a logarithm to the appropriate base if required (that is, if log_counts is set).

    # File 'lib/wukong/widget/reducers/bin.rb', line 264
    def log_count_if_necessary val
      log_counts ? log_if_possible(val) : val
    end
#log_if_possible(val) ⇒ Float
Returns the logarithm of the given val (to base) if possible.
Returns the original value unchanged if it is not positive (zero or negative).

    # File 'lib/wukong/widget/reducers/bin.rb', line 274
    def log_if_possible val
      val > 0 ? Math.log(val, base) : val
    end
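
The two logarithm helpers combine as follows. In this sketch, base and log_counts are passed as arguments rather than read from the widget's settings, and base: 10 is an assumed default used only for illustration. Ruby's Math.log accepts an optional base as its second argument.

```ruby
# Hypothetical standalone versions of log_if_possible and
# log_count_if_necessary; base: 10 is an assumed default.
def log_if_possible(val, base)
  val > 0 ? Math.log(val, base) : val   # zero and negatives pass through
end

def log_count_if_necessary(val, log_counts:, base: 10)
  log_counts ? log_if_possible(val, base) : val
end

log_count_if_necessary(100.0, log_counts: true)        # about 2.0
log_count_if_necessary(8, log_counts: true, base: 2)   # about 3.0
log_count_if_necessary(0, log_counts: true)            # => 0
log_count_if_necessary(100.0, log_counts: false)       # => 100.0
```

Because non-positive values pass through, an empty bin still reports a count of 0 under log_counts rather than -Infinity.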
#setup ⇒ Object
Initializes all storage. If bins can be calculated in advance (from explicit edges, or from min, max, and num_bins together), does so now.

    # File 'lib/wukong/widget/reducers/bin.rb', line 139
    def setup
      super()
      self.values      = []
      self.bins        = []
      self.counts      = []
      self.total_count = 0
      if edges.nil?
        set_edges_from_min_max_and_num_bins! if min && max && num_bins
      else
        set_bins_and_counts_from_edges!
      end
    end
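
The branching in setup decides whether the widget can bin on the fly. The sketch below mirrors that decision with a hypothetical initial_bins function. It assumes that consecutive edges become [lower, upper] pairs and that min, max, and num_bins yield equal-width edges; neither set_bins_and_counts_from_edges! nor set_edges_from_min_max_and_num_bins! is shown on this page.

```ruby
# Hypothetical sketch of the "calculate bins in advance" branch of
# Bin#setup. Equal-width edges and consecutive-edge pairs are assumptions.
def initial_bins(edges: nil, min: nil, max: nil, num_bins: nil)
  if edges.nil?
    # Without edges we need all of min, max, and num_bins; otherwise bins
    # stay empty and values are buffered until bin! runs.
    return [[], []] unless min && max && num_bins
    width = (max - min).to_f / num_bins
    edges = (0..num_bins).map { |i| min + i * width }
  end
  bins = edges.map(&:to_f).each_cons(2).to_a   # consecutive edges -> pairs
  [bins, Array.new(bins.size, 0)]              # every bin starts at zero
end

initial_bins(edges: [0, 5, 10])                 # => [[[0.0, 5.0], [5.0, 10.0]], [0, 0]]
initial_bins(min: 0.0, max: 10.0, num_bins: 2)  # => [[[0.0, 5.0], [5.0, 10.0]], [0, 0]]
initial_bins(min: 0.0, max: 10.0)               # => [[], []]
```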
#value_from(record) ⇒ Float?
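Gets the value of the by field from a given record and converts it to a Float. Returns nil if the value is missing or can't be converted.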

    # File 'lib/wukong/widget/reducers/bin.rb', line 253
    def value_from record
      val = get(self.by, record)
      return unless val
      val.to_f rescue nil
    end
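
The conversion step has a subtlety worth seeing directly. In this sketch a plain Hash lookup stands in for DynamicGet's get(self.by, record), whose lookup rules are not documented on this page, so the two-argument value_from here is a hypothetical stand-in.

```ruby
# Hypothetical stand-in for Bin#value_from; a Hash lookup replaces
# DynamicGet's get(self.by, record).
def value_from(record, by)
  val = record[by]
  return unless val        # missing (nil) or false: no value
  val.to_f rescue nil      # objects without #to_f become nil
end

value_from({ 'size' => '42.5' }, 'size')   # => 42.5
value_from({ 'size' => 7 }, 'size')        # => 7.0
value_from({ 'size' => 'n/a' }, 'size')    # => 0.0 (String#to_f does not raise)
value_from({ 'size' => [1, 2] }, 'size')   # => nil (Array has no #to_f)
value_from({}, 'size')                     # => nil
```

Since String#to_f returns 0.0 instead of raising, an unparseable string field (assuming get returns it raw) would be counted as 0.0 by accumulate rather than skipped.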