Module: Enumerable

Included in:
FixedRange
Defined in:
lib/just_enumerable_stats.rb


Instance Attribute Details

#_jes_range_class_args ⇒ Object (readonly)

The arguments needed to instantiate the custom-defined range class.



# File 'lib/just_enumerable_stats.rb', line 305

def _jes_range_class_args
  @_jes_range_class_args
end

#_jes_range_hash ⇒ Object (readonly)

The hash of lambdas that are used to categorize the enumerable.



# File 'lib/just_enumerable_stats.rb', line 301

def _jes_range_hash
  @_jes_range_hash
end

Class Method Details

.safe_alias(sym1, sym2 = nil) ⇒ Object

Defines the new methods unobtrusively.



# File 'lib/just_enumerable_stats.rb', line 50

def self.safe_alias(sym1, sym2=nil)

  return false if not sym2 and not sym1.to_s.match(/^_jes_/)
  
  if sym2
    old_meth = sym2
    new_meth = sym1
  else
    old_meth = sym1
    new_meth = sym1.to_s.sub(/^_jes_/, '').to_sym
    return false if self.class.respond_to?(new_meth)
  end
  alias_method new_meth, old_meth
end
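
The aliasing idea can be sketched in isolation. This is a minimal illustration, not the library's code: `DemoStats`, `DemoList`, `_jes_doubled`, and `_jes_size_label` are hypothetical names, and the sketch checks `method_defined?` to decide whether a name is taken, which is an assumption that differs from the `self.class.respond_to?` check in the listing above.

```ruby
# Hypothetical demo module; none of these names come from the library.
module DemoStats
  # Alias a _jes_-prefixed method to its unprefixed name,
  # but only when that name is not already defined.
  def self.safe_alias(prefixed)
    name = prefixed.to_s
    return false unless name.start_with?('_jes_')

    plain = name.sub(/^_jes_/, '').to_sym
    return false if method_defined?(plain) # assumption: never overwrite

    alias_method plain, prefixed
    true
  end

  def _jes_doubled
    map { |e| e * 2 }
  end

  def _jes_size_label
    "#{size} items"
  end

  def size_label
    'already defined'
  end

  safe_alias :_jes_doubled    # defines #doubled
  safe_alias :_jes_size_label # skipped because #size_label exists
end

class DemoList < Array
  include DemoStats
end

list = DemoList.new([1, 2, 3])
p list.doubled    # => [2, 4, 6]
p list.size_label # => "already defined"
```

The prefixed method stays callable either way, so code that needs the library's behavior can always use the `_jes_` name directly.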

Instance Method Details

#_jes_average(&block) ⇒ Object

The arithmetic mean, uses a block or default block.



# File 'lib/just_enumerable_stats.rb', line 138

def _jes_average(&block)
  _jes_sum(&block)/size
end

#_jes_cartesian_product(other, &block) ⇒ Object

Finds the cartesian product, excluding duplicate items and self-referential pairs. If a block is given, yields each pair to it and returns the block's values.



# File 'lib/just_enumerable_stats.rb', line 581

def _jes_cartesian_product(other, &block)
  x,y = self.uniq.dup, other.uniq.dup
  pairs = x.inject([]) do |cp, i|
    cp | y.map{|b| i == b ? nil : [i,b]}.compact
  end
  return pairs unless block_given?
  pairs.map{|p| yield p.first, p.last}
end
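
A standalone sketch of the same behavior on plain arrays may help. The function name `cartesian_pairs` and the sample inputs are illustrative assumptions, not part of the library.

```ruby
# Hypothetical helper: every [a, b] pair with a from xs and b from ys,
# skipping duplicate inputs and pairs where a == b.
def cartesian_pairs(xs, ys)
  pairs = xs.uniq.flat_map do |a|
    ys.uniq.reject { |b| a == b }.map { |b| [a, b] }
  end
  return pairs unless block_given?

  pairs.map { |a, b| yield(a, b) }
end

p cartesian_pairs([1, 2], [2, 3])                   # => [[1, 2], [1, 3], [2, 3]]
p cartesian_pairs([1, 2], [2, 3]) { |a, b| a + b }  # => [3, 4, 5]
```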

#_jes_categories ⇒ Object

Takes the range_class and returns its map. Example:

require 'mathn'
a = [1,2,3]
a.set_range_class(FixedRange, a.min, a.max, 1/4)
a.categories
# => [1, 5/4, 3/2, 7/4, 2, 9/4, 5/2, 11/4, 3]

For non-numeric values, returns a unique set, ordered if possible.



# File 'lib/just_enumerable_stats.rb', line 253

def _jes_categories
  if @_jes_categories
    @_jes_categories
  elsif self._jes_is_numeric?
    self._jes_range_instance.map
  else
    self.uniq.sort rescue self.uniq
  end
end

#_jes_category_values(reset = false) ⇒ Object

Returns a Hash (or a Dictionary, if available) that maps each category to an array of the matching values. This library is supposed to be lean (just enumerables), but this is an expensive call, so the result is cached and a parameter is offered to reset the cache. Call category_values(true) if you need to reset the cache.



# File 'lib/just_enumerable_stats.rb', line 335

def _jes_category_values(reset=false)
  @_jes_category_values = nil if reset
  return @_jes_category_values if @_jes_category_values
  container = defined?(Dictionary) ? Dictionary.new : Hash.new
  if self._jes_range_hash
    @_jes_category_values = self._jes_categories.inject(container) do |cont, cat|
      cont[cat] = self.find_all &self._jes_range_hash[cat]
      cont
    end
  else
    @_jes_category_values = self._jes_categories.inject(container) do |cont, cat|
      cont[cat] = self.find_all {|e| e == cat}
      cont
    end
  end
end

#_jes_compliment(other) ⇒ Object

Everything on the left-hand side except what's shared with the right-hand side: “the relative complement of y in x”.



# File 'lib/just_enumerable_stats.rb', line 568

def _jes_compliment(other)
  self - other
end

#_jes_correlation(other) ⇒ Object

Finds the correlation between two enumerables. Example:

[1,2,3].cor [2,3,5] # => 0.981980506061966



# File 'lib/just_enumerable_stats.rb', line 630

def _jes_correlation(other)
  n = [self.size, other.size]._jes_min
  sum_of_products_of_pairs = self._jes_sigma_pairs(other) {|a, b| a * b}
  self_sum = self._jes_sum
  other_sum = other._jes_sum
  sum_of_squared_self_scores = self._jes_sum { |e| e * e }
  sum_of_squared_other_scores = other._jes_sum { |e| e * e }
  
  numerator = (n * sum_of_products_of_pairs) - (self_sum * other_sum)
  self_denominator = ((n * sum_of_squared_self_scores) - (self_sum ** 2))
  other_denominator = ((n * sum_of_squared_other_scores) - (other_sum ** 2))
  denominator = Math.sqrt(self_denominator * other_denominator)
  return numerator / denominator
end
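
The listing follows the computational form of the Pearson coefficient. As a rough standalone sketch on plain arrays (the function name `correlation` is an assumption, and unlike the listing this version truncates both lists to the shorter length before summing):

```ruby
# Hypothetical helper: Pearson correlation via sums of products and squares.
def correlation(xs, ys)
  n = [xs.size, ys.size].min
  xs = xs.first(n)
  ys = ys.first(n)

  sum_xy = xs.zip(ys).sum { |a, b| a * b }
  sum_x = xs.sum
  sum_y = ys.sum
  sum_xx = xs.sum { |a| a * a }
  sum_yy = ys.sum { |b| b * b }

  numerator = n * sum_xy - sum_x * sum_y
  denominator = Math.sqrt((n * sum_xx - sum_x**2) * (n * sum_yy - sum_y**2))
  numerator / denominator
end

r = correlation([1, 2, 3], [2, 3, 5])
puts r.round(6) # => 0.981981
```

If either list is constant the denominator is zero, so callers would need to guard against that case.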

#_jes_count_if(&block) ⇒ Object

Counts each element for which the block evaluates to true. Example:

a = [1,2,3]
a.count_if {|e| e % 2 == 0} # => 1



# File 'lib/just_enumerable_stats.rb', line 321

def _jes_count_if(&block)
  self.inject(0) do |s, e|
    s += 1 if block.call(e)
    s
  end
end

#_jes_covariance(other) ⇒ Object

Returns the covariance of two lists.



# File 'lib/just_enumerable_stats.rb', line 674

def _jes_covariance(other)
  self._jes_to_f!
  other._jes_to_f!
  n = [self.size, other.size]._jes_min
  self_average = self._jes_average
  other_average = other._jes_average
  total_expected = self._jes_sigma_pairs(other) {|a, b| (a - self_average) * (b - other_average)}
  total_expected / n
end

#_jes_cum_max(&block) ⇒ Object

The cumulative max. Example:

[1,2,3,0,5].cum_max # => [1,2,3,3,5]


# File 'lib/just_enumerable_stats.rb', line 499

def _jes_cum_max(&block)
  _jes_morph_list(&block).inject([]) do |list, e|
    found = (list | [e])._jes_max
    list << (found ? found : e)
  end
end

#_jes_cum_min(&block) ⇒ Object

The cumulative min. Example:

[1,2,3,0,5].cum_min # => [1,1,1,0,0]


# File 'lib/just_enumerable_stats.rb', line 510

def _jes_cum_min(&block)
  _jes_morph_list(&block).inject([]) do |list, e|
    found = (list | [e]).min
    list << (found ? found : e)
  end
end
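
Both cumulative extremes can be sketched with a single pass that compares each element against the previous result. The helper name `running_extreme` is hypothetical, and this is a simplified variant rather than the library's implementation.

```ruby
# Hypothetical helper: running max or min over a list.
# pick is :max or :min.
def running_extreme(values, pick)
  values.each_with_object([]) do |e, out|
    out << (out.empty? ? e : [out.last, e].public_send(pick))
  end
end

p running_extreme([1, 2, 3, 0, 5], :max) # => [1, 2, 3, 3, 5]
p running_extreme([1, 2, 3, 0, 5], :min) # => [1, 1, 1, 0, 0]
```

Comparing only against the last output keeps the work linear, whereas the listings above rebuild a union of the whole output list on every step.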

#_jes_cum_prod(sorted = false, &block) ⇒ Object

The cumulative product. Example:

[1,2,3].cum_prod # => [1.0, 2.0, 6.0]


# File 'lib/just_enumerable_stats.rb', line 471

def _jes_cum_prod(sorted=false, &block)
  prod = _jes_one
  obj = sorted ? self.sort : self
  if block_given?
    obj.map { |i| prod *= yield(i) }
  elsif _jes_default_block
    obj.map { |i| prod *= _jes_default_block[*i] }
  else
    obj.map { |i| prod *= i }
  end
end

#_jes_cum_sum(sorted = false, &block) ⇒ Object

The cumulative sum. Example:

[1,2,3].cum_sum # => [1, 3, 6]


# File 'lib/just_enumerable_stats.rb', line 455

def _jes_cum_sum(sorted=false, &block)
  sum = _jes_zero
  obj = sorted ? self.sort : self
  if block_given?
    obj.map { |i| sum += yield(i) }
  elsif _jes_default_block
    obj.map { |i| sum += _jes_default_block[*i] }
  else
    obj.map { |i| sum += i }
  end
end
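
The cumulative sum and product share one pattern: carry an accumulator through `map` and emit it at each step. A minimal sketch, assuming a hypothetical helper named `cumulative` and explicit starting values in place of `_jes_zero` and `_jes_one`:

```ruby
# Hypothetical helper: emit the running accumulator for each element.
def cumulative(values, start)
  total = start
  values.map { |v| total = yield(total, v) }
end

p cumulative([1, 2, 3], 0) { |acc, v| acc + v }   # => [1, 3, 6]
p cumulative([1, 2, 3], 1.0) { |acc, v| acc * v } # => [1.0, 2.0, 6.0]
```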

#_jes_default_block ⇒ Object

The block applied to each value in the object before a statistic is computed.



# File 'lib/just_enumerable_stats.rb', line 96

def _jes_default_block
  @_jes_default_stat_block 
end

#_jes_default_block=(block) ⇒ Object

Sets up a block to use for a series of operations. Example:

a = [1,2,3]
a.sum # => 6.0
a.default_block = lambda{|e| 1 / e}
a.sum # => 1.0



# File 'lib/just_enumerable_stats.rb', line 106

def _jes_default_block=(block)
  @_jes_default_stat_block = block
end
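
How a default block interacts with an explicit block can be sketched with a small wrapper class. `DemoSeries` is a hypothetical name, and the float starting value is an assumption chosen to match the 6.0 and 1.0 results in the example above.

```ruby
# Hypothetical wrapper; not the library's implementation.
class DemoSeries
  attr_accessor :default_block

  def initialize(values)
    @values = values
  end

  # An explicit block wins; otherwise the default block (if any) is used.
  def sum
    transform = default_block || ->(e) { e }
    @values.inject(0.0) do |total, e|
      total + (block_given? ? yield(e) : transform.call(e))
    end
  end
end

series = DemoSeries.new([1, 2, 3])
p series.sum                    # => 6.0
series.default_block = ->(e) { 1 / e }
p series.sum                    # => 1.0 (integer division: 1/2 and 1/3 are 0)
p series.sum { |e| e * 2 }      # => 12.0
```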

#_jes_dichotomize(split_value, first_label, second_label) ⇒ Object

Splits the values in two, <= the value and > the value.



# File 'lib/just_enumerable_stats.rb', line 309

def _jes_dichotomize(split_value, first_label, second_label)
  container = defined?(Dictionary) ? Dictionary.new : Hash.new
  container[first_label] = lambda{|e| e <= split_value}
  container[second_label] = lambda{|e| e > split_value}
  _jes_set_range(container)
end
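
The category machinery boils down to a hash of labeled predicates that is later applied with `find_all`. A minimal sketch on a plain array, where the labels, the split value, and the variable names are all illustrative assumptions:

```ruby
values = [1, 2, 3, 4, 5]
split_value = 3

# One lambda per category, in the same shape that dichotomize builds.
range_hash = {
  'low' => ->(e) { e <= split_value },
  'high' => ->(e) { e > split_value }
}

# Collect the matching values for each category.
category_values = range_hash.each_with_object({}) do |(label, test), out|
  out[label] = values.find_all(&test)
end

p category_values['low']  # => [1, 2, 3]
p category_values['high'] # => [4, 5]
```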

#_jes_euclidian_distance(other) ⇒ Object

Returns the Euclidean distance between two enumerables, pairing their elements by position.



# File 'lib/just_enumerable_stats.rb', line 602

def _jes_euclidian_distance(other)
  Math.sqrt(self._jes_sigma_pairs(other) {|a, b| (a - b) ** 2})
end
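
As a quick standalone check of the formula, the square root of the summed squared differences, here is a sketch on plain arrays. The function name `euclidean_distance` is an assumption for illustration.

```ruby
# Hypothetical helper: distance between two points given as arrays.
def euclidean_distance(xs, ys)
  n = [xs.size, ys.size].min
  Math.sqrt(xs.first(n).zip(ys.first(n)).sum { |a, b| (a - b)**2 })
end

p euclidean_distance([0, 0], [3, 4]) # => 5.0
```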

#_jes_exclusive_not(other) ⇒ Object

Everything but what's shared: the elements found on exactly one side (the symmetric difference).



# File 'lib/just_enumerable_stats.rb', line 574

def _jes_exclusive_not(other)
  (self | other) - (self & other)
end

#_jes_intersect(other) ⇒ Object

What's shared on the left- and right-hand sides: “the intersection of x and y”.



# File 'lib/just_enumerable_stats.rb', line 560

def _jes_intersect(other)
  self & other
end

#_jes_is_numeric? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/just_enumerable_stats.rb', line 264

def _jes_is_numeric?
  self.all? {|e| e.is_a?(Numeric)}
end

#_jes_max(&block) ⇒ Object

Returns the max, using an optional block.



# File 'lib/just_enumerable_stats.rb', line 66

def _jes_max(&block)
  self.inject do |best, e|
    val = _jes_block_sorter(best, e, &block)
    best = val > 0 ? best : e
  end
end

#_jes_max_index(&block) ⇒ Object

Returns the first index of the max value



# File 'lib/just_enumerable_stats.rb', line 75

def _jes_max_index(&block)
  self.index(_jes_max(&block))
end

#_jes_max_of_lists(*enums) ⇒ Object

Returns the position-wise max of two or more enumerables.

>> [1,2,3].max_of_lists(, [0,2,9])
=> [1, 5, 9]



# File 'lib/just_enumerable_stats.rb', line 660

def _jes_max_of_lists(*enums)
  _jes_yield_transpose(*enums) {|e| e._jes_max}
end

#_jes_median(ratio = 0.5, &block) ⇒ Object

The slow way is to iterate up to the middle point. A faster way is to use the index, when available. If a block is supplied, always iterate to the middle point.



# File 'lib/just_enumerable_stats.rb', line 170

def _jes_median(ratio=0.5, &block)
  return _jes_iterate_midway(ratio, &block) if block_given?
  begin
    mid1, mid2 = _jes_middle_two
    sorted = sort
    med1, med2 = sorted[mid1], sorted[mid2]
    return med1 if med1 == med2
    return med1 + ((med2 - med1) * ratio)
  rescue
    _jes_iterate_midway(ratio, &block)
  end
end
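
The index-based path can be sketched on a plain array. This assumes that the undocumented `_jes_middle_two` returns the two middle indices (the same index twice for an odd-sized list); the function name `median` and that interpretation are assumptions, not confirmed by the listing.

```ruby
# Hypothetical helper: median with a ratio that weights the two
# middle values when the list has an even size.
def median(values, ratio = 0.5)
  sorted = values.sort
  mid = (sorted.size - 1) / 2.0
  low = sorted[mid.floor]
  high = sorted[mid.ceil]
  return low if low == high

  low + (high - low) * ratio
end

p median([3, 1, 2])          # => 2
p median([1, 2, 3, 4])       # => 2.5
p median([1, 2, 3, 4], 0.25) # => 2.25
```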

#_jes_min(&block) ⇒ Object

Returns the min, using an optional block.



# File 'lib/just_enumerable_stats.rb', line 81

def _jes_min(&block)
  self.inject do |best, e|
    val = _jes_block_sorter(best, e, &block)
    best = val < 0 ? best : e
  end
end

#_jes_min_index(&block) ⇒ Object

Returns the first index of the min value



# File 'lib/just_enumerable_stats.rb', line 90

def _jes_min_index(&block)
  self.index(_jes_min(&block))
end

#_jes_min_of_lists(*enums) ⇒ Object

Returns the position-wise min of two or more enumerables.

>> [1,2,3].min_of_lists(, [0,2,9])
=> [0, 2, 3]



# File 'lib/just_enumerable_stats.rb', line 668

def _jes_min_of_lists(*enums)
  _jes_yield_transpose(*enums) {|e| e.min}
end

#_jes_new_sort(&block) ⇒ Object

I don’t pass the block to the sort, because a sort block needs to look something like: {|x,y| x <=> y}. To get around this, set the default block on the object.



# File 'lib/just_enumerable_stats.rb', line 375

def _jes_new_sort(&block)
  if block_given?
    map { |i| yield(i) }.sort.dup
  elsif _jes_default_block
    map { |i| _jes_default_block[*i] }.sort.dup
  else
    sort().dup
  end
end

#_jes_normalize ⇒ Object



# File 'lib/just_enumerable_stats.rb', line 729

def _jes_normalize
  self.map {|e| e.to_f / self._jes_sum }
end

#_jes_normalize! ⇒ Object



# File 'lib/just_enumerable_stats.rb', line 734

def _jes_normalize!
  sum = self._jes_sum
  self.map! {|e| e.to_f / sum }
end

#_jes_order(&block) ⇒ Object

Given values like [10,5,5,1], rank should produce something like [4,2,2,1], and order should produce something like [4,2,3,1]. The trick is that rank skips as many places as were duplicated, so there could not be a 3 in the rank from the example above.



# File 'lib/just_enumerable_stats.rb', line 411

def _jes_order(&block)
  hold = []
  _jes_rank(&block).each do |x|
    while hold.include?(x) do
      x += 1
    end
    hold << x
  end
  hold
end
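
The difference between rank and order is easiest to see side by side. This sketch mirrors the logic of the listings on a plain array without blocks; the function names `rank` and `order` as free functions are assumptions for illustration.

```ruby
# Hypothetical helpers working on a plain array.

# Rank: 1-based position of each value in the sorted list;
# duplicates share the position of their first occurrence.
def rank(values)
  sorted = values.sort
  values.map { |v| sorted.index(v) + 1 }
end

# Order: like rank, but bump duplicates to the next free position.
def order(values)
  taken = []
  rank(values).each do |r|
    r += 1 while taken.include?(r)
    taken << r
  end
  taken
end

p rank([10, 5, 5, 1])  # => [4, 2, 2, 1]
p order([10, 5, 5, 1]) # => [4, 2, 3, 1]
```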

#_jes_pearson_correlation(other) ⇒ Object

The covariance divided by the product of the standard deviations. See en.wikipedia.org/wiki/Correlation



# File 'lib/just_enumerable_stats.rb', line 687

def _jes_pearson_correlation(other)
  self._jes_to_f!
  other._jes_to_f!
  denominator = self._jes_standard_deviation * other._jes_standard_deviation
  self._jes_covariance(other) / denominator
end

#_jes_product ⇒ Object

Multiplies the values. Example:

[1,2,3].product # => 6.0



# File 'lib/just_enumerable_stats.rb', line 522

def _jes_product
  self.inject(_jes_one) {|sum, a| sum *= a}
end

#_jes_quantile(&block) ⇒ Object

Returns [min, first quartile, median, third quartile, max], where each quartile is computed from one half of the values. Doesn't match R, and it's silly to try to.

The source comment also carries an alternative helper that is not defined as a method:

First quartile: nth_split_by_m(1, 4)
Third quartile: nth_split_by_m(3, 4)
Median: nth_split_by_m(1, 2)

def _jes_nth_split_by_m(n, m)
  sorted  = new_sort
  dividers = m - 1
  if size % m == dividers # Divides evenly
    # Because we have a 0-based list, we get the floor
    i = ((size / m.to_f) * n).floor
    j = i
  else
    # This reflects R's approach, which I don't think I agree with.
    i = (((size / m.to_f) * n) - 1)
    i = i > (size / m.to_f) ? i.floor : i.ceil
    j = i + 1
  end
  sorted[i] + ((n / m.to_f) * (sorted[j] - sorted[i]))
end



# File 'lib/just_enumerable_stats.rb', line 442

def _jes_quantile(&block)
  [
    _jes_min(&block), 
    _jes_first_half(&block)._jes_median(0.25, &block), 
    _jes_median(&block), 
    _jes_second_half(&block)._jes_median(0.75, &block), 
    _jes_max(&block)
  ]
end

#_jes_rand_in_range(*args) ⇒ Object

Returns a random integer in the range for any number of lists. This is a way to get a random vector that is tenable based on the sample data. For example, given two sets of numbers:

a = [1,2,3]; b = [8,8,8]

rand_in_range will return a value >= 1 and <= 8 in the first place, >= 2 and <= 8 in the second place, and >= 3 and <= 8 in the last place. Works for integers. Rethink this for floats. May consider setting up FixedRange for floats. O(n*5)



# File 'lib/just_enumerable_stats.rb', line 618

def _jes_rand_in_range(*args)
  min = self._jes_min_of_lists(*args)
  max = self._jes_max_of_lists(*args)
  (0...size).inject([]) do |ary, i|
    ary << rand_between(min[i], max[i])
  end
end

#_jes_range(&block) ⇒ Object

Just an array of [min, max], to comply with R's use of the word. Use range_as_range if you want a real Range.



# File 'lib/just_enumerable_stats.rb', line 271

def _jes_range(&block)
  [_jes_min(&block), _jes_max(&block)]
end

#_jes_range_as_range(&block) ⇒ Object

Actually instantiates the range, instead of producing a min and max array.



# File 'lib/just_enumerable_stats.rb', line 361

def _jes_range_as_range(&block)
  if @_jes_range_class_args and not @_jes_range_class_args.empty?
    self._jes_range_class.new(*@_jes_range_class_args)
  else
    self._jes_range_class.new(_jes_min(&block), _jes_max(&block))
  end
end

#_jes_range_class ⇒ Object

When creating a range, what class will it be? Defaults to Range, but other classes are sometimes useful.



# File 'lib/just_enumerable_stats.rb', line 355

def _jes_range_class
  @_jes_range_class ||= Range
end

#_jes_rank(&block) ⇒ Object

Ranks the values



# File 'lib/just_enumerable_stats.rb', line 387

def _jes_rank(&block)

  sorted = _jes_new_sort(&block)
  # rank = map { |i| sorted.index(i) + 1 }

  if block_given?
    map { |i| sorted.index(yield(i)) + 1 }
  elsif _jes_default_block
    map { |i| 
      sorted.index(_jes_default_block[*i]) + 1 }
  else
    map { |i| sorted.index(i) + 1 }
  end

end

#_jes_scale(val = nil, &block) ⇒ Object



# File 'lib/just_enumerable_stats.rb', line 706

def _jes_scale(val=nil, &block)
  if block
    self.map{|e| block.call(e)}
  else
    self.map{|e| e * val}
  end
end

#_jes_scale!(val = nil, &block) ⇒ Object



# File 'lib/just_enumerable_stats.rb', line 715

def _jes_scale!(val=nil, &block)
  if block
    self.map!{|e| block.call(e)}
  else
    self.map!{|e| e * val}
  end
end

#_jes_scale_between(*values) ⇒ Object

Raises:

  • (ArgumentError)


# File 'lib/just_enumerable_stats.rb', line 740

def _jes_scale_between(*values)
  raise ArgumentError, "Must provide two values" unless values.size == 2
  values.sort!
  min = values[0]
  max = values[1]
  orig_min = self._jes_min
  scalar = (max - min) / (self._jes_max - orig_min).to_f
  shift = min - (orig_min * scalar)
  self._jes_scale{|e| (e * scalar) + shift}
end
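
The listing amounts to a linear rescale: multiply by a scalar so the spread matches the target interval, then shift so the minimum lands on the lower bound. A minimal standalone sketch, with `scale_between` as a hypothetical free function taking the list as its first argument:

```ruby
# Hypothetical helper: map values linearly onto [low, high].
def scale_between(values, low, high)
  low, high = [low, high].sort
  orig_min, orig_max = values.minmax
  scalar = (high - low) / (orig_max - orig_min).to_f
  shift = low - orig_min * scalar
  values.map { |e| e * scalar + shift }
end

p scale_between([1, 2, 3], 0, 10) # => [0.0, 5.0, 10.0]
p scale_between([1, 2, 3], 10, 0) # => [0.0, 5.0, 10.0] (bounds are sorted first)
```

When every value is identical the original spread is zero and the scalar is not finite, so that case would need separate handling.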

#_jes_scale_between!(*values) ⇒ Object

Raises:

  • (ArgumentError)


# File 'lib/just_enumerable_stats.rb', line 752

def _jes_scale_between!(*values)
  raise ArgumentError, "Must provide two values" unless values.size == 2
  values.sort!
  min = values[0]
  max = values[1]
  orig_min = self._jes_min
  scalar = (max - min) / (self._jes_max - orig_min).to_f
  shift = min - (orig_min * scalar)
  self._jes_scale!{|e| (e * scalar) + shift}
end

#_jes_scale_to_sigmoid! ⇒ Object



# File 'lib/just_enumerable_stats.rb', line 724

def _jes_scale_to_sigmoid!
  self._jes_scale! { |e| 1 / (1 + Math.exp( -1 * (e))) }
end

#_jes_set_range(hash) ⇒ Object

Takes a hash that defines the categories, keyed by category label. If Facets happens to be loaded on the computer, this keeps the order of the categories straight.



# File 'lib/just_enumerable_stats.rb', line 287

def _jes_set_range(hash)
  if defined?(Dictionary)
    @_jes_range_hash = Dictionary.new
    @_jes_range_hash.merge!(hash)
    @_jes_categories = @_jes_range_hash.keys
  else
    @_jes_categories = hash.keys
    @_jes_range_hash = hash
  end
  @_jes_categories
end

#_jes_set_range_class(klass, *args) ⇒ Object

Useful for setting a real range class (FixedRange).



# File 'lib/just_enumerable_stats.rb', line 277

def _jes_set_range_class(klass, *args)
  @_jes_range_class = klass
  @_jes_range_class_args = args
  self._jes_range_class
end

#_jes_sigma_pairs(other, z = _jes_zero, &block) ⇒ Object

Sigma of pairs. Returns a single float, or whatever object is sent in. Example:

[1,2,3].sigma_pairs(, 0) {|x, y| x + y}

returns 21 instead of 21.0.



# File 'lib/just_enumerable_stats.rb', line 596

def _jes_sigma_pairs(other, z=_jes_zero, &block)
  self._jes_to_pairs(other,&block).inject(z) {|sum, i| sum += i}
end

#_jes_standard_deviation(&block) ⇒ Object

The standard deviation. Uses a block or default block.



# File 'lib/just_enumerable_stats.rb', line 161

def _jes_standard_deviation(&block)
  Math::sqrt(_jes_variance(&block))
end

#_jes_sum ⇒ Object

Adds up the list. Uses a block or default block if present.



# File 'lib/just_enumerable_stats.rb', line 124

def _jes_sum
  sum = _jes_zero
  if block_given?
    each{|i| sum += yield(i)}
  elsif _jes_default_block
    each{|i| sum += _jes_default_block[*i]}
  else
    each{|i| sum += i}
  end
  sum
end

#_jes_tanimoto_pairs(other) ⇒ Object

Finds the Tanimoto coefficient: the intersection set size / union set size. This is used to find the distance between two vectors.

>> [1,2,3].cor()
=> 0.981980506061966

>> [1,2,3].tanimoto_pairs()
=> 0.5



# File 'lib/just_enumerable_stats.rb', line 541

def _jes_tanimoto_pairs(other)
  _jes_intersect(other).size / _jes_union(other).size.to_f
end
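
A standalone sketch of the coefficient using Array set operators follows. The function name `tanimoto` and the sample inputs are illustrative assumptions, chosen because the arguments in the examples above did not survive.

```ruby
# Hypothetical helper: shared elements divided by all distinct elements.
def tanimoto(xs, ys)
  (xs & ys).size / (xs | ys).size.to_f
end

p tanimoto([1, 2, 3], [3, 4, 5]) # => 0.2
p tanimoto([1, 2, 3], [1, 2, 3]) # => 1.0
```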

#_jes_to_f! ⇒ Object

Some calculations have to have at least floating point numbers. This generates a cached version of the operation, so it only runs once per object.



# File 'lib/just_enumerable_stats.rb', line 697

def _jes_to_f!
  return true if @_jes_to_f
  @_jes_to_f = self.map! {|e| e.to_f}
end

#_jes_to_pairs(other, &block) ⇒ Object

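Pairs up the elements of self and other by index, up to the length of the shorter list, and returns the block's value for each pair. There are going to be a lot more of these kinds of things, so pay attention.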



# File 'lib/just_enumerable_stats.rb', line 529

def _jes_to_pairs(other, &block)
  n = [self.size, other.size]._jes_min
  (0...n).map {|i| block.call(self[i], other[i]) }
end
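
Pairing and summing can be sketched together on plain arrays. The free functions `to_pairs` and `sigma_pairs` and the second input list are assumptions for illustration. Passing an integer as the starting value keeps the result an integer, and the float default here is also an assumption.

```ruby
# Hypothetical helpers working on plain arrays.

# Apply the block to each index-aligned pair, up to the shorter length.
def to_pairs(xs, ys)
  n = [xs.size, ys.size].min
  (0...n).map { |i| yield(xs[i], ys[i]) }
end

# Sum the block results, starting from zero (a float by default).
def sigma_pairs(xs, ys, zero = 0.0, &block)
  to_pairs(xs, ys, &block).inject(zero) { |sum, v| sum + v }
end

p to_pairs([1, 2, 3], [4, 5, 6]) { |x, y| x * y }       # => [4, 10, 18]
p sigma_pairs([1, 2, 3], [4, 5, 6], 0) { |x, y| x + y } # => 21
p sigma_pairs([1, 2, 3], [4, 5, 6]) { |x, y| x + y }    # => 21.0
```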

#_jes_union(other) ⇒ Object

All of the left- and right-hand sides, excluding duplicates: “the union of x and y”.



# File 'lib/just_enumerable_stats.rb', line 553

def _jes_union(other)
  self | other
end

#_jes_variance(&block) ⇒ Object

The variance, uses a block or default block.



# File 'lib/just_enumerable_stats.rb', line 146

def _jes_variance(&block)
  m = _jes_average(&block)
  sum_of_differences = if block_given?
    _jes_sum{ |i| j=yield(i); (m - j) ** 2 }
  elsif _jes_default_block
    _jes_sum{ |i| j=_jes_default_block[*i]; (m - j) ** 2 }
  else
    _jes_sum{ |i| (m - i) ** 2 }
  end
  sum_of_differences / (size - 1)
end
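
Because the listing divides by size - 1, this is the sample variance. A minimal sketch on a plain array, with `variance` and `standard_deviation` as hypothetical free functions and the mean forced to a float:

```ruby
# Hypothetical helpers: sample variance (n - 1 denominator) and its root.
def variance(values)
  mean = values.sum / values.size.to_f
  values.sum { |v| (mean - v)**2 } / (values.size - 1)
end

def standard_deviation(values)
  Math.sqrt(variance(values))
end

p variance([1, 2, 3])           # => 1.0
p standard_deviation([1, 2, 3]) # => 1.0
```

A single-element list makes the denominator zero, so callers would need at least two values.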

#_jes_yield_transpose(*enums, &block) ⇒ Object

Transposes arrays of arrays and yields each transposed column to a block. The regular Array#transpose ignores blocks.



# File 'lib/just_enumerable_stats.rb', line 649

def _jes_yield_transpose(*enums, &block)
  enums.unshift(self)
  n = enums.map{ |x| x.size}.min
  block ||= lambda{|e| e}
  (0...n).map { |i| block.call enums.map{ |x| x[i] } }
end
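
The transpose-and-yield pattern is what the position-wise min and max helpers build on. A standalone sketch, where `yield_transpose` as a free function and the three input lists are illustrative assumptions:

```ruby
# Hypothetical helper: yield each index-aligned column across the lists,
# truncating to the shortest list.
def yield_transpose(*lists)
  n = lists.map(&:size).min
  (0...n).map do |i|
    column = lists.map { |list| list[i] }
    block_given? ? yield(column) : column
  end
end

lists = [[1, 2, 3], [0, 5, 6], [0, 2, 9]]
p yield_transpose(*lists) { |col| col.max } # => [1, 5, 9]
p yield_transpose(*lists) { |col| col.min } # => [0, 2, 3]
p yield_transpose([1, 2], [3, 4, 5])        # => [[1, 3], [2, 4]]
```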