Module: Enumerable

Included in:
FixedRange
Defined in:
lib/just_enumerable_stats.rb

Instance Attribute Summary

Instance Method Summary

Instance Attribute Details

#range_class_args ⇒ Object (readonly)

The arguments needed to instantiate the custom-defined range class.



# File 'lib/just_enumerable_stats.rb', line 269

def range_class_args
  @range_class_args
end

#range_hash ⇒ Object (readonly)

The hash of lambdas that are used to categorize the enumerable.



# File 'lib/just_enumerable_stats.rb', line 266

def range_hash
  @range_hash
end

Instance Method Details

#average(&block) ⇒ Object Also known as: mean, avg

The arithmetic mean. Uses a block or default block.

# File 'lib/just_enumerable_stats.rb', line 113

def average(&block)
  sum(&block)/size
end

#cartesian_product(other, &block) ⇒ Object Also known as: cp, permutations

Finds the Cartesian product, excluding duplicate items and self-referential pairs. If a block is given, returns the block’s value for each pair instead.

# File 'lib/just_enumerable_stats.rb', line 511

def cartesian_product(other, &block)
  x,y = self.uniq.dup, other.uniq.dup
  pairs = x.inject([]) do |cp, i|
    cp | y.map{|b| i == b ? nil : [i,b]}.compact
  end
  return pairs unless block_given?
  pairs.map{|p| yield p.first, p.last}
end
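
To make the pairing rule concrete, here is a minimal standalone sketch of the same logic. The name cartesian_pairs is hypothetical and the helper takes both lists as arguments rather than extending Enumerable.

```ruby
# Hypothetical standalone helper that mirrors the listing above.
def cartesian_pairs(xs, ys)
  pairs = xs.uniq.inject([]) do |acc, i|
    # Skip self-referential pairs (i == b); | drops duplicate pairs.
    acc | ys.uniq.map { |b| i == b ? nil : [i, b] }.compact
  end
  return pairs unless block_given?
  pairs.map { |a, b| yield a, b }
end

pairs = cartesian_pairs([1, 2, 2], [2, 3])
# pairs == [[1, 2], [1, 3], [2, 3]]

sums = cartesian_pairs([1, 2], [2, 3]) { |a, b| a + b }
# sums == [3, 4, 5]
```

Note that [2, 2] never appears, because a value is not paired with itself.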

#categories ⇒ Object

Takes the range_class and returns its map. Example:

require 'mathn'
a = [1,2,3]
a.set_range_class(FixedRange, a.min, a.max, 1/4)
a.categories
# => [1, 5/4, 3/2, 7/4, 2, 9/4, 5/2, 11/4, 3]

For non-numeric values, returns a unique set, ordered if possible.

# File 'lib/just_enumerable_stats.rb', line 223

def categories
  if @categories
    @categories
  elsif self.is_numeric?
    self.range_instance.map
  else
    self.uniq.sort rescue self.uniq
  end
end
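
The two branches can be sketched without the library. This is only an illustration: stepped_categories and plain_categories are hypothetical names, FixedRange is not used, and Rational stands in for the exact fractions that mathn produced.

```ruby
# Hypothetical helper: evenly spaced categories from lo to hi, assuming the
# step divides the span exactly. Rational keeps 1/4 exact.
def stepped_categories(lo, hi, step)
  count = ((hi - lo) / step).to_i
  (0..count).map { |k| lo + k * step }
end

# Non-numeric fallback, as in the listing: sorted when possible, else unique.
def plain_categories(values)
  values.uniq.sort rescue values.uniq
end

numeric = stepped_categories(1, 3, Rational(1, 4))
# numeric holds nine values from 1 to 3 in steps of 1/4

words = plain_categories(%w[pear apple pear fig])
# words == ["apple", "fig", "pear"]

mixed = plain_categories([2, "a", 2])
# mixed == [2, "a"], because Integer and String cannot be compared
```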

#category_values(reset = false) ⇒ Object

Returns a Hash (or a Dictionary, if available) that maps each category to an array of the matching values. This is supposed to be lean (just enumerables), but this is an expensive call, so the result is cached. Call category_values(true) if you need to reset the cache.

# File 'lib/just_enumerable_stats.rb', line 288

def category_values(reset=false)
  @category_values = nil if reset
  return @category_values if @category_values
  container = defined?(Dictionary) ? Dictionary.new : Hash.new
  if self.range_hash
    @category_values = self.categories.inject(container) do |cont, cat|
      cont[cat] = self.find_all &self.range_hash[cat]
      cont
    end
  else
    @category_values = self.categories.inject(container) do |cont, cat|
      cont[cat] = self.find_all {|e| e == cat}
      cont
    end
  end
end
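
The two grouping strategies in the listing can be tried on plain arrays. In this sketch the category names and lambdas are made up for illustration, and a plain Hash replaces the optional Dictionary.

```ruby
# Hypothetical category lambdas, in the shape the range hash appears to use.
ranges = {
  "low"  => lambda { |e| e < 3 },
  "high" => lambda { |e| e >= 3 }
}
data = [1, 2, 3, 4, 5]

# With a range hash: each category collects the values its lambda accepts.
by_lambda = ranges.keys.inject({}) do |cont, cat|
  cont[cat] = data.find_all(&ranges[cat])
  cont
end
# by_lambda == {"low" => [1, 2], "high" => [3, 4, 5]}

# Without one: each category collects the values equal to it.
labels = %w[a b a]
by_equality = labels.uniq.sort.inject({}) do |cont, cat|
  cont[cat] = labels.find_all { |e| e == cat }
  cont
end
# by_equality == {"a" => ["a", "a"], "b" => ["b"]}
```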

#compliment(other) ⇒ Object

Everything on the left-hand side except what’s shared with the right-hand side: “the relative complement of y in x”.

# File 'lib/just_enumerable_stats.rb', line 500

def compliment(other)
  self - other
end

#correlation(other) ⇒ Object Also known as: cor

Finds the Pearson correlation between two enumerables. Example:

[1,2,3].cor [2,3,5] # => 0.981980506061966

# File 'lib/just_enumerable_stats.rb', line 556

def correlation(other)
  n = [self.size, other.size].min
  sum_of_products_of_pairs = self.sigma_pairs(other) {|a, b| a * b}
  self_sum = self.sum
  other_sum = other.sum
  sum_of_squared_self_scores = self.sum { |e| e * e }
  sum_of_squared_other_scores = other.sum { |e| e * e }
  
  numerator = (n * sum_of_products_of_pairs) - (self_sum * other_sum)
  self_denominator = ((n * sum_of_squared_self_scores) - (self_sum ** 2))
  other_denominator = ((n * sum_of_squared_other_scores) - (other_sum ** 2))
  denominator = Math.sqrt(self_denominator * other_denominator)
  return numerator / denominator
end
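
The arithmetic behind the listing can be followed with a standalone sketch. The name pearson is hypothetical, and unlike the listing this version trims both lists to the shorter length before summing, which is an assumption made to keep the example self-consistent.

```ruby
# Hypothetical standalone version of the same computational formula.
def pearson(xs, ys)
  n = [xs.size, ys.size].min
  xs, ys = xs.first(n), ys.first(n)
  sum_xy = xs.zip(ys).sum { |a, b| a * b }
  sum_x, sum_y = xs.sum, ys.sum
  sum_xx = xs.sum { |e| e * e }
  sum_yy = ys.sum { |e| e * e }
  numerator = (n * sum_xy) - (sum_x * sum_y)
  denominator = Math.sqrt(((n * sum_xx) - sum_x**2) * ((n * sum_yy) - sum_y**2))
  numerator / denominator
end

r = pearson([1, 2, 3], [2, 3, 5])
# numerator   = 3 * 23 - 6 * 10 = 9
# denominator = sqrt((3 * 14 - 36) * (3 * 38 - 100)) = sqrt(6 * 14) = sqrt(84)
# r is approximately 0.98198
```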

#count_if(&block) ⇒ Object

Counts the elements for which the block evaluates to true. Example:

a = [1,2,3]
a.count_if {|e| e % 2 == 0} # => 1

# File 'lib/just_enumerable_stats.rb', line 275

def count_if(&block)
  self.inject(0) do |s, e|
    s += 1 if block.call(e)
    s
  end
end
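
For comparison, a short sketch on a plain array: the same inject accumulation next to Ruby's built-in Enumerable#count, which also accepts a block.

```ruby
values = [1, 2, 3]

# Same accumulation as the listing: add one whenever the block is truthy.
by_inject = values.inject(0) { |count, e| e.even? ? count + 1 : count }

# The built-in count with a block gives the same answer.
by_count = values.count { |e| e.even? }
# both are 1, because only 2 is even
```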

#cum_max(&block) ⇒ Object Also known as: cumulative_max

Example:

[1,2,3,0,5].cum_max # => [1,2,3,3,5]

# File 'lib/just_enumerable_stats.rb', line 438

def cum_max(&block)
  morph_list(&block).inject([]) do |list, e|
    found = (list | [e]).max
    list << (found ? found : e)
  end
end

#cum_min(&block) ⇒ Object Also known as: cumulative_min

Example:

[1,2,3,0,5].cum_min # => [1,1,1,0,0]

# File 'lib/just_enumerable_stats.rb', line 448

def cum_min(&block)
  morph_list(&block).inject([]) do |list, e|
    found = (list | [e]).min
    list << (found ? found : e)
  end
end
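
The cumulative max and min can also be computed in a single pass by remembering the best value seen so far. The sketch below uses hypothetical helper names and is a different technique from the listings, which rebuild the extreme from the accumulated list at each step.

```ruby
# Hypothetical helpers: carry the best value forward instead of rescanning.
def running_max(values)
  best = nil
  values.map { |e| best = (best.nil? || e > best) ? e : best }
end

def running_min(values)
  best = nil
  values.map { |e| best = (best.nil? || e < best) ? e : best }
end

maxes = running_max([1, 2, 3, 0, 5])
# maxes == [1, 2, 3, 3, 5]

mins = running_min([1, 2, 3, 0, 5])
# mins == [1, 1, 1, 0, 0]
```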

#cum_prod(sorted = false, &block) ⇒ Object Also known as: cumulative_product

The cumulative product. Example:

[1,2,3].cum_prod # => [1.0, 2.0, 6.0]

# File 'lib/just_enumerable_stats.rb', line 411

def cum_prod(sorted=false, &block)
  prod = one
  obj = sorted ? self.new_sort : self
  if block_given?
    obj.map { |i| prod *= yield(i) }
  elsif default_block
    obj.map { |i| prod *= default_block[*i] }
  else
    obj.map { |i| prod *= i }
  end
end

#cum_sum(sorted = false, &block) ⇒ Object Also known as: cumulative_sum

The cumulative sum. Example:

[1,2,3].cum_sum # => [1, 3, 6]

# File 'lib/just_enumerable_stats.rb', line 396

def cum_sum(sorted=false, &block)
  sum = zero
  obj = sorted ? self.new_sort : self
  if block_given?
    obj.map { |i| sum += yield(i) }
  elsif default_block
    obj.map { |i| sum += default_block[*i] }
  else
    obj.map { |i| sum += i }
  end
end
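
Both cumulative listings rely on the same idiom: a running total that is updated inside map, so each step returns the total so far. A small sketch on a plain array follows; the seeds 0 and 1.0 are assumptions, since the library's zero and one helpers are not shown here.

```ruby
values = [1, 2, 3]

total = 0
running_sum = values.map { |i| total += i }
# running_sum == [1, 3, 6]

product = 1.0 # assumed float seed, matching the 1.0, 2.0, 6.0 example
running_product = values.map { |i| product *= i }
# running_product == [1.0, 2.0, 6.0]

# With a block, each element is transformed before it is accumulated.
total = 0
running_squares = values.map { |i| total += i * i }
# running_squares == [1, 5, 14]
```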

#default_block ⇒ Object

The block applied to each value before a statistic is computed.

# File 'lib/just_enumerable_stats.rb', line 74

def default_block
  @default_stat_block 
end

#default_block=(block) ⇒ Object

Allows me to set up a block for a series of operations. Example:

a = [1,2,3]
a.sum # => 6.0
a.default_block = lambda{|e| 1 / e}
a.sum # => 1.0

# File 'lib/just_enumerable_stats.rb', line 83

def default_block=(block)
  @default_stat_block = block
end
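
The precedence between an explicit block, the stored default block, and the raw value can be sketched with a small wrapper. StatList is a hypothetical class used only for illustration; the library stores the block on the enumerable itself, and the 0.0 seed is an assumption based on the 6.0 result above.

```ruby
# Hypothetical wrapper that imitates the block lookup order of sum.
class StatList
  attr_accessor :default_block

  def initialize(values)
    @values = values
  end

  # An explicit block wins, then the stored default block, then the raw value.
  def sum
    @values.inject(0.0) do |acc, i|
      term = if block_given?
               yield(i)
             elsif default_block
               default_block[*i]
             else
               i
             end
      acc + term
    end
  end
end

a = StatList.new([1, 2, 3])
plain = a.sum                       # 6.0
a.default_block = lambda { |e| 1 / e }
with_default = a.sum                # 1.0, since 1/2 and 1/3 are 0 in integer division
with_block = a.sum { |e| e * 2 }    # 12.0, the explicit block overrides the default
```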

#euclidian_distance(other) ⇒ Object

Returns the Euclidean distance between this enumerable and another, pairing elements by position.

# File 'lib/just_enumerable_stats.rb', line 530

def euclidian_distance(other)
  Math.sqrt(self.sigma_pairs(other) {|a, b| (a - b) ** 2})
end
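
A worked instance of the distance formula, as a standalone sketch. The name euclidean is hypothetical, and the helper assumes both lists have the same length rather than trimming to the shorter one.

```ruby
# Hypothetical helper: square root of the summed squared differences.
def euclidean(xs, ys)
  Math.sqrt(xs.zip(ys).sum { |a, b| (a - b)**2 })
end

d = euclidean([0, 0], [3, 4])
# d == 5.0, the 3-4-5 right triangle
```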

#exclusive_not(other) ⇒ Object

Everything except what’s shared by both sides: the symmetric difference.

# File 'lib/just_enumerable_stats.rb', line 505

def exclusive_not(other)
  (self | other) - (self & other)
end

#intersect(other) ⇒ Object

What’s shared by the left- and right-hand sides: “the intersection of x and y”.

# File 'lib/just_enumerable_stats.rb', line 493

def intersect(other)
  self & other
end
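
The set-style methods in this module are thin wrappers over Ruby's array operators, so their results can be checked directly on two sample arrays (the values are chosen only for illustration).

```ruby
x = [1, 2, 3, 4]
y = [3, 4, 5]

relative_complement = x - y                  # what compliment(other) computes
symmetric_difference = (x | y) - (x & y)     # what exclusive_not(other) computes
intersection = x & y                         # what intersect(other) computes

# relative_complement  == [1, 2]
# symmetric_difference == [1, 2, 5]
# intersection         == [3, 4]
```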

#is_numeric? ⇒ Boolean

Returns true if every element is a Numeric.

Returns:

  • (Boolean)

# File 'lib/just_enumerable_stats.rb', line 233

def is_numeric?
  self.all? {|e| e.is_a?(Numeric)}
end

#max(&block) ⇒ Object

Returns the max, using an optional block.



# File 'lib/just_enumerable_stats.rb', line 48

def max(&block)
  self.inject do |best, e|
    val = block_sorter(best, e, &block)
    best = val > 0 ? best : e
  end
end

#max_index(&block) ⇒ Object

Returns the first index of the max value



# File 'lib/just_enumerable_stats.rb', line 56

def max_index(&block)
  self.index(max(&block))
end

#max_of_lists(*enums) ⇒ Object

Returns the element-wise max of two or more enumerables. Example:

[1,2,3].max_of_lists(, [0,2,9])
# => [1, 5, 9]

# File 'lib/just_enumerable_stats.rb', line 584

def max_of_lists(*enums)
  yield_transpose(*enums) {|e| e.max}
end

#median(ratio = 0.5, &block) ⇒ Object

The slow way is to iterate up to the middle point. A faster way is to use the index, when available. If a block is supplied, always iterate to the middle point.



# File 'lib/just_enumerable_stats.rb', line 142

def median(ratio=0.5, &block)
  return iterate_midway(ratio, &block) if block_given?
  begin
    mid1, mid2 = middle_two
    sorted = new_sort
    med1, med2 = sorted[mid1], sorted[mid2]
    return med1 if med1 == med2
    return med1 + ((med2 - med1) * ratio)
  rescue
    iterate_midway(ratio, &block)
  end
end
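
The index-based branch can be sketched on a plain array. The helper name interpolated_median is hypothetical, and the way the two middle indices are chosen is an assumption, since the library's middle_two is not shown on this page.

```ruby
# Hypothetical helper. Assumes the two middle indices coincide for odd sizes
# and are adjacent for even sizes.
def interpolated_median(values, ratio = 0.5)
  sorted = values.sort
  mid1 = (sorted.size - 1) / 2
  mid2 = sorted.size / 2
  med1, med2 = sorted[mid1], sorted[mid2]
  return med1 if med1 == med2
  # Move from the lower middle value toward the upper one by ratio.
  med1 + ((med2 - med1) * ratio)
end

odd = interpolated_median([3, 1, 2])               # 2
even = interpolated_median([1, 2, 3, 4])           # 2.5
quarter = interpolated_median([1, 2, 3, 4], 0.25)  # 2.25
```

The ratio only matters when the two middle values differ, which is how quantile reuses median with 0.25 and 0.75.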

#min(&block) ⇒ Object

Returns the min, using an optional block.

# File 'lib/just_enumerable_stats.rb', line 61

def min(&block)
  self.inject do |best, e|
    val = block_sorter(best, e, &block)
    best = val < 0 ? best : e
  end
end

#min_index(&block) ⇒ Object

Returns the first index of the min value



# File 'lib/just_enumerable_stats.rb', line 69

def min_index(&block)
  self.index(min(&block))
end

#min_of_lists(*enums) ⇒ Object

Returns the element-wise min of two or more enumerables. Example:

[1,2,3].min_of_lists(, [0,2,9])
# => [0, 2, 3]

# File 'lib/just_enumerable_stats.rb', line 591

def min_of_lists(*enums)
  yield_transpose(*enums) {|e| e.min}
end

#new_sort(&block) ⇒ Object

I don’t pass the block to the sort, because a sort block needs to look something like: {|x,y| x <=> y}. To get around this, set the default block on the object.



# File 'lib/just_enumerable_stats.rb', line 324

def new_sort(&block)
  if block_given?
    map { |i| yield(i) }.sort.dup
  elsif default_block
    map { |i| default_block[*i] }.sort.dup
  else
    sort().dup
  end
end

#order(&block) ⇒ Object

Given values like [10,5,5,1], rank should produce something like [4,2,2,1], and order should produce something like [4,2,3,1]. The trick is that rank skips as many places as were duplicated, so there could not be a 3 in the rank from the example above.

# File 'lib/just_enumerable_stats.rb', line 354

def order(&block)
  hold = []
  rank(&block).each do |x|
    while hold.include?(x) do
      x += 1
    end
    hold << x
  end
  hold
end
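
The difference between rank and order can be reproduced with two small standalone helpers. The names rank_of and order_of are hypothetical, and block and default-block handling is left out.

```ruby
# Hypothetical helper: 1-based position of each value in the sorted list.
# Ties share the position of their first occurrence.
def rank_of(values)
  sorted = values.sort
  values.map { |v| sorted.index(v) + 1 }
end

# Hypothetical helper: like rank_of, but a tied value is bumped to the next
# free position, so every position is used exactly once.
def order_of(values)
  rank_of(values).each_with_object([]) do |r, taken|
    r += 1 while taken.include?(r)
    taken << r
  end
end

ranks = rank_of([10, 5, 5, 1])
# ranks == [4, 2, 2, 1]

orders = order_of([10, 5, 5, 1])
# orders == [4, 2, 3, 1]
```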

#original_max ⇒ Object

The max that was in place before this module redefined it, kept under a new name.

# File 'lib/just_enumerable_stats.rb', line 32

alias :original_max :max

#original_min ⇒ Object

The min that was in place before this module redefined it, kept under a new name.

# File 'lib/just_enumerable_stats.rb', line 33

alias :original_min :min

#product ⇒ Object

Multiplies the values. Example:

product(1,2,3)
# => 6.0

# File 'lib/just_enumerable_stats.rb', line 459

def product
  self.inject(one) {|sum, a| sum *= a}
end

#quantile(&block) ⇒ Object

Returns a five-element array: the min, the first half’s median(0.25), the median, the second half’s median(0.75), and the max. Uses a block if one is given.

The docstring also carries notes on a helper, nth_split_by_m, that does not appear among this module’s methods:

First quartile: nth_split_by_m(1, 4)
Third quartile: nth_split_by_m(3, 4)
Median: nth_split_by_m(1, 2)

Doesn’t match R, and it’s silly to try to.

def nth_split_by_m(n, m)
  sorted  = new_sort
  dividers = m - 1
  if size % m == dividers # Divides evenly
    # Because we have a 0-based list, we get the floor
    i = ((size / m.to_f) * n).floor
    j = i
  else
    # This reflects R's approach, which I don't think I agree with.
    i = (((size / m.to_f) * n) - 1)
    i = i > (size / m.to_f) ? i.floor : i.ceil
    j = i + 1
  end
  sorted[i] + ((n / m.to_f) * (sorted[j] - sorted[i]))
end

# File 'lib/just_enumerable_stats.rb', line 384

def quantile(&block)
  [
    min(&block), 
    first_half(&block).median(0.25, &block), 
    median(&block), 
    second_half(&block).median(0.75, &block), 
    max(&block)
  ]
end

#rand_in_range(*args) ⇒ Object

Returns an array of random integers, one per position, each within the range spanned by the lists at that position. This is a way to get a random vector that is tenable based on the sample data. For example, given two sets of numbers:

a = [1,2,3]; b = [8,8,8]

rand_in_range will return a value >= 1 and <= 8 in the first place, >= 2 and <= 8 in the second place, and >= 3 and <= 8 in the last place. Works for integers. Rethink this for floats. May consider setting up FixedRange for floats. O(n*5)

# File 'lib/just_enumerable_stats.rb', line 545

def rand_in_range(*args)
  min = self.min_of_lists(*args)
  max = self.max_of_lists(*args)
  (0...size).inject([]) do |ary, i|
    ary << rand_between(min[i], max[i])
  end
end
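
The per-position bounds can be sketched without the library. The name random_vector is hypothetical, and Kernel#rand with an inclusive integer range stands in for rand_between, whose definition is not shown on this page.

```ruby
# Hypothetical helper: one random integer per position, bounded by the
# smallest and largest value found at that position across all lists.
def random_vector(*lists)
  n = lists.map(&:size).min
  (0...n).map do |i|
    column = lists.map { |list| list[i] }
    rand(column.min..column.max)
  end
end

a = [1, 2, 3]
b = [8, 8, 8]
v = random_vector(a, b)
# v[0] is in 1..8, v[1] is in 2..8, v[2] is in 3..8
```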

#range(&block) ⇒ Object

Just an array of [min, max], to comply with R’s use of the word. Use range_as_range if you want a real Range.

# File 'lib/just_enumerable_stats.rb', line 239

def range(&block)
  [min(&block), max(&block)]
end

#range_as_range(&block) ⇒ Object Also known as: range_instance

Actually instantiates the range, instead of producing a min and max array.



# File 'lib/just_enumerable_stats.rb', line 312

def range_as_range(&block)
  if @range_class_args and not @range_class_args.empty?
    self.range_class.new(*@range_class_args)
  else
    self.range_class.new(min(&block), max(&block))
  end
end

#range_class ⇒ Object

When creating a range, what class will it be? Defaults to Range, but other classes are sometimes useful.

# File 'lib/just_enumerable_stats.rb', line 307

def range_class
  @range_class ||= Range
end

#rank(&block) ⇒ Object

Returns the 1-based position of each element in the sorted list; tied elements share the lowest position. Doesn’t overwrite things like Matrix#rank.

# File 'lib/just_enumerable_stats.rb', line 335

def rank(&block)

  sorted = new_sort(&block)

  if block_given?
    map { |i| sorted.index(yield(i)) + 1 }
  elsif default_block
    map { |i| sorted.index(default_block[*i]) + 1 }
  else
    map { |i| sorted.index(i) + 1 }
  end

end

#set_range(hash) ⇒ Object

Takes a hash that maps each category to a lambda (see range_hash). If Facets happens to be loaded, its Dictionary keeps the order of the categories straight.

# File 'lib/just_enumerable_stats.rb', line 253

def set_range(hash)
  if defined?(Dictionary)
    @range_hash = Dictionary.new
    @range_hash.merge!(hash)
    @categories = @range_hash.keys
  else
    @categories = hash.keys
    @range_hash = hash
  end
  @categories
end

#set_range_class(klass, *args) ⇒ Object

Useful for setting a real range class (FixedRange).



# File 'lib/just_enumerable_stats.rb', line 244

def set_range_class(klass, *args)
  @range_class = klass
  @range_class_args = args
  self.range_class
end

#sigma_pairs(other, z = zero, &block) ⇒ Object

Sigma of pairs. Returns a single float, or whatever object is sent in. Example:

[1,2,3].sigma_pairs(, 0) {|x, y| x + y}

returns 21 instead of 21.0.

# File 'lib/just_enumerable_stats.rb', line 525

def sigma_pairs(other, z=zero, &block)
  self.to_pairs(other,&block).inject(z) {|sum, i| sum += i}
end
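
The effect of the seed argument is easiest to see in a standalone sketch. The name sum_pairs is hypothetical, the 0.0 default is an assumption about the library's zero, and the second list [4, 5, 6] is an assumed input chosen only because its pairwise sums add up to 21.

```ruby
# Hypothetical helper: apply the block to index-aligned pairs and add the
# results to a seed. The seed decides the type of the result.
def sum_pairs(xs, ys, seed = 0.0)
  n = [xs.size, ys.size].min
  (0...n).map { |i| yield xs[i], ys[i] }.inject(seed) { |sum, v| sum + v }
end

as_integer = sum_pairs([1, 2, 3], [4, 5, 6], 0) { |x, y| x + y }
# as_integer == 21 (an Integer, because the seed is 0)

as_float = sum_pairs([1, 2, 3], [4, 5, 6]) { |x, y| x + y }
# as_float == 21.0 (a Float, because the seed defaults to 0.0)
```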

#standard_deviation(&block) ⇒ Object Also known as: std

The standard deviation. Uses a block or default block.



# File 'lib/just_enumerable_stats.rb', line 134

def standard_deviation(&block)
  Math::sqrt(variance(&block))
end

#sum ⇒ Object

Adds up the list. Uses a block or default block if present.

# File 'lib/just_enumerable_stats.rb', line 100

def sum
  sum = zero
  if block_given?
    each{|i| sum += yield(i)}
  elsif default_block
    each{|i| sum += default_block[*i]}
  else
    each{|i| sum += i}
  end
  sum
end

#tanimoto_pairs(other) ⇒ Object Also known as: tanimoto_correlation

Finds the Tanimoto coefficient: the intersection set size / union set size. This is used to find the distance between two vectors. Example:

[1,2,3].cor()
# => 0.981980506061966

[1,2,3].tanimoto_pairs()
# => 0.5

# File 'lib/just_enumerable_stats.rb', line 476

def tanimoto_pairs(other)
  intersect(other).size / union(other).size.to_f
end
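
A worked instance of the ratio, as a standalone sketch. The name tanimoto is hypothetical, and the second list [2, 3, 5] is an assumed input borrowed from the correlation example on this page.

```ruby
# Hypothetical helper: shared elements divided by all distinct elements.
def tanimoto(xs, ys)
  (xs & ys).size / (xs | ys).size.to_f
end

t = tanimoto([1, 2, 3], [2, 3, 5])
# intersection is [2, 3] (size 2), union is [1, 2, 3, 5] (size 4)
# t == 0.5
```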

#to_pairs(other, &block) ⇒ Object

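Calls the block with each pair of elements at the same index in self and other, up to the shorter length, and returns the results. There are going to be a lot more of these kinds of things, so pay attention.

# File 'lib/just_enumerable_stats.rb', line 465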

def to_pairs(other, &block)
  n = [self.size, other.size].min
  (0...n).map {|i| block.call(self[i], other[i]) }
end

#union(other) ⇒ Object

All of the left- and right-hand sides, excluding duplicates: “the union of x and y”.

# File 'lib/just_enumerable_stats.rb', line 487

def union(other)
  self | other
end

#variance(&block) ⇒ Object Also known as: var

The sample variance: the sum of squared differences from the mean, divided by size - 1. Uses a block or default block.

# File 'lib/just_enumerable_stats.rb', line 120

def variance(&block)
  m = mean(&block)
  sum_of_differences = if block_given?
    sum{ |i| j=yield(i); (m - j) ** 2 }
  elsif default_block
    sum{ |i| j=default_block[*i]; (m - j) ** 2 }
  else
    sum{ |i| (m - i) ** 2 }
  end
  sum_of_differences / (size - 1)
end
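
Because the listing divides by size - 1, the result is the sample variance rather than the population variance. A short standalone sketch with hypothetical helper names and no block handling:

```ruby
# Hypothetical helper: squared deviations from the mean, divided by n - 1.
def sample_variance(values)
  mean = values.sum.to_f / values.size
  values.sum { |v| (mean - v)**2 } / (values.size - 1)
end

# Hypothetical helper: the standard deviation is the square root of that.
def sample_std(values)
  Math.sqrt(sample_variance(values))
end

var = sample_variance([2, 4, 6])
# mean is 4.0; squared deviations are 4.0, 0.0, 4.0; 8.0 / 2 == 4.0

std = sample_std([2, 4, 6])
# std == 2.0
```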

#yield_transpose(*enums, &block) ⇒ Object

Transposes arrays of arrays and calls the block on each transposed row. The regular Array#transpose ignores blocks.

# File 'lib/just_enumerable_stats.rb', line 574

def yield_transpose(*enums, &block)
  enums.unshift(self)
  n = enums.map{ |x| x.size}.min
  block ||= lambda{|e| e}
  (0...n).map { |i| block.call enums.map{ |x| x[i] } }
end
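
The transposition step, and how max_of_lists and min_of_lists build on it, can be sketched on plain arrays. The name transpose_with is hypothetical, all lists are passed as arguments instead of using self, and the list [0, 5, 8] is an assumed input chosen for illustration.

```ruby
# Hypothetical helper: gather the i-th element of every list into a column
# and pass each column to the block. Stops at the shortest list.
def transpose_with(*lists, &block)
  n = lists.map(&:size).min
  block ||= lambda { |column| column }
  (0...n).map { |i| block.call(lists.map { |list| list[i] }) }
end

maxes = transpose_with([1, 2, 3], [0, 5, 8], [0, 2, 9]) { |c| c.max }
# maxes == [1, 5, 9]

mins = transpose_with([1, 2, 3], [0, 5, 8], [0, 2, 9]) { |c| c.min }
# mins == [0, 2, 3]

columns = transpose_with([1, 2], [3, 4, 5])
# columns == [[1, 3], [2, 4]], the extra 5 is dropped
```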