Module: Ai4r::Data::Statistics

Defined in:
lib/ai4r/data/statistics.rb

Overview

This module provides basic statistical functions that operate on the attributes of a data set.

Class Method Details

.max(data_set, attribute) ⇒ Object

Returns the maximum value of an attribute in the data set, or -Float::INFINITY if the data set has no items.

Parameters:

  • data_set (Object)
  • attribute (Object)

Returns:

  • (Object)


# File 'lib/ai4r/data/statistics.rb', line 71

def self.max(data_set, attribute)
  index = data_set.get_index(attribute)
  item = data_set.data_items.max_by { |item| item[index] }
  item ? item[index] : -Float::INFINITY
end
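
As a rough usage sketch, the snippet below reproduces the logic of the listing above against a hypothetical stand-in data set. `ToyDataSet` and `max_of` are illustrative names, not part of Ai4r; the stand-in implements only `get_index` and `data_items`, the two methods the listing relies on.

```ruby
# Hypothetical stand-in for a data set (an assumption for illustration):
# it maps an attribute label to its column index and exposes the rows.
ToyDataSet = Struct.new(:data_labels, :data_items) do
  def get_index(attribute)
    data_labels.index(attribute)
  end
end

# Same logic as the listing above, copied into a local helper.
def max_of(data_set, attribute)
  index = data_set.get_index(attribute)
  item = data_set.data_items.max_by { |row| row[index] }
  item ? item[index] : -Float::INFINITY
end

people = ToyDataSet.new(%w[name age], [['ann', 31], ['bob', 45], ['cy', 27]])
nobody = ToyDataSet.new(%w[name age], [])

max_age = max_of(people, 'age')    # largest value in the 'age' column
max_empty = max_of(nobody, 'age')  # no rows, so the sentinel is returned
```

With these rows, `max_age` is 45 and `max_empty` is `-Float::INFINITY`.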

.mean(data_set, attribute) ⇒ Object

Returns the sample mean of an attribute: the sum of its values divided by the number of data items.

Parameters:

  • data_set (Object)
  • attribute (Object)

Returns:

  • (Object)


# File 'lib/ai4r/data/statistics.rb', line 21

def self.mean(data_set, attribute)
  index = data_set.get_index(attribute)
  sum = 0.0
  data_set.data_items.each { |item| sum += item[index] }
  sum / data_set.data_items.length
end
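
The following is a minimal sketch of the same computation on a hypothetical stand-in data set. `ToyDataSet` and `mean_of` are assumed names used only for illustration; they are not part of Ai4r.

```ruby
# Hypothetical stand-in for a data set (an assumption for illustration).
ToyDataSet = Struct.new(:data_labels, :data_items) do
  def get_index(attribute)
    data_labels.index(attribute)
  end
end

# Same logic as the listing above: accumulate as a Float, then divide
# by the number of rows.
def mean_of(data_set, attribute)
  index = data_set.get_index(attribute)
  sum = 0.0
  data_set.data_items.each { |row| sum += row[index] }
  sum / data_set.data_items.length
end

scores = ToyDataSet.new(%w[id score], [[1, 2], [2, 4], [3, 6]])
mean_score = mean_of(scores, 'score') # (2 + 4 + 6) / 3
```

Because the accumulator starts at `0.0`, the result is a Float (here 4.0) even when every value is an Integer.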

.min(data_set, attribute) ⇒ Object

Returns the minimum value of an attribute in the data set, or Float::INFINITY if the data set has no items.

Parameters:

  • data_set (Object)
  • attribute (Object)

Returns:

  • (Object)


# File 'lib/ai4r/data/statistics.rb', line 81

def self.min(data_set, attribute)
  index = data_set.get_index(attribute)
  item = data_set.data_items.min_by { |item| item[index] }
  item ? item[index] : Float::INFINITY
end
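
One plausible benefit of returning `Float::INFINITY` for an empty data set is that the result can still be combined with other minima without a nil check. The sketch below illustrates that idea; `ToyDataSet` and `min_of` are hypothetical names, not part of Ai4r.

```ruby
# Hypothetical stand-in for a data set (an assumption for illustration).
ToyDataSet = Struct.new(:data_labels, :data_items) do
  def get_index(attribute)
    data_labels.index(attribute)
  end
end

# Same logic as the listing above, copied into a local helper.
def min_of(data_set, attribute)
  index = data_set.get_index(attribute)
  item = data_set.data_items.min_by { |row| row[index] }
  item ? item[index] : Float::INFINITY
end

batch_a = ToyDataSet.new(%w[name age], [['ann', 31], ['cy', 27]])
batch_b = ToyDataSet.new(%w[name age], []) # an empty batch

# The empty batch contributes Float::INFINITY, which never wins a minimum.
overall_min = [batch_a, batch_b].map { |batch| min_of(batch, 'age') }.min
```

Here `overall_min` is 27, the smallest age in the non-empty batch.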

.mode(data_set, attribute) ⇒ Object

Returns the sample mode of an attribute (its most frequent value), or nil if the data set has no items.

Parameters:

  • data_set (Object)
  • attribute (Object)

Returns:

  • (Object)


# File 'lib/ai4r/data/statistics.rb', line 57

def self.mode(data_set, attribute)
  index = data_set.get_index(attribute)
  data_set
    .data_items
    .map { |item| item[index] }
    .tally
    .max_by { _2 }
    &.first
end
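
The sketch below walks through the same tally-then-pick approach on a hypothetical stand-in data set. `ToyDataSet` and `mode_of` are assumed names, not part of Ai4r, and the block uses explicit parameters in place of `_2`. `Enumerable#tally` requires Ruby 2.7 or later.

```ruby
# Hypothetical stand-in for a data set (an assumption for illustration).
ToyDataSet = Struct.new(:data_labels, :data_items) do
  def get_index(attribute)
    data_labels.index(attribute)
  end
end

# Same logic as the listing above: count each value, take the pair with
# the highest count, and return its value (nil when there are no rows).
def mode_of(data_set, attribute)
  index = data_set.get_index(attribute)
  data_set
    .data_items
    .map { |row| row[index] }
    .tally
    .max_by { |_value, count| count }
    &.first
end

cars = ToyDataSet.new(%w[id color], [[1, 'red'], [2, 'blue'], [3, 'red'], [4, 'green']])
none = ToyDataSet.new(%w[id color], [])

common_color = mode_of(cars, 'color') # 'red' appears twice, the others once
empty_mode = mode_of(none, 'color')   # nothing to tally
```

With this data, `common_color` is `'red'` and `empty_mode` is `nil`.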

.standard_deviation(data_set, attribute, variance = nil) ⇒ Object

Returns the sample standard deviation of an attribute, i.e. the square root of its variance. Pass a precomputed variance to avoid recalculating it.

Parameters:

  • data_set (Object)
  • attribute (Object)
  • variance (Object) (defaults to: nil)

Returns:

  • (Object)


# File 'lib/ai4r/data/statistics.rb', line 48

def self.standard_deviation(data_set, attribute, variance = nil)
  variance ||= variance(data_set, attribute)
  Math.sqrt(variance)
end
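
To illustrate the optional third argument, here is a minimal sketch that mirrors the module's methods in a hypothetical `StatsSketch` module and runs them on a stand-in data set. `StatsSketch` and `ToyDataSet` are assumptions for illustration only, not part of Ai4r.

```ruby
# Hypothetical stand-in for a data set (an assumption for illustration).
ToyDataSet = Struct.new(:data_labels, :data_items) do
  def get_index(attribute)
    data_labels.index(attribute)
  end
end

# Local copies of the documented methods, so the sketch runs on its own.
module StatsSketch
  def self.mean(data_set, attribute)
    index = data_set.get_index(attribute)
    sum = 0.0
    data_set.data_items.each { |row| sum += row[index] }
    sum / data_set.data_items.length
  end

  def self.variance(data_set, attribute, mean = nil)
    index = data_set.get_index(attribute)
    mean ||= mean(data_set, attribute)
    sum = 0.0
    data_set.data_items.each { |row| sum += (row[index] - mean)**2 }
    sum / (data_set.data_items.length - 1)
  end

  def self.standard_deviation(data_set, attribute, variance = nil)
    # Only computed when the caller did not supply a variance.
    variance ||= variance(data_set, attribute)
    Math.sqrt(variance)
  end
end

readings = ToyDataSet.new(%w[id value], [[1, 1], [2, 3], [3, 5]])

# Let the method compute the variance itself ...
sd_computed = StatsSketch.standard_deviation(readings, 'value')

# ... or reuse a variance that is already at hand, skipping a second pass.
known_variance = StatsSketch.variance(readings, 'value')
sd_reused = StatsSketch.standard_deviation(readings, 'value', known_variance)
```

For the values 1, 3, 5 the sample variance is 4.0, so both calls return 2.0.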

.variance(data_set, attribute, mean = nil) ⇒ Object

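Returns the sample variance of an attribute: the sum of squared deviations from the mean divided by n - 1, where n is the number of data items. Pass a precomputed mean to avoid recalculating it.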

Parameters:

  • data_set (Object)
  • attribute (Object)
  • mean (Object) (defaults to: nil)

Returns:

  • (Object)


# File 'lib/ai4r/data/statistics.rb', line 34

def self.variance(data_set, attribute, mean = nil)
  index = data_set.get_index(attribute)
  mean ||= mean(data_set, attribute)
  sum = 0.0
  data_set.data_items.each { |item| sum += (item[index] - mean)**2 }
  sum / (data_set.data_items.length - 1)
end
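
The sketch below shows the effect of the `length - 1` divisor on a hypothetical stand-in data set, including the single-item case where the divisor is zero. `ToyDataSet`, `mean_of`, and `variance_of` are assumed names for illustration, not part of Ai4r.

```ruby
# Hypothetical stand-in for a data set (an assumption for illustration).
ToyDataSet = Struct.new(:data_labels, :data_items) do
  def get_index(attribute)
    data_labels.index(attribute)
  end
end

# Local copy of the mean logic, needed when no mean is supplied.
def mean_of(data_set, attribute)
  index = data_set.get_index(attribute)
  sum = 0.0
  data_set.data_items.each { |row| sum += row[index] }
  sum / data_set.data_items.length
end

# Same logic as the listing above: squared deviations divided by n - 1.
def variance_of(data_set, attribute, mean = nil)
  index = data_set.get_index(attribute)
  mean ||= mean_of(data_set, attribute)
  sum = 0.0
  data_set.data_items.each { |row| sum += (row[index] - mean)**2 }
  sum / (data_set.data_items.length - 1)
end

readings = ToyDataSet.new(%w[id value], [[1, 1], [2, 3], [3, 5]])
single = ToyDataSet.new(%w[id value], [[1, 7]])

sample_variance = variance_of(readings, 'value')    # (4 + 0 + 4) / (3 - 1)
with_known_mean = variance_of(readings, 'value', 3.0)
single_variance = variance_of(single, 'value')      # 0.0 / 0 is NaN in Ruby
```

For the values 1, 3, 5 both calls return 4.0. With a single data item the divisor is zero and the result is `NaN`, so callers may want to check the item count first.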