Module: Ai4r::Data::Statistics

Defined in:
lib/ai4r/data/statistics.rb

Overview

This module provides basic statistical functions (mean, variance, standard deviation, mode, minimum and maximum) that operate on a single attribute of a data set.

Class Method Details

.max(data_set, attribute) ⇒ Object

Returns the maximum value of an attribute in the data set, or negative infinity if the data set has no items.

# File 'lib/ai4r/data/statistics.rb', line 62

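def self.max(data_set, attribute)
  index = data_set.get_index(attribute)
  # Pick the data item with the largest value at the attribute's index
  item = data_set.data_items.max { |x, y| x[index] <=> y[index] }
  # An empty data set has no items, so fall back to negative infinity
  item ? item[index] : -Float::INFINITY
end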

.mean(data_set, attribute) ⇒ Object

Returns the sample mean of a numeric attribute: the sum of its values divided by the number of data items.

# File 'lib/ai4r/data/statistics.rb', line 20

def self.mean(data_set, attribute)
  index = data_set.get_index(attribute)
  # Float accumulator, so the division below is never an integer division
  sum = 0.0
  data_set.data_items.each { |item| sum += item[index] }
  sum / data_set.data_items.length
end
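
As a rough illustration of what the method computes, the following sketch repeats the same loop on a plain array of rows. It does not load the ai4r gem: `sketch_mean` is a hypothetical helper, `rows` stands in for `data_set.data_items`, and `index` stands in for the result of `data_set.get_index(attribute)`.

```ruby
# Hypothetical helper mirroring the loop in Statistics.mean.
# rows:  array of data items (stand-in for data_set.data_items)
# index: column position (stand-in for data_set.get_index(attribute))
def sketch_mean(rows, index)
  sum = 0.0
  rows.each { |item| sum += item[index] }
  sum / rows.length
end

rows = [[1.0, 'a'], [2.0, 'b'], [6.0, 'c']]
puts sketch_mean(rows, 0)     # => 3.0
puts sketch_mean([], 0).nan?  # => true (0.0 / 0 is NaN, not an error)
```

Because the accumulator is a Float, an empty collection produces NaN instead of raising ZeroDivisionError, so callers may want to check for an empty data set first.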

.min(data_set, attribute) ⇒ Object

Returns the minimum value of an attribute in the data set, or positive infinity if the data set has no items.

# File 'lib/ai4r/data/statistics.rb', line 69

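def self.min(data_set, attribute)
  index = data_set.get_index(attribute)
  # Pick the data item with the smallest value at the attribute's index
  item = data_set.data_items.min { |x, y| x[index] <=> y[index] }
  # An empty data set has no items, so fall back to positive infinity
  item ? item[index] : Float::INFINITY
end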
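
The empty-data-set behavior of the minimum and maximum lookups can be sketched without the gem. The helpers `sketch_min` and `sketch_max` below are hypothetical; they apply the same comparison block to a plain array of rows, where `rows` stands in for `data_set.data_items` and `index` for the attribute's position.

```ruby
# Hypothetical helpers mirroring Statistics.max and Statistics.min.
def sketch_max(rows, index)
  item = rows.max { |x, y| x[index] <=> y[index] }
  item ? item[index] : -Float::INFINITY
end

def sketch_min(rows, index)
  item = rows.min { |x, y| x[index] <=> y[index] }
  item ? item[index] : Float::INFINITY
end

rows = [[3, 'x'], [9, 'y'], [5, 'z']]
puts sketch_max(rows, 0)  # => 9
puts sketch_min(rows, 0)  # => 3
puts sketch_max([], 0)    # => -Infinity
puts sketch_min([], 0)    # => Infinity
```

Returning infinities for an empty collection means any real value compares as larger than the empty maximum and smaller than the empty minimum, which keeps later comparisons free of nil checks.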

.mode(data_set, attribute) ⇒ Object

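Returns the sample mode, the most frequent value of the attribute. If several values share the highest count, the one that reaches that count first wins. Returns nil for an empty data set.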

# File 'lib/ai4r/data/statistics.rb', line 45

def self.mode(data_set, attribute)
  index = data_set.get_index(attribute)
  # Occurrences seen so far for each attribute value (missing keys count as 0)
  count = Hash.new(0)
  max_count = 0
  mode = nil
  data_set.data_items.each do |data_item|
    attr_value = data_item[index]
    attr_count = (count[attr_value] += 1)
    # Strict comparison: on a tie, the value that reached the count first is kept
    if attr_count > max_count
      mode = attr_value
      max_count = attr_count
    end
  end
  mode
end
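
The tie-breaking behavior of this loop is easy to miss, so here is a minimal sketch of the same counting logic on a flat array of values. `sketch_mode` is a hypothetical helper written for illustration; it is not part of ai4r.

```ruby
# Hypothetical helper applying the same counting loop to a flat array.
def sketch_mode(values)
  count = Hash.new(0)
  max_count = 0
  mode = nil
  values.each do |value|
    value_count = (count[value] += 1)
    # Only a strictly larger count replaces the current mode
    if value_count > max_count
      mode = value
      max_count = value_count
    end
  end
  mode
end

puts sketch_mode(%w[a b b a])     # => b ('b' reaches a count of 2 first)
puts sketch_mode(%w[a b a b])     # => a ('a' reaches a count of 2 first)
puts sketch_mode([]).inspect      # => nil
```

Both inputs contain two of each letter, yet the result differs with the order of the items, so the mode of a multimodal attribute depends on how the data set is ordered.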

.standard_deviation(data_set, attribute, variance = nil) ⇒ Object

Returns the sample standard deviation, the square root of the variance. If you have already computed the variance, pass it in to avoid recomputing it.

# File 'lib/ai4r/data/statistics.rb', line 39

def self.standard_deviation(data_set, attribute, variance = nil)
  variance ||= variance(data_set, attribute)
  Math.sqrt(variance)
end
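
The optional third argument works as a shortcut: when a variance is supplied, the data is never scanned. The sketch below shows that pattern on a flat array of numbers. `sketch_standard_deviation` is a hypothetical helper, and the inline variance computation is an assumption standing in for the module's own variance method.

```ruby
# Hypothetical helper: square root of the sample variance, with an
# optional precomputed variance that skips the pass over the data.
def sketch_standard_deviation(values, variance = nil)
  variance ||= begin
    mean = values.sum / values.length.to_f
    values.sum { |v| (v - mean)**2 } / (values.length - 1)
  end
  Math.sqrt(variance)
end

puts sketch_standard_deviation([2.0, 4.0, 6.0])  # => 2.0 (variance is 8.0 / 2 = 4.0)
puts sketch_standard_deviation([], 9.0)          # => 3.0 (values are ignored)
```

Note that a supplied variance is trusted as is; passing a stale or unrelated value silently yields a wrong result.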

.variance(data_set, attribute, mean = nil) ⇒ Object

Returns the sample variance, the sum of squared deviations from the mean divided by n - 1. If you have already computed the mean, pass it in to avoid recomputing it.

# File 'lib/ai4r/data/statistics.rb', line 29

def self.variance(data_set, attribute, mean = nil)
  index = data_set.get_index(attribute)
  # Only compute the mean when the caller did not supply one
  mean ||= mean(data_set, attribute)
  sum = 0.0
  data_set.data_items.each { |item| sum += (item[index] - mean)**2 }
  # Sample variance: divide by n - 1 rather than n
  sum / (data_set.data_items.length - 1)
end
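
A small worked example may help in checking the n - 1 divisor. The sketch below applies the same formula to a flat array of numbers. `sketch_variance` is a hypothetical helper that does not use the gem, and its optional `mean` argument follows the behavior the description promises.

```ruby
# Hypothetical helper: sample variance with an optional precomputed mean.
def sketch_variance(values, mean = nil)
  mean ||= values.sum / values.length.to_f
  sum = 0.0
  values.each { |v| sum += (v - mean)**2 }
  sum / (values.length - 1)
end

values = [1.0, 2.0, 3.0, 4.0, 5.0]
# Mean is 3.0; squared deviations are 4, 1, 0, 1, 4, which sum to 10.
puts sketch_variance(values)        # => 2.5 (10.0 / 4)
puts sketch_variance(values, 3.0)   # => 2.5 (same result, mean supplied)
puts sketch_variance([4.0]).nan?    # => true (0.0 / 0 with a single item)
```

With a single data item the divisor is zero, so the result is NaN; the sample variance is only meaningful for two or more items.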