Class: Ai4r::Classifiers::IB1

Inherits:
Classifier
Defined in:
lib/ai4r/classifiers/ib1.rb

Overview

Introduction

IB1 algorithm implementation. IB1 is the simplest instance-based learning (IBL) algorithm.

D. Aha, D. Kibler (1991). Instance-based learning algorithms. Machine Learning. 6:37-66.

IB1 is identical to the nearest neighbor algorithm except that it normalizes its attributes’ ranges, processes instances incrementally, and has a simple policy for tolerating missing values.
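
As a rough illustration of the three properties listed above, a range-normalized distance could be sketched as follows. This is not the library's private distance method, which is not shown on this page; `normalized_distance`, its arguments, and the choice to treat a missing (nil) value as maximally different are assumptions made for the example.

```ruby
# Hypothetical helper, not part of Ai4r: a range-normalized Euclidean
# distance. `min` and `max` hold the observed range of each attribute
# (nil for non-numeric attributes).
def normalized_distance(a, b, min, max)
  sum = a.each_index.sum do |i|
    x = a[i]
    y = b[i]
    diff =
      if x.nil? || y.nil?
        1.0 # assumption: a missing value counts as maximally different
      elsif x.is_a?(Numeric) && y.is_a?(Numeric)
        range = max[i] - min[i]
        range.zero? ? 0.0 : (x - y).abs / range.to_f
      else
        x == y ? 0.0 : 1.0 # symbolic attribute: equal or not
      end
    diff**2
  end
  Math.sqrt(sum)
end

min = [0, nil]
max = [10, nil]
puts normalized_distance([5, 'a'], [10, 'b'], min, max)
```

Dividing by the observed range keeps every numeric attribute on a 0..1 scale, so no single attribute dominates the sum.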


Methods inherited from Classifier

#get_rules

Methods included from Data::Parameterizable

#get_parameters, included, #set_parameters

Constructor Details

#initialize ⇒ Object



# File 'lib/ai4r/classifiers/ib1.rb', line 42

def initialize
  super()
  @k = 1
  @distance_function = nil
  @tie_break = :first
  @random_seed = nil
  @rng = nil
end

Instance Attribute Details

#data_set ⇒ Object (readonly)

Returns the value of attribute data_set.



# File 'lib/ai4r/classifiers/ib1.rb', line 30

def data_set
  @data_set
end

#max_values ⇒ Object (readonly)

Returns the value of attribute max_values.



# File 'lib/ai4r/classifiers/ib1.rb', line 30

def max_values
  @max_values
end

#min_values ⇒ Object (readonly)

Returns the value of attribute min_values.



# File 'lib/ai4r/classifiers/ib1.rb', line 30

def min_values
  @min_values
end

Instance Method Details

#add_instance(data_item) ⇒ Object

Append a new instance to the internal dataset. The last element is considered the class label. Minimum and maximum values for numeric attributes are updated so that future distance calculations remain normalized.

Parameters:

  • data_item (Object)

Returns:

  • (Object)


# File 'lib/ai4r/classifiers/ib1.rb', line 71

def add_instance(data_item)
  @data_set << data_item
  update_min_max(data_item[0...-1])
  self
end
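
The private update_min_max helper called above is not shown on this page. A minimal sketch of the kind of incremental range tracking it implies might look like this; `track_ranges` and its arguments are hypothetical names, and skipping non-numeric values is an assumption.

```ruby
# Hypothetical stand-in for incremental min/max tracking; not the
# library's own update_min_max.
def track_ranges(min_values, max_values, attributes)
  attributes.each_with_index do |value, i|
    next unless value.is_a?(Numeric) # assumption: ignore symbolic and nil values

    min_values[i] = value if min_values[i].nil? || value < min_values[i]
    max_values[i] = value if max_values[i].nil? || value > max_values[i]
  end
end

min_values = Array.new(2)
max_values = Array.new(2)
rows = [[3, 'a', 'Y'], [7, 'b', 'N'], [1, 'a', 'Y']]
# The last element of each row is the class label, so it is excluded.
rows.each { |row| track_ranges(min_values, max_values, row[0...-1]) }
p min_values
p max_values
```

Because each instance only widens the stored range, instances can be added one at a time without rescanning the whole training set.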

#build(data_set) ⇒ Object

Build a new IB1 classifier from a DataSet instance. The last attribute of each item is treated as the item's class.

Parameters:

  • data_set (Object)

Returns:

  • (Object)


# File 'lib/ai4r/classifiers/ib1.rb', line 56

def build(data_set)
  data_set.check_not_empty
  @data_set = data_set
  @min_values = Array.new(data_set.data_labels.length)
  @max_values = Array.new(data_set.data_labels.length)
  data_set.data_items.each { |data_item| update_min_max(data_item[0...-1]) }
  self
end

#eval(data) ⇒ Object

Predict the class of a new data item, for example:

classifier.eval(['New York',  '<30', 'F'])  # => 'Y'

Evaluation does not update internal statistics, keeping the classifier state unchanged. Use update_with_instance to incorporate new samples.



# File 'lib/ai4r/classifiers/ib1.rb', line 84

def eval(data)
  neighbors = @data_set.data_items.map do |train_item|
    [distance(data, train_item), train_item.last]
  end
  neighbors.sort_by! { |d, _| d }
  k_limit = [@k, @data_set.data_items.length].min
  k_neighbors = neighbors.first(k_limit)

  # Include any other neighbors tied with the last selected distance
  last_distance = k_neighbors.last[0]
  neighbors[k_limit..].to_a.each do |dist, klass|
    break if dist > last_distance

    k_neighbors << [dist, klass]
  end

  counts = Hash.new(0)
  k_neighbors.each { |(_dist, klass)| counts[klass] += 1 }
  max_votes = counts.values.max
  tied = counts.select { |_, v| v == max_votes }.keys

  return tied.first if tied.length == 1

  rng = @rng || (@random_seed.nil? ? Random.new : Random.new(@random_seed))

  case @tie_break
  when :random
    tied.sample(random: rng)
  else
    k_neighbors.each { |(_dist, klass)| return klass if tied.include?(klass) }
  end
end
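
The voting step in the listing above can be isolated into a standalone sketch that is easier to experiment with. The `vote` function below is hypothetical and works on precomputed [distance, label] pairs; it assumes k is at least 1 and follows the same idea of keeping any neighbors tied with the k-th distance before counting votes.

```ruby
# Hypothetical standalone version of the voting step; `neighbors` is an
# array of [distance, label] pairs in any order, and k is assumed >= 1.
def vote(neighbors, k, tie_break: :first, rng: Random.new)
  sorted = neighbors.sort_by { |dist, _| dist }
  return nil if sorted.empty?

  # Distance of the k-th neighbor; anything at or below it takes part.
  cutoff = sorted[[k, sorted.length].min - 1][0]
  selected = sorted.take_while { |dist, _| dist <= cutoff }

  counts = Hash.new(0)
  selected.each { |_, label| counts[label] += 1 }
  best = counts.values.max
  tied = counts.select { |_, votes| votes == best }.keys
  return tied.first if tied.length == 1

  if tie_break == :random
    tied.sample(random: rng)
  else
    # :first resolves a tie in favor of the closest tied label.
    selected.find { |_, label| tied.include?(label) }.last
  end
end

pairs = [[0.1, 'Y'], [0.4, 'N'], [0.4, 'N'], [0.9, 'Y']]
puts vote(pairs, 1)
puts vote(pairs, 2)
```

With k = 2 both items at distance 0.4 are kept, so three neighbors vote rather than two.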

#neighbors_for(data, k_neighbors) ⇒ Object

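Returns an array with the k nearest instances from the training set for the given data item. The returned elements are the training data rows themselves, ordered from the closest to the furthest. Unlike eval, this method first updates the stored minimum and maximum values with the given data item.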

Parameters:

  • data (Object)
  • k_neighbors (Object)

Returns:

  • (Object)


# File 'lib/ai4r/classifiers/ib1.rb', line 123

def neighbors_for(data, k_neighbors)
  update_min_max(data)
  @data_set.data_items
           .map { |train_item| [train_item, distance(data, train_item)] }
           .sort_by(&:last)
           .first(k_neighbors)
           .map(&:first)
end
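
The map, sort, and truncate pipeline above can be tried outside the classifier with a small sketch. Here `k_nearest` is a hypothetical function that takes the distance as a block, and the one-dimensional absolute difference is only a placeholder metric for the example.

```ruby
# Hypothetical helper: return the k rows closest to `query`, using the
# distance computed by the given block.
def k_nearest(rows, query, k)
  rows.map { |row| [row, yield(query, row)] } # pair each row with its distance
      .sort_by(&:last)                        # closest first
      .first(k)
      .map(&:first)                           # keep the rows, drop the distances
end

rows = [[1, 'a'], [5, 'b'], [2, 'a'], [9, 'b']]
nearest = k_nearest(rows, [2.4], 2) { |q, row| (q[0] - row[0]).abs }
p nearest
```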

#update_with_instance(data_item, learn: false) ⇒ Object

Update min/max values with the attributes of the provided instance (every element except the last, which is the class label). If learn is true, also append the instance to the training set so the classifier learns incrementally.



# File 'lib/ai4r/classifiers/ib1.rb', line 135

def update_with_instance(data_item, learn: false)
  update_min_max(data_item[0...-1])
  @data_set << data_item if learn
  self
end