Class: Ai4r::Classifiers::IB1
- Inherits: Classifier (hierarchy: Object, Classifier, Ai4r::Classifiers::IB1)
- Defined in: lib/ai4r/classifiers/ib1.rb
Overview
Introduction
IB1 algorithm implementation. IB1 is the simplest instance-based learning (IBL) algorithm.
D. Aha, D. Kibler (1991). Instance-based learning algorithms. Machine Learning. 6:37-66.
IB1 is identical to the nearest neighbor algorithm except that it normalizes its attributes' ranges, processes instances incrementally, and has a simple policy for tolerating missing values.
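The normalization and missing-value handling described above can be sketched in plain Ruby. This is a minimal illustration, not the library's own distance method: the names `normalize` and `ib1_distance` are hypothetical, and treating any missing (nil) value as maximally different is a simplifying assumption.

```ruby
# Hypothetical helper: scale a numeric value into [0, 1] using the
# minimum and maximum observed for that attribute.
def normalize(value, min, max)
  return 0.0 if max == min

  (value - min).to_f / (max - min)
end

# Hypothetical distance: sum of squared per-attribute differences.
# Assumption: a missing (nil) value counts as maximally different (1.0).
def ib1_distance(a, b, mins, maxs)
  a.each_index.sum do |i|
    x = a[i]
    y = b[i]
    diff =
      if x.nil? || y.nil?
        1.0
      elsif x.is_a?(Numeric) && y.is_a?(Numeric)
        normalize(x, mins[i], maxs[i]) - normalize(y, mins[i], maxs[i])
      else
        x == y ? 0.0 : 1.0 # nominal attributes: match or mismatch
      end
    diff * diff
  end
end

mins = [0, nil]
maxs = [10, nil]
ib1_distance([5, 'a'], [10, 'b'], mins, maxs) # numeric part 0.25, nominal part 1.0
```

Because every attribute contributes at most 1.0, an attribute with a wide raw range cannot dominate the others.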
Instance Attribute Summary
- #data_set ⇒ Object (readonly): Returns the value of attribute data_set.
- #max_values ⇒ Object (readonly): Returns the value of attribute max_values.
- #min_values ⇒ Object (readonly): Returns the value of attribute min_values.
Instance Method Summary
- #add_instance(data_item) ⇒ Object: Append a new instance to the internal dataset.
- #build(data_set) ⇒ Object: Build a new IB1 classifier.
- #eval(data) ⇒ Object: You can evaluate new data, predicting its class.
- #initialize ⇒ Object (constructor)
- #neighbors_for(data, k_neighbors) ⇒ Object: Returns an array with the k nearest instances from the training set for the given data item.
- #update_with_instance(data_item, learn: false) ⇒ Object: Update min/max values with the provided instance attributes.
Methods inherited from Classifier
Methods included from Data::Parameterizable
#get_parameters, included, #set_parameters
Constructor Details
#initialize ⇒ Object
    # File 'lib/ai4r/classifiers/ib1.rb', line 42
    def initialize
      super()
      @k = 1
      @distance_function = nil
      @tie_break = :first
      @random_seed = nil
      @rng = nil
    end
Instance Attribute Details
#data_set ⇒ Object (readonly)
Returns the value of attribute data_set.
    # File 'lib/ai4r/classifiers/ib1.rb', line 30
    def data_set
      @data_set
    end
#max_values ⇒ Object (readonly)
Returns the value of attribute max_values.
    # File 'lib/ai4r/classifiers/ib1.rb', line 30
    def max_values
      @max_values
    end
#min_values ⇒ Object (readonly)
Returns the value of attribute min_values.
    # File 'lib/ai4r/classifiers/ib1.rb', line 30
    def min_values
      @min_values
    end
Instance Method Details
#add_instance(data_item) ⇒ Object
Append a new instance to the internal dataset. The last element is considered the class label. Minimum and maximum values for numeric attributes are updated so that future distance calculations remain normalized.
    # File 'lib/ai4r/classifiers/ib1.rb', line 71
    def add_instance(data_item)
      @data_set << data_item
      update_min_max(data_item[0...-1])
      self
    end
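To see why the bounds need refreshing when an instance is appended, consider how a wider observed range changes a normalized value. The snippet below is a stand-alone sketch; `scale` is a hypothetical min-max helper, not a method of this class.

```ruby
# Hypothetical helper: min-max scale a value into [0, 1].
def scale(value, min, max)
  max == min ? 0.0 : (value - min).to_f / (max - min)
end

min = 0
max = 10
before = scale(5, min, max) # 5 sits in the middle of 0..10

# A newly added instance with attribute value 20 widens the observed range.
new_value = 20
min = [min, new_value].min
max = [max, new_value].max
after = scale(5, min, max) # the same 5 is now a quarter of the way along 0..20
```

If the bounds were left stale, the new instance's value would scale to 2.0, outside the [0, 1] range that the other attributes are kept in.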
#build(data_set) ⇒ Object
Build a new IB1 classifier. You must provide a DataSet instance as the parameter. The last attribute of each item is treated as that item's class.
    # File 'lib/ai4r/classifiers/ib1.rb', line 56
    def build(data_set)
      data_set.check_not_empty
      @data_set = data_set
      @min_values = Array.new(data_set.data_labels.length)
      @max_values = Array.new(data_set.data_labels.length)
      data_set.data_items.each { |data_item| update_min_max(data_item[0...-1]) }
      self
    end
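The private update_min_max helper is not shown on this page. The following sketch shows one way such per-attribute bounds could be accumulated while scanning the training rows; `track_min_max` is a hypothetical name, and skipping non-numeric attributes is an assumption.

```ruby
# Hypothetical stand-in for a running min/max tracker.
# Assumption: only numeric attributes contribute to the bounds.
def track_min_max(attributes, mins, maxs)
  attributes.each_with_index do |value, i|
    next unless value.is_a?(Numeric)

    mins[i] = value if mins[i].nil? || value < mins[i]
    maxs[i] = value if maxs[i].nil? || value > maxs[i]
  end
end

# Each row ends with its class label, which is excluded from the bounds.
rows = [[3, 'x', 'Y'], [7, 'y', 'N'], [1, 'x', 'Y']]
mins = Array.new(2)
maxs = Array.new(2)
rows.each { |row| track_min_max(row[0...-1], mins, maxs) }
```

After the scan, the numeric column has bounds 1 and 7, while the nominal column keeps nil entries.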
#eval(data) ⇒ Object
Evaluates new data and predicts its class, e.g.
classifier.eval(['New York', '<30', 'F']) # => 'Y'
Evaluation does not update internal statistics, keeping the classifier state unchanged. Use update_with_instance to incorporate new samples.
    # File 'lib/ai4r/classifiers/ib1.rb', line 84
    def eval(data)
      neighbors = @data_set.data_items.map do |train_item|
        [distance(data, train_item), train_item.last]
      end
      neighbors.sort_by! { |d, _| d }
      k_limit = [@k, @data_set.data_items.length].min
      k_neighbors = neighbors.first(k_limit)

      # Include any other neighbors tied with the last selected distance
      last_distance = k_neighbors.last[0]
      neighbors[k_limit..].to_a.each do |dist, klass|
        break if dist > last_distance

        k_neighbors << [dist, klass]
      end

      counts = Hash.new(0)
      k_neighbors.each { |(_dist, klass)| counts[klass] += 1 }
      max_votes = counts.values.max
      tied = counts.select { |_, v| v == max_votes }.keys

      return tied.first if tied.length == 1

      rng = @rng || (@random_seed.nil? ? Random.new : Random.new(@random_seed))

      case @tie_break
      when :random
        tied.sample(random: rng)
      else
        k_neighbors.each { |(_dist, klass)| return klass if tied.include?(klass) }
      end
    end
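The voting step can be tried in isolation with precomputed distances. The sketch below is a stand-alone approximation of the logic in the listing, assuming k is at least 1; the method name `vote` and its keyword arguments are hypothetical and do not belong to the class.

```ruby
# Hypothetical stand-alone vote. `neighbors` is an array of
# [distance, class_label] pairs; assumes k >= 1.
def vote(neighbors, k, tie_break: :first, rng: Random.new)
  sorted = neighbors.sort_by { |dist, _| dist }
  return nil if sorted.empty?

  limit = [k, sorted.length].min
  cutoff = sorted[limit - 1][0]
  # Keep the k closest pairs plus any pair tied with the k-th distance.
  selected = sorted.take_while { |dist, _| dist <= cutoff }

  counts = selected.each_with_object(Hash.new(0)) { |(_, label), h| h[label] += 1 }
  best = counts.values.max
  tied = counts.select { |_, v| v == best }.keys
  return tied.first if tied.length == 1
  return tied.sample(random: rng) if tie_break == :random

  # Default policy: the closest neighbor among the tied classes wins.
  selected.find { |_, label| tied.include?(label) }.last
end

vote([[0.1, 'Y'], [0.3, 'N'], [0.3, 'N']], 2) # the tie at distance 0.3 pulls in a third neighbor
```

With k = 2, both 'N' rows at distance 0.3 are kept, so 'N' outvotes the single closer 'Y'.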
#neighbors_for(data, k_neighbors) ⇒ Object
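Returns an array with the k nearest instances from the training set for the given data item. The returned elements are the training data rows themselves, ordered from the closest to the furthest. Unlike eval, this method calls update_min_max on the query, so a query with out-of-range values can change the stored min/max values.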
    # File 'lib/ai4r/classifiers/ib1.rb', line 123
    def neighbors_for(data, k_neighbors)
      update_min_max(data)
      @data_set.data_items
               .map { |train_item| [train_item, distance(data, train_item)] }
               .sort_by(&:last)
               .first(k_neighbors)
               .map(&:first)
    end
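The same map, sort, and take pipeline can be run outside the class with any distance callable. In this sketch, `k_nearest` and `toy_distance` are hypothetical names, and the toy distance (absolute difference on the first attribute) is an assumption chosen only to keep the example small.

```ruby
# Hypothetical stand-alone version of the selection pipeline.
def k_nearest(rows, query, k, &distance)
  rows
    .map { |row| [row, distance.call(query, row)] } # pair each row with its distance
    .sort_by(&:last)                                # closest first
    .first(k)                                       # keep the k closest pairs
    .map(&:first)                                   # drop the distances
end

# Assumed toy distance: absolute difference on the first attribute only.
toy_distance = ->(a, b) { (a[0] - b[0]).abs }

rows = [[1, 'Y'], [8, 'N'], [4, 'Y']]
nearest = k_nearest(rows, [5], 2, &toy_distance)
```

For the query value 5, the distances are 4, 3 and 1, so the rows with first attribute 4 and 8 are returned in that order.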
#update_with_instance(data_item, learn: false) ⇒ Object
Update min/max values with the provided instance attributes. If learn is true, also append the instance to the training set so the classifier learns incrementally.
    # File 'lib/ai4r/classifiers/ib1.rb', line 135
    def update_with_instance(data_item, learn: false)
      update_min_max(data_item[0...-1])
      @data_set << data_item if learn
      self
    end