Class: Ai4r::Clusterers::KMeans
- Defined in:
- lib/ai4r/clusterers/k_means.rb
Overview
The k-means algorithm clusters n objects, based on their attributes, into k partitions, with k < n.
More about the k-means algorithm: en.wikipedia.org/wiki/K-means_algorithm
Direct Known Subclasses
Instance Attribute Summary
- #centroids ⇒ Object (readonly): Returns the value of attribute centroids.
- #clusters ⇒ Object (readonly): Returns the value of attribute clusters.
- #data_set ⇒ Object (readonly): Returns the value of attribute data_set.
- #iterations ⇒ Object (readonly): Returns the value of attribute iterations.
- #number_of_clusters ⇒ Object (readonly): Returns the value of attribute number_of_clusters.
Instance Method Summary
- #build(data_set, number_of_clusters) ⇒ Object: Builds a new clusterer, using data examples found in data_set.
- #distance(a, b) ⇒ Object: Calculates the distance between two instances.
- #eval(data_item) ⇒ Object: Classifies the given data item, returning the cluster index it belongs to (0-based).
- #initialize ⇒ KMeans (constructor): A new instance of KMeans.
Methods included from Data::Parameterizable
#get_parameters, included, #set_parameters
Constructor Details
#initialize ⇒ KMeans
Returns a new instance of KMeans.
    # File 'lib/ai4r/clusterers/k_means.rb', line 48
    def initialize
      @distance_function = nil
      @max_iterations = nil
      @centroid_function = lambda do |data_sets|
        data_sets.collect{ |data_set| data_set.get_mean_or_mode}
      end
      @centroid_indices = []
      @on_empty = 'eliminate' # default if none specified
    end
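The default @centroid_function calls get_mean_or_mode on each cluster's data set. That method is not shown on this page; assuming it does what its name suggests (mean for numeric attributes, most frequent value otherwise), the idea can be sketched in plain Ruby. The helper name mean_or_mode below is hypothetical and is not part of ai4r.

```ruby
# Hypothetical helper, not part of ai4r: builds one centroid from a list of
# rows by taking the mean of numeric columns and the mode of all others.
def mean_or_mode(rows)
  rows.transpose.map do |column|
    if column.all? { |value| value.is_a?(Numeric) }
      column.sum.to_f / column.size
    else
      # Group equal values and keep the value of the largest group (the mode).
      column.group_by { |value| value }.max_by { |_value, items| items.size }.first
    end
  end
end

rows = [[1, 10.0, 'red'], [3, 20.0, 'blue'], [5, 30.0, 'red']]
centroid = mean_or_mode(rows)
```

A custom centroid function would need the same shape as the default: it receives the list of per-cluster data sets and returns one centroid per cluster.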
Instance Attribute Details
#centroids ⇒ Object (readonly)
Returns the value of attribute centroids.
    # File 'lib/ai4r/clusterers/k_means.rb', line 25
    def centroids
      @centroids
    end
#clusters ⇒ Object (readonly)
Returns the value of attribute clusters.
    # File 'lib/ai4r/clusterers/k_means.rb', line 25
    def clusters
      @clusters
    end
#data_set ⇒ Object (readonly)
Returns the value of attribute data_set.
    # File 'lib/ai4r/clusterers/k_means.rb', line 24
    def data_set
      @data_set
    end
#iterations ⇒ Object (readonly)
Returns the value of attribute iterations.
    # File 'lib/ai4r/clusterers/k_means.rb', line 25
    def iterations
      @iterations
    end
#number_of_clusters ⇒ Object (readonly)
Returns the value of attribute number_of_clusters.
    # File 'lib/ai4r/clusterers/k_means.rb', line 24
    def number_of_clusters
      @number_of_clusters
    end
Instance Method Details
#build(data_set, number_of_clusters) ⇒ Object
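Builds a new clusterer, using data examples found in data_set. Items are partitioned into number_of_clusters clusters. Raises ArgumentError if centroid indices were supplied and their count differs from number_of_clusters, or if on_empty is not one of 'eliminate', 'terminate', 'random', or 'outlier'. Returns self.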
    # File 'lib/ai4r/clusterers/k_means.rb', line 62
    def build(data_set, number_of_clusters)
      @data_set = data_set
      @number_of_clusters = number_of_clusters
      raise ArgumentError, 'Length of centroid indices array differs from the specified number of clusters' unless @centroid_indices.empty? || @centroid_indices.length == @number_of_clusters
      raise ArgumentError, 'Invalid value for on_empty' unless @on_empty == 'eliminate' || @on_empty == 'terminate' || @on_empty == 'random' || @on_empty == 'outlier'
      @iterations = 0

      calc_initial_centroids
      while(not stop_criteria_met)
        calculate_membership_clusters
        recompute_centroids
      end

      return self
    end
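The loop in build (seed the centroids, then alternate membership assignment and centroid recomputation until a stop criterion holds) can be illustrated with a minimal, self-contained sketch. It does not use ai4r, and several choices are assumptions because the private helpers are not shown on this page: the first k points seed the centroids, the loop stops when the centroids no longer change or an iteration cap is reached, and an empty cluster simply keeps its previous centroid. The names simple_k_means and squared_distance are hypothetical.

```ruby
# Squared Euclidean distance between two numeric points of equal length.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Minimal k-means sketch. Returns [centroids, clusters].
def simple_k_means(points, k, max_iterations: 100)
  centroids = points.first(k).map(&:dup) # assumption: first k points as seeds
  clusters = []
  max_iterations.times do
    # Assignment step: each point joins the cluster of its nearest centroid.
    clusters = Array.new(k) { [] }
    points.each do |point|
      nearest = (0...k).min_by { |i| squared_distance(point, centroids[i]) }
      clusters[nearest] << point
    end
    # Update step: each centroid becomes the mean of its cluster.
    updated = clusters.each_with_index.map do |cluster, i|
      next centroids[i] if cluster.empty? # assumption: keep the old centroid
      cluster.transpose.map { |column| column.sum.to_f / cluster.size }
    end
    break if updated == centroids # converged: centroids did not move
    centroids = updated
  end
  [centroids, clusters]
end

points = [[1, 1], [1, 2], [9, 9], [10, 9], [2, 1], [9, 10]]
centroids, clusters = simple_k_means(points, 2)
```

With these six points the sketch separates the three points near (1, 1) from the three near (9, 9), even though both seeds start in the same group.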
#distance(a, b) ⇒ Object
Calculates the distance between two instances. By default, it returns the squared Euclidean distance, computed over the numeric attributes only; non-numeric attributes are ignored. You can supply a different distance implementation by:
1- Overriding this method
2- Providing a closure to the :distance_function parameter
    # File 'lib/ai4r/clusterers/k_means.rb', line 93
    def distance(a, b)
      return @distance_function.call(a, b) if @distance_function
      return Ai4r::Data::Proximity.squared_euclidean_distance(
        a.select {|att_a| att_a.is_a? Numeric} ,
        b.select {|att_b| att_b.is_a? Numeric})
    end
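As a rough illustration of the default behaviour and of what a replacement closure might look like, the sketch below re-implements the numeric-only squared Euclidean distance in plain Ruby and defines a Manhattan distance lambda with the same (a, b) signature. Both names are hypothetical stand-ins rather than ai4r code; the lambda would presumably be supplied through the :distance_function parameter (for example via set_parameters), which is an assumption here.

```ruby
# Illustrative stand-in for the default: drop non-numeric attributes,
# then sum the squared differences of the remaining values.
def squared_euclidean(a, b)
  numeric_a = a.select { |value| value.is_a?(Numeric) }
  numeric_b = b.select { |value| value.is_a?(Numeric) }
  numeric_a.zip(numeric_b).sum { |x, y| (x - y)**2 }
end

# A closure with the same (a, b) signature: Manhattan distance.
# It assumes both items contain only numeric attributes.
manhattan = lambda do |a, b|
  a.zip(b).sum { |x, y| (x - y).abs }
end

default_result = squared_euclidean([1, 2, 'red'], [4, 6, 'blue']) # 9 + 16
custom_result = manhattan.call([1, 2], [4, 6])                    # 3 + 4
```

Because the default returns the squared distance, it preserves the ordering of Euclidean distances while avoiding a square root, which is all that nearest-centroid comparisons need.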
#eval(data_item) ⇒ Object
Classifies the given data item, returning the cluster index it belongs to (0-based).
    # File 'lib/ai4r/clusterers/k_means.rb', line 80
    def eval(data_item)
      get_min_index(@centroids.collect {|centroid|
        distance(data_item, centroid)})
    end
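The classification step amounts to computing the distance from the item to every centroid and returning the position of the smallest one. The sketch below shows that idea in plain Ruby; nearest_centroid_index is a hypothetical name, and it assumes that the private get_min_index helper (not shown on this page) returns the index of the minimum value.

```ruby
# Returns the 0-based index of the centroid closest to item,
# using squared Euclidean distance on numeric attributes.
def nearest_centroid_index(item, centroids)
  distances = centroids.map do |centroid|
    item.zip(centroid).sum { |x, y| (x - y)**2 }
  end
  distances.index(distances.min) # first index wins on ties
end

centroids = [[1.0, 1.0], [9.0, 9.0], [5.0, 0.0]]
index = nearest_centroid_index([8, 7], centroids)
```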