Class: ClusterKit::Clustering::KMeans
- Inherits: Object
  - Object
    - ClusterKit::Clustering::KMeans
- Defined in: lib/clusterkit/clustering.rb
Overview
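K-means clustering algorithm. Fitting and prediction are delegated to Rust functions (Clustering.kmeans_rust and Clustering.kmeans_predict_rust).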
Instance Attribute Summary
- #centroids ⇒ Object (readonly)
  The cluster centroids computed by #fit; nil until the model is fitted.
- #inertia ⇒ Float (readonly)
  The sum of squared distances of samples to their closest cluster center, set by #fit.
- #k ⇒ Object (readonly)
  The number of clusters passed to the constructor.
- #labels ⇒ Object (readonly)
  The cluster label assigned to each training sample by #fit; nil until the model is fitted.
- #max_iter ⇒ Object (readonly)
  The maximum number of iterations passed to the constructor (default 300).
Class Method Summary
- .detect_optimal_k(elbow_results, fallback_k: 3) ⇒ Integer
  Detect the optimal k from elbow method results.
- .elbow_method(data, k_range: 2..10, max_iter: 300) ⇒ Hash
  Compute the inertia for each k in k_range, for use with the elbow method.
- .optimal_k(data, k_range: 2..10, max_iter: 300) ⇒ Integer
  Run the elbow method on data and return the detected optimal k.
Instance Method Summary
- #cluster_centers ⇒ Array
  Get the cluster centers (same as #centroids).
- #fit(data) ⇒ self
  Fit the K-means model to data.
- #fit_predict(data) ⇒ Array
  Fit the model and return the training labels.
- #fitted? ⇒ Boolean
  Check whether the model has been fitted.
- #initialize(k:, max_iter: 300, random_seed: nil) ⇒ KMeans (constructor)
  Initialize a K-means clusterer.
- #predict(data) ⇒ Array
  Predict cluster labels for new data.
Constructor Details
#initialize(k:, max_iter: 300, random_seed: nil) ⇒ KMeans
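Initialize a K-means clusterer. Raises ArgumentError unless k is positive. random_seed is stored and passed to the Rust implementation when #fit is called.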
# File 'lib/clusterkit/clustering.rb', line 18

def initialize(k:, max_iter: 300, random_seed: nil)
  raise ArgumentError, "k must be positive" unless k > 0
  @k = k
  @max_iter = max_iter
  @random_seed = random_seed
  @fitted = false
end
Instance Attribute Details
#centroids ⇒ Object (readonly)
The cluster centroids computed by #fit; nil until the model is fitted.
# File 'lib/clusterkit/clustering.rb', line 12

def centroids
  @centroids
end
#inertia ⇒ Float (readonly)
The sum of squared distances of samples to their closest cluster center, set by #fit; nil until the model is fitted.
# File 'lib/clusterkit/clustering.rb', line 71

def inertia
  @inertia
end
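As a worked illustration of this definition, the sketch below computes inertia in plain Ruby. It assumes Euclidean distance and samples and centroids given as arrays of equal-length numeric arrays; the helper names `squared_distance` and `inertia` are hypothetical and not part of ClusterKit, which obtains this value from its Rust implementation during #fit.

```ruby
# Hypothetical helper (not ClusterKit API): squared Euclidean distance
# between two equal-length numeric arrays.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Sum, over all samples, of the squared distance to the closest centroid.
def inertia(data, centroids)
  data.sum { |point| centroids.map { |c| squared_distance(point, c) }.min }
end

data = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centroids = [[0.0, 1.0], [10.0, 1.0]]
puts inertia(data, centroids) # prints 4.0 (each sample is 1.0 unit from its centroid)
```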
#k ⇒ Object (readonly)
The number of clusters passed to the constructor.
# File 'lib/clusterkit/clustering.rb', line 12

def k
  @k
end
#labels ⇒ Object (readonly)
The cluster label assigned to each training sample by #fit; nil until the model is fitted.
# File 'lib/clusterkit/clustering.rb', line 12

def labels
  @labels
end
#max_iter ⇒ Object (readonly)
The maximum number of iterations passed to the constructor (default 300).
# File 'lib/clusterkit/clustering.rb', line 12

def max_iter
  @max_iter
end
Class Method Details
.detect_optimal_k(elbow_results, fallback_k: 3) ⇒ Integer
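Detect the optimal k from elbow method results. Returns fallback_k when elbow_results is nil or empty, the only key when it has a single entry, and otherwise the k immediately after the largest drop in inertia between consecutive k values (or the smallest k if no drop is positive).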
# File 'lib/clusterkit/clustering.rb', line 98

def detect_optimal_k(elbow_results, fallback_k: 3)
  return fallback_k if elbow_results.nil? || elbow_results.empty?

  k_values = elbow_results.keys.sort
  return k_values.first if k_values.size == 1

  # Find the k with the largest drop in inertia
  max_drop = 0
  optimal_k = k_values.first

  k_values.each_cons(2) do |k1, k2|
    drop = elbow_results[k1] - elbow_results[k2]
    if drop > max_drop
      max_drop = drop
      optimal_k = k2 # Use k after the drop
    end
  end

  optimal_k
end
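To show which k each branch returns, the sketch below runs a standalone copy of this largest-drop logic, renamed to the hypothetical `largest_drop_k` so it runs without the gem. The inertia values are made up for illustration and are not library output.

```ruby
# Standalone copy of the largest-drop heuristic shown above
# (hypothetical name; not part of ClusterKit).
def largest_drop_k(elbow_results, fallback_k: 3)
  return fallback_k if elbow_results.nil? || elbow_results.empty?

  k_values = elbow_results.keys.sort
  return k_values.first if k_values.size == 1

  max_drop = 0
  optimal_k = k_values.first
  k_values.each_cons(2) do |k1, k2|
    drop = elbow_results[k1] - elbow_results[k2]
    if drop > max_drop
      max_drop = drop
      optimal_k = k2 # k after the largest drop so far
    end
  end
  optimal_k
end

# Made-up inertia values keyed by k.
p largest_drop_k({ 2 => 100.0, 3 => 40.0, 4 => 30.0, 5 => 25.0 }) # prints 3 (drop of 60.0 from k=2 to k=3)
p largest_drop_k({ 2 => 100.0, 3 => 90.0, 4 => 30.0, 5 => 28.0 }) # prints 4 (drop of 60.0 from k=3 to k=4)
p largest_drop_k({ 2 => 10.0, 3 => 12.0 })                        # prints 2 (no positive drop)
p largest_drop_k({ 4 => 12.0 })                                   # prints 4 (single entry)
p largest_drop_k({})                                              # prints 3 (fallback_k)
```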
.elbow_method(data, k_range: 2..10, max_iter: 300) ⇒ Hash
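Find the optimal number of clusters using the elbow method: fit a model for each k in k_range and return a Hash mapping each k to the resulting inertia. The models are constructed without random_seed. Pass the result to .detect_optimal_k, or inspect it to locate the "elbow" by eye.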
# File 'lib/clusterkit/clustering.rb', line 82

def elbow_method(data, k_range: 2..10, max_iter: 300)
  results = {}

  k_range.each do |k|
    kmeans = new(k: k, max_iter: max_iter)
    kmeans.fit(data)
    results[k] = kmeans.inertia
  end

  results
end
.optimal_k(data, k_range: 2..10, max_iter: 300) ⇒ Integer
Run .elbow_method on data and return the k chosen by .detect_optimal_k (using its default fallback_k of 3).
# File 'lib/clusterkit/clustering.rb', line 124

def optimal_k(data, k_range: 2..10, max_iter: 300)
  elbow_results = elbow_method(data, k_range: k_range, max_iter: max_iter)
  detect_optimal_k(elbow_results)
end
Instance Method Details
#cluster_centers ⇒ Array
Get the cluster centers. Equivalent to #centroids.
# File 'lib/clusterkit/clustering.rb', line 65

def cluster_centers
  @centroids
end
#fit(data) ⇒ self
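Fit the K-means model to data. Validates the input, sets #labels, #centroids, and #inertia from the Rust implementation (passing random_seed if one was given), and marks the model as fitted. Returns self, so calls can be chained.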
# File 'lib/clusterkit/clustering.rb', line 29

def fit(data)
  validate_data(data)

  # Call Rust implementation with optional seed
  @labels, @centroids, @inertia = Clustering.kmeans_rust(data, @k, @max_iter, @random_seed)
  @fitted = true

  self
end
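To show what the three values unpacked from Clustering.kmeans_rust represent, here is a minimal pure-Ruby sketch of Lloyd's algorithm, the classic K-means iteration. It is an assumption that the Rust backend works this way. The sketch seeds the centroids with the first k samples for determinism and ignores random_seed, and `lloyd_kmeans`, `nearest`, and `squared_distance` are hypothetical names, not ClusterKit API.

```ruby
# Hypothetical helpers (not ClusterKit API).
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Index of the centroid closest to point.
def nearest(point, centroids)
  (0...centroids.size).min_by { |j| squared_distance(point, centroids[j]) }
end

# Minimal Lloyd's k-means sketch. Returns [labels, centroids, inertia],
# mirroring the triple that #fit unpacks. Assumption: centroids start
# at the first k samples (the real backend may initialize differently).
def lloyd_kmeans(data, k, max_iter: 300)
  raise ArgumentError, "k must be positive" unless k > 0
  raise ArgumentError, "need at least k samples" if data.size < k # sketch-only guard

  centroids = data.first(k).map(&:dup)
  labels = nil
  max_iter.times do
    # Assignment step: label each sample with its nearest centroid.
    new_labels = data.map { |p| nearest(p, centroids) }
    break if new_labels == labels # assignments stable: converged
    labels = new_labels

    # Update step: move each centroid to the mean of its members;
    # an empty cluster keeps its previous centroid.
    centroids = centroids.each_with_index.map do |c, j|
      members = data.select.with_index { |_, i| labels[i] == j }
      members.empty? ? c : members.transpose.map { |col| col.sum / members.size.to_f }
    end
  end

  # Final labels are consistent with the final centroids.
  labels = data.map { |p| nearest(p, centroids) }
  inertia = data.zip(labels).sum { |p, label| squared_distance(p, centroids[label]) }
  [labels, centroids, inertia]
end

data = [[1.0, 1.0], [8.0, 8.0], [1.5, 2.0], [9.0, 9.5], [1.0, 0.5], [8.5, 9.0]]
labels, centroids, inertia = lloyd_kmeans(data, 2)
p labels # prints [0, 1, 0, 1, 0, 1]
# inertia is approximately 3.0 for this data
```

Seeding from the first k samples keeps the example reproducible but makes the result sensitive to sample order.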
#fit_predict(data) ⇒ Array
Fit the model to data and return the resulting labels (the same value as #labels after #fit).
# File 'lib/clusterkit/clustering.rb', line 52

def fit_predict(data)
  fit(data)
  @labels
end
#fitted? ⇒ Boolean
Check whether the model has been fitted. Returns true after a successful call to #fit, false otherwise.
# File 'lib/clusterkit/clustering.rb', line 59

def fitted?
  @fitted
end
#predict(data) ⇒ Array
Predict cluster labels for new data using the fitted centroids. Raises RuntimeError if the model has not been fitted.
# File 'lib/clusterkit/clustering.rb', line 42

def predict(data)
  raise RuntimeError, "Model must be fitted before predict" unless fitted?

  validate_data(data)
  Clustering.kmeans_predict_rust(data, @centroids)
end
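K-means prediction conventionally assigns each new sample to its nearest centroid. The sketch below shows that rule in plain Ruby, assuming Euclidean distance; whether Clustering.kmeans_predict_rust breaks ties the same way (lowest centroid index wins here) is not documented on this page. `predict_labels` is a hypothetical name, and unlike #predict it performs no fitted? check or input validation.

```ruby
# Hypothetical nearest-centroid assignment (not ClusterKit API).
# Assumes Euclidean distance; ties go to the lowest centroid index.
def predict_labels(data, centroids)
  data.map do |point|
    distances = centroids.map { |c| point.zip(c).sum { |x, y| (x - y)**2 } }
    distances.index(distances.min)
  end
end

centroids = [[0.0, 0.0], [10.0, 10.0]]
p predict_labels([[1.0, 2.0], [9.0, 8.5], [6.0, 6.0]], centroids) # prints [0, 1, 1]
```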