Module: ClusterKit
Defined in:
- lib/clusterkit.rb
- lib/clusterkit/hnsw.rb
- lib/clusterkit/utils.rb
- lib/clusterkit/silence.rb
- lib/clusterkit/version.rb
- lib/clusterkit/clustering.rb
- lib/clusterkit/configuration.rb
- lib/clusterkit/preprocessing.rb
- lib/clusterkit/data_validator.rb
- lib/clusterkit/dimensionality.rb
- lib/clusterkit/clustering/hdbscan.rb
- lib/clusterkit/dimensionality/pca.rb
- lib/clusterkit/dimensionality/svd.rb
- lib/clusterkit/hdbscan_api_design.rb
- lib/clusterkit/dimensionality/umap.rb
Overview
API Design for HDBSCAN to match KMeans pattern
Defined Under Namespace
Modules: Clustering, DataValidator, Dimensionality, Preprocessing, Silence, Utils
Classes: Configuration, ConvergenceError, DataError, DimensionError, DisconnectedGraphError, Error, HNSW, InsufficientDataError, InvalidParameterError, IsolatedPointError
Constant Summary
- VERSION = "0.2.6"
Class Attribute Summary
- .configuration ⇒ Object
  Returns the value of attribute configuration.
Class Method Summary
- .configure {|configuration| ... } ⇒ Object
- .estimate_dimension(data, k: 10) ⇒ Float
  Estimate intrinsic dimension of data.
- .kmeans(data, k: nil, k_range: 2..10, **options) ⇒ Array
  Quick K-means with automatic k detection.
- .pca(data, n_components: 2) ⇒ Array
  Quick PCA.
- .svd(matrix, k, n_iter: 2) ⇒ Array
  Perform SVD.
- .tsne(data, n_components: 2, **options) ⇒ Object
  Deprecated. Not implemented - use UMAP instead
- .umap(data, n_components: 2, **options) ⇒ Array
  Quick UMAP embedding.
Class Attribute Details
.configuration ⇒ Object
Returns the value of attribute configuration.
    # File 'lib/clusterkit/configuration.rb', line 5
    def configuration
      @configuration
    end
Class Method Details
.configure {|configuration| ... } ⇒ Object
    # File 'lib/clusterkit/configuration.rb', line 8
    def self.configure
      self.configuration ||= Configuration.new
      yield(configuration) if block_given?
    end
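Because `configure` uses `||=`, the Configuration object is created on the first call and reused afterwards, so settings from separate `configure` blocks accumulate rather than reset. The following is a minimal, self-contained sketch of that pattern. `DemoKit` and its `verbose` and `seed` attributes are hypothetical stand-ins for illustration and are not ClusterKit's actual settings.

```ruby
# Stand-in module mirroring the configure pattern shown above.
# DemoKit, verbose, and seed are hypothetical names for illustration only.
module DemoKit
  class Configuration
    attr_accessor :verbose, :seed

    def initialize
      @verbose = false
      @seed = nil
    end
  end

  class << self
    attr_accessor :configuration
  end

  def self.configure
    # Create the configuration on first use, then reuse it on later calls.
    self.configuration ||= Configuration.new
    yield(configuration) if block_given?
  end
end

DemoKit.configure { |config| config.verbose = true }
DemoKit.configure { |config| config.seed = 42 }

first = DemoKit.configuration
DemoKit.configure # no block: nothing changes once the configuration exists
```

In this sketch, `DemoKit.configuration` stays `nil` until `configure` has been called at least once.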
.estimate_dimension(data, k: 10) ⇒ Float
Estimate intrinsic dimension of data
    # File 'lib/clusterkit.rb', line 67
    def estimate_dimension(data, k: 10)
      Utils.estimate_intrinsic_dimension(data, k_neighbors: k)
    end
.kmeans(data, k: nil, k_range: 2..10, **options) ⇒ Array
Quick K-means with automatic k detection
    # File 'lib/clusterkit.rb', line 86
    def kmeans(data, k: nil, k_range: 2..10, **)
      k ||= Clustering::KMeans.optimal_k(data, k_range: k_range)
      kmeans = Clustering::KMeans.new(k: k, **)
      kmeans.fit_predict(data)
    end
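The listing above only searches `k_range` when `k` is left at its `nil` default. An explicit `k` skips the search, and any remaining keywords are forwarded to the clusterer. The sketch below traces that control flow with a stand-in class. `FakeKMeans`, `quick_kmeans`, and the placeholder return values are hypothetical. Only the call shape (`optimal_k`, `new`, `fit_predict`) is taken from the listing, and the named `**options` is used here in place of the anonymous `**`.

```ruby
# Stand-in for Clustering::KMeans that records how it was used.
# FakeKMeans and its return values are hypothetical; they do not cluster.
class FakeKMeans
  class << self
    attr_reader :optimal_k_calls

    def optimal_k(data, k_range:)
      @optimal_k_calls = (@optimal_k_calls || 0) + 1
      k_range.first # placeholder choice, not a real selection heuristic
    end
  end

  attr_reader :k, :options

  def initialize(k:, **options)
    @k = k
    @options = options
  end

  def fit_predict(data)
    # Placeholder labels: assign points round-robin to k clusters.
    data.each_index.map { |i| i % k }
  end
end

def quick_kmeans(data, k: nil, k_range: 2..10, **options)
  # Fall back to the automatic search only when the caller gave no k.
  k ||= FakeKMeans.optimal_k(data, k_range: k_range)
  FakeKMeans.new(k: k, **options).fit_predict(data)
end

points = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.1]]

auto_labels  = quick_kmeans(points)        # k omitted: optimal_k is consulted
fixed_labels = quick_kmeans(points, k: 3)  # k given: optimal_k is skipped
```

After both calls, `FakeKMeans.optimal_k_calls` is 1, since only the first call needed the search.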
.pca(data, n_components: 2) ⇒ Array
Quick PCA
    # File 'lib/clusterkit.rb', line 52
    def pca(data, n_components: 2)
      pca = Dimensionality::PCA.new(n_components: n_components)
      pca.fit_transform(data)
    end
.svd(matrix, k, n_iter: 2) ⇒ Array
Perform SVD
    # File 'lib/clusterkit.rb', line 76
    def svd(matrix, k, n_iter: 2)
      svd = Dimensionality::SVD.new(n_components: k, n_iter: n_iter)
      svd.fit_transform(matrix)
    end
.tsne(data, n_components: 2, **options) ⇒ Object
Not implemented - use UMAP instead
t-SNE is not yet implemented
    # File 'lib/clusterkit.rb', line 59
    def tsne(data, n_components: 2, **)
      raise NotImplementedError, "t-SNE is not yet implemented. Please use UMAP instead, which provides similar dimensionality reduction capabilities."
    end
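One practical consequence of raising `NotImplementedError`: in Ruby it descends from `ScriptError`, not `StandardError`, so a bare `rescue` does not catch it and callers have to name the class. The sketch below shows a fallback to a UMAP-style call under that constraint. `DemoReducer`, `embed`, and the zero-vector embedding are hypothetical placeholders, not ClusterKit's implementation.

```ruby
# Stand-in module; DemoReducer and its return values are hypothetical.
module DemoReducer
  def self.tsne(data, n_components: 2, **options)
    raise NotImplementedError, "t-SNE is not yet implemented."
  end

  def self.umap(data, n_components: 2, **options)
    # Placeholder embedding: one zero vector per input row.
    data.map { Array.new(n_components, 0.0) }
  end
end

def embed(data)
  DemoReducer.tsne(data)
rescue NotImplementedError
  # Named explicitly: NotImplementedError is a ScriptError, so a bare
  # `rescue` (which only handles StandardError) would let it propagate.
  DemoReducer.umap(data)
end

embedding = embed([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

Calling the replacement method directly avoids the exception path altogether, which is usually the simpler choice for a deprecated entry point.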
.umap(data, n_components: 2, **options) ⇒ Array
Quick UMAP embedding
    # File 'lib/clusterkit.rb', line 43
    def umap(data, n_components: 2, **)
      umap = Dimensionality::UMAP.new(n_components: n_components, **)
      umap.fit_transform(data)
    end