Class: ClusterKit::Dimensionality::UMAP
- Inherits: Object
  - Object
  - ClusterKit::Dimensionality::UMAP
- Defined in: lib/clusterkit/dimensionality/umap.rb
Instance Attribute Summary
- #n_components ⇒ Object (readonly): Returns the value of attribute n_components.
- #n_neighbors ⇒ Object (readonly): Returns the value of attribute n_neighbors.
- #nb_grad_batch ⇒ Object (readonly): Returns the value of attribute nb_grad_batch.
- #nb_sampling_by_edge ⇒ Object (readonly): Returns the value of attribute nb_sampling_by_edge.
- #random_seed ⇒ Object (readonly): Returns the value of attribute random_seed.
Class Method Summary
- .load_data(path) ⇒ Array<Array<Float>>: Load transformed data from JSON file.
- .load_model(path) ⇒ UMAP: Load a fitted model from a file.
- .save_data(data, path) ⇒ Object: Save transformed data to JSON file.
Instance Method Summary
- #fit(data) ⇒ self: Fit the model to the data (training).
- #fit_transform(data) ⇒ Array<Array<Float>>: Fit the model and transform the data in one step.
- #fitted? ⇒ Boolean: Check if the model has been fitted.
- #initialize(n_components: 2, n_neighbors: 15, random_seed: nil, nb_grad_batch: 10, nb_sampling_by_edge: 8) ⇒ UMAP (constructor): Initialize a new UMAP instance.
- #save_model(path) ⇒ Object: Save the fitted model to a file.
- #transform(data) ⇒ Array<Array<Float>>: Transform data using the fitted model.
Constructor Details
#initialize(n_components: 2, n_neighbors: 15, random_seed: nil, nb_grad_batch: 10, nb_sampling_by_edge: 8) ⇒ UMAP
Initialize a new UMAP instance
# File 'lib/clusterkit/dimensionality/umap.rb', line 22
def initialize(n_components: 2, n_neighbors: 15, random_seed: nil, nb_grad_batch: 10, nb_sampling_by_edge: 8)
  @n_components = n_components
  @n_neighbors = n_neighbors
  @random_seed = random_seed
  @nb_grad_batch = nb_grad_batch
  @nb_sampling_by_edge = nb_sampling_by_edge
  @fitted = false
  # Don't create RustUMAP yet - will be created in fit/fit_transform with adjusted parameters
  @rust_umap = nil
end
Instance Attribute Details
#n_components ⇒ Object (readonly)
Returns the value of attribute n_components.
# File 'lib/clusterkit/dimensionality/umap.rb', line 12
def n_components
  @n_components
end
#n_neighbors ⇒ Object (readonly)
Returns the value of attribute n_neighbors.
# File 'lib/clusterkit/dimensionality/umap.rb', line 12
def n_neighbors
  @n_neighbors
end
#nb_grad_batch ⇒ Object (readonly)
Returns the value of attribute nb_grad_batch.
# File 'lib/clusterkit/dimensionality/umap.rb', line 12
def nb_grad_batch
  @nb_grad_batch
end
#nb_sampling_by_edge ⇒ Object (readonly)
Returns the value of attribute nb_sampling_by_edge.
# File 'lib/clusterkit/dimensionality/umap.rb', line 12
def nb_sampling_by_edge
  @nb_sampling_by_edge
end
#random_seed ⇒ Object (readonly)
Returns the value of attribute random_seed.
# File 'lib/clusterkit/dimensionality/umap.rb', line 12
def random_seed
  @random_seed
end
Class Method Details
.load_data(path) ⇒ Array<Array<Float>>
Load transformed data from JSON file
# File 'lib/clusterkit/dimensionality/umap.rb', line 153
def self.load_data(path)
  raise ArgumentError, "File not found: #{path}" unless File.exist?(path)
  JSON.parse(File.read(path))
end
.load_model(path) ⇒ UMAP
Load a fitted model from a file
# File 'lib/clusterkit/dimensionality/umap.rb', line 123
def self.load_model(path)
  raise ArgumentError, "File not found: #{path}" unless File.exist?(path)

  # Load the Rust model (access private constant)
  rust_umap = ::ClusterKit.const_get(:RustUMAP).load_model(path)

  # Create a new UMAP instance with the loaded model
  instance = allocate
  instance.instance_variable_set(:@rust_umap, rust_umap)
  instance.instance_variable_set(:@fitted, true)
  # The model file should contain these parameters, but for now we don't have access
  instance.instance_variable_set(:@n_components, nil)
  instance.instance_variable_set(:@n_neighbors, nil)
  instance.instance_variable_set(:@random_seed, nil)

  instance
end
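Because load_model builds the instance with allocate, initialize never runs and its keyword defaults are not applied. The listing sets three readers to nil explicitly, and any reader it does not set (nb_grad_batch, nb_sampling_by_edge) also returns nil, since Ruby yields nil for an unassigned instance variable. The following is a minimal standalone sketch of that pattern; SketchModel, restore, and the placeholder backend are hypothetical names used only for illustration and are not part of ClusterKit.

```ruby
# Hypothetical stand-in class; only the allocate/instance_variable_set
# pattern is taken from the load_model listing.
class SketchModel
  attr_reader :n_components, :n_neighbors, :nb_grad_batch

  def initialize(n_components: 2, n_neighbors: 15, nb_grad_batch: 10)
    @n_components = n_components
    @n_neighbors = n_neighbors
    @nb_grad_batch = nb_grad_batch
    @fitted = false
  end

  def fitted?
    @fitted
  end

  # Builds an instance without running initialize, so no defaults apply.
  def self.restore(backend)
    instance = allocate
    instance.instance_variable_set(:@backend, backend)
    instance.instance_variable_set(:@fitted, true)
    instance.instance_variable_set(:@n_components, nil)
    instance.instance_variable_set(:@n_neighbors, nil)
    instance
  end
end

restored = SketchModel.restore(:placeholder_backend)
puts restored.fitted?               # true
puts restored.n_components.inspect  # nil (set explicitly)
puts restored.nb_grad_batch.inspect # nil (never assigned)
```

Code that inspects hyperparameters on a loaded model should therefore be prepared for nil values.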
.save_data(data, path) ⇒ Object
Save transformed data to JSON file
# File 'lib/clusterkit/dimensionality/umap.rb', line 144
def self.save_data(data, path)
  FileUtils.mkdir_p(File.dirname(path)) unless File.dirname(path) == '.'
  File.write(path, JSON.pretty_generate(data))
end
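The two data helpers form a plain JSON round trip. The sketch below re-creates their bodies as standalone functions so it runs without the gem; save_embeddings, load_embeddings, and round_trip are hypothetical names, and the sample values are made up for illustration.

```ruby
require 'json'
require 'fileutils'
require 'tmpdir'

# Standalone stand-in mirroring the listed save_data body.
def save_embeddings(data, path)
  FileUtils.mkdir_p(File.dirname(path)) unless File.dirname(path) == '.'
  File.write(path, JSON.pretty_generate(data))
end

# Standalone stand-in mirroring the listed load_data body.
def load_embeddings(path)
  raise ArgumentError, "File not found: #{path}" unless File.exist?(path)
  JSON.parse(File.read(path))
end

# Writes data into a nested temporary directory and reads it back.
def round_trip(data)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'nested', 'embeddings.json')
    save_embeddings(data, path)
    load_embeddings(path)
  end
end

embeddings = [[0.1, 0.2], [1.0, -2.5]]
puts round_trip(embeddings) == embeddings # true
```

Note that the writer creates missing parent directories, while the reader raises ArgumentError for a missing file rather than returning an empty result.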
Instance Method Details
#fit(data) ⇒ self
Fit the model to the data (training).
Note: UMAP’s training process inherently produces embeddings. Because the underlying Rust implementation does not separate training from transformation, fit calls fit_transform internally and discards the embeddings. Use fit_transform if you need both the trained model and the transformed data.
# File 'lib/clusterkit/dimensionality/umap.rb', line 41
def fit(data)
  validate_input(data)

  # Always recreate RustUMAP for fit to ensure fresh fit
  @rust_umap = nil
  create_rust_umap_with_adjusted_params(data)

  # UMAP doesn't separate training from transformation internally,
  # so we call fit_transform but discard the result
  begin
    Silence.maybe_silence do
      @rust_umap.fit_transform(data)
    end
    @fitted = true
    self
  rescue StandardError => e
    handle_umap_error(e, data)
  rescue => e
    # Handle fatal errors that aren't StandardError
    handle_umap_error(RuntimeError.new(e.message), data)
  end
end
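Since fit returns self, a fit call can be chained directly into transform. The sketch below illustrates that shape with a toy backend in place of the Rust extension; ToyBackend and SketchReducer are hypothetical, and the "embedding" (row sum and row length) is invented purely so the output is deterministic.

```ruby
# Hypothetical backend standing in for the Rust extension: it "embeds"
# each row as [sum, length]. This is not how UMAP computes embeddings.
class ToyBackend
  def fit_transform(data)
    data.map { |row| [row.sum, row.length] }
  end

  def transform(data)
    data.map { |row| [row.sum, row.length] }
  end
end

class SketchReducer
  def initialize
    @backend = nil
    @fitted = false
  end

  def fitted?
    @fitted
  end

  # Trains by running the combined fit_transform and dropping its result.
  def fit(data)
    @backend = ToyBackend.new
    @backend.fit_transform(data) # embeddings intentionally discarded
    @fitted = true
    self
  end

  def transform(data)
    raise RuntimeError, 'Call fit first.' unless fitted?
    @backend.transform(data)
  end
end

train = [[1.0, 2.0], [3.0, 4.0]]
fresh = [[5.0, 6.0]]
p SketchReducer.new.fit(train).transform(fresh) # [[11.0, 2]]
```

When the training embeddings themselves are needed, calling fit_transform avoids computing them once in fit and throwing them away.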
#fit_transform(data) ⇒ Array<Array<Float>>
Fit the model and transform the data in one step
# File 'lib/clusterkit/dimensionality/umap.rb', line 79
def fit_transform(data)
  validate_input(data)

  # Always recreate RustUMAP for fit_transform to ensure fresh fit
  @rust_umap = nil
  create_rust_umap_with_adjusted_params(data)

  begin
    result = Silence.maybe_silence do
      @rust_umap.fit_transform(data)
    end
    @fitted = true
    result
  rescue StandardError => e
    handle_umap_error(e, data)
  rescue => e
    # Handle fatal errors that aren't StandardError
    handle_umap_error(RuntimeError.new(e.message), data)
  end
end
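A note on the two rescue clauses in the listing: in Ruby, a bare rescue is shorthand for rescue StandardError, so a bare clause placed after an explicit StandardError clause does not catch anything additional. The standalone sketch below demonstrates that language behavior; the helper names are hypothetical, and whether the library intends a wider clause such as rescue Exception is not stated in the source.

```ruby
# Reports which rescue clause handled the error raised by the block.
def which_clause
  yield
  :no_error
rescue StandardError
  :standard_error_clause
rescue # bare rescue: same as rescue StandardError, so never reached here
  :bare_clause
end

# Same shape, but the second clause names Exception explicitly.
def which_clause_wide
  yield
  :no_error
rescue StandardError
  :standard_error_clause
rescue Exception # also matches ScriptError and other non-StandardError classes
  :exception_clause
end

puts which_clause { raise ArgumentError, 'bad input' }           # standard_error_clause
puts which_clause_wide { raise NotImplementedError, 'no impl' }  # exception_clause
```

NotImplementedError descends from ScriptError rather than StandardError, which is why only the wide variant handles it; passing it to which_clause would let the error propagate to the caller.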
#fitted? ⇒ Boolean
Check if the model has been fitted
# File 'lib/clusterkit/dimensionality/umap.rb', line 102
def fitted?
  @fitted
end
#save_model(path) ⇒ Object
Save the fitted model to a file
# File 'lib/clusterkit/dimensionality/umap.rb', line 109
def save_model(path)
  raise RuntimeError, "No model to save. Call fit or fit_transform first." unless fitted?

  # Ensure directory exists
  dir = File.dirname(path)
  FileUtils.mkdir_p(dir) unless dir == '.' || dir == '/'

  @rust_umap.save_model(path)
end
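The directory guard in save_model depends on what File.dirname returns for different path shapes. The sketch below shows the three cases on a POSIX-style filesystem; ensure_parent_dir is a hypothetical helper that mirrors the guard, and the file names are examples only.

```ruby
require 'fileutils'
require 'tmpdir'

# Creates the parent directory of path unless it is the current directory
# or the filesystem root, mirroring the guard in save_model.
def ensure_parent_dir(path)
  dir = File.dirname(path)
  FileUtils.mkdir_p(dir) unless dir == '.' || dir == '/'
  dir
end

puts File.dirname('model.bin')           # .
puts File.dirname('/model.bin')          # /
puts File.dirname('out/models/umap.bin') # out/models
```

Only the third shape triggers directory creation; a bare file name or a file directly under the root is written without calling mkdir_p.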
#transform(data) ⇒ Array<Array<Float>>
Transform data using the fitted model
# File 'lib/clusterkit/dimensionality/umap.rb', line 68
def transform(data)
  raise RuntimeError, "Model must be fitted before transform. Call fit or fit_transform first." unless fitted?
  validate_input(data, check_min_samples: false)
  Silence.maybe_silence do
    @rust_umap.transform(data)
  end
end
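On the caller side, the RuntimeError raised by transform can be avoided by checking fitted? first. The sketch below shows one way to do that against a stand-in object with the same fitted?/transform contract; GuardedModel and transform_if_ready are hypothetical, and the stand-in's "embedding" simply keeps the first two values of each row.

```ruby
# Hypothetical stand-in exposing the same fitted?/transform contract.
class GuardedModel
  def initialize(fitted)
    @fitted = fitted
  end

  def fitted?
    @fitted
  end

  def transform(data)
    raise RuntimeError, 'Model must be fitted before transform.' unless fitted?
    data.map { |row| row.first(2) } # placeholder projection, not UMAP
  end
end

# Returns embeddings, or nil when the model has not been trained yet.
def transform_if_ready(model, data)
  return nil unless model.fitted?

  model.transform(data)
end

rows = [[1, 2, 3], [4, 5, 6]]
p transform_if_ready(GuardedModel.new(false), rows) # nil
p transform_if_ready(GuardedModel.new(true), rows)  # [[1, 2], [4, 5]]
```

Returning nil is only one possible policy; a caller could equally fit the model on demand or re-raise with more context.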