Module: ClusterKit::Dimensionality

Defined in:
lib/clusterkit/dimensionality.rb,
lib/clusterkit/dimensionality/pca.rb,
lib/clusterkit/dimensionality/svd.rb,
lib/clusterkit/dimensionality/umap.rb

Overview

Namespace for dimensionality reduction algorithms (PCA, SVD, UMAP), plus module-level helper methods

Defined Under Namespace

Classes: PCA, SVD, UMAP

Class Method Summary

Class Method Details

.pca(data, n_components: 2) ⇒ Array

Module-level convenience method: builds a PCA with the given n_components and returns the result of fit_transform on data

Parameters:

  • data (Array)

    2D array of data points

  • n_components (Integer) (defaults to: 2)

    Number of principal components to keep

Returns:

  • (Array)

    Transformed data, as returned by PCA#fit_transform



# File 'lib/clusterkit/dimensionality/pca.rb', line 246

def self.pca(data, n_components: 2)
  pca = PCA.new(n_components: n_components)
  pca.fit_transform(data)
end
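To make the idea behind a PCA projection concrete, here is a minimal pure-Ruby sketch that finds only the first principal component by power iteration on the covariance matrix. It is a generic illustration, not ClusterKit's implementation: the helper name `first_principal_component` and its `iterations` parameter are hypothetical, and it assumes non-constant data and a starting vector that is not orthogonal to the leading component.

```ruby
# Generic sketch (hypothetical helper, not part of ClusterKit):
# project data onto its first principal component.
def first_principal_component(data, iterations: 200)
  n = data.size
  dims = data.first.size

  # Center each column at zero mean.
  means = (0...dims).map { |j| data.sum { |row| row[j] } / n.to_f }
  centered = data.map { |row| row.each_with_index.map { |x, j| x - means[j] } }

  # Sample covariance matrix (dims x dims).
  cov = Array.new(dims) do |i|
    Array.new(dims) do |j|
      centered.sum { |row| row[i] * row[j] } / (n - 1).to_f
    end
  end

  # Power iteration: repeatedly apply the covariance matrix and renormalize.
  vec = Array.new(dims, 1.0)
  iterations.times do
    vec = cov.map { |row| row.zip(vec).sum { |c, v| c * v } }
    norm = Math.sqrt(vec.sum { |v| v * v })
    vec = vec.map { |v| v / norm }
  end

  # One score per input row: the coordinate along the leading direction.
  scores = centered.map { |row| [row.zip(vec).sum { |x, v| x * v }] }
  [vec, scores]
end

# Points lying on the line y = 2x, so the leading direction is along [1, 2].
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(data)
```

With the gem installed, the documented call `ClusterKit::Dimensionality.pca(data, n_components: 2)` plays the same role for several components at once.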

.reconstruction_error(original_data, reconstructed_data) ⇒ Float

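Calculate the reconstruction error of a dimensionality reduction: the squared Euclidean distance between each original row and its reconstruction, averaged over all rows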

Parameters:

  • original_data (Array<Array<Numeric>>)

    Original high-dimensional data

  • reconstructed_data (Array<Array<Numeric>>)

    Reconstructed data

Returns:

  • (Float)

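    Mean squared reconstruction error: squared differences summed per row, then averaged over the number of rows (not over individual elements)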

Raises:

  • (ArgumentError)

    If original_data and reconstructed_data contain a different number of rows


# File 'lib/clusterkit/dimensionality.rb', line 17

def self.reconstruction_error(original_data, reconstructed_data)
  raise ArgumentError, "Data sizes don't match" if original_data.size != reconstructed_data.size

  # Accumulate the squared Euclidean distance between each pair of rows
  total_error = 0.0
  original_data.zip(reconstructed_data).each do |orig, recon|
    error = orig.zip(recon).map { |o, r| (o - r) ** 2 }.sum
    total_error += error
  end

  # Average over the number of rows
  total_error / original_data.size
end
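A short worked example may help show what the returned value measures. So that the sketch runs without the clusterkit gem, it copies the documented method body into a stand-in module; the name `ReconstructionDemo` is hypothetical and exists only for this illustration.

```ruby
# Stand-in module (hypothetical name) that mirrors the documented method body
# so the example runs without the clusterkit gem installed.
module ReconstructionDemo
  def self.reconstruction_error(original_data, reconstructed_data)
    raise ArgumentError, "Data sizes don't match" if original_data.size != reconstructed_data.size

    total_error = 0.0
    original_data.zip(reconstructed_data).each do |orig, recon|
      total_error += orig.zip(recon).map { |o, r| (o - r)**2 }.sum
    end
    total_error / original_data.size
  end
end

original      = [[1.0, 2.0], [3.0, 4.0]]
reconstructed = [[1.0, 2.5], [2.0, 4.0]]

# Row errors are 0.25 and 1.0, so the mean over two rows is 0.625.
error = ReconstructionDemo.reconstruction_error(original, reconstructed)

# Dividing by the feature count gives a per-element mean squared error.
per_element = error / original.first.size

# A different number of rows raises ArgumentError.
mismatch_raised =
  begin
    ReconstructionDemo.reconstruction_error(original, [[1.0, 2.0]])
    false
  rescue ArgumentError
    true
  end
```

Only the number of rows is checked. Because `Array#zip` pads with `nil`, a reconstructed row that is shorter than its original would fail with a TypeError inside the subtraction, while a longer one would have its extra values ignored, so callers may want to validate row lengths themselves.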