Module: ClusterKit::Preprocessing

Defined in:
lib/clusterkit/preprocessing.rb

Overview

Data preprocessing utilities

Class Method Summary

Class Method Details

.normalize(data, method: :standard) ⇒ Array

Normalize data using the specified method

Parameters:

  • data (Array)

    Input data (2D array)

  • method (Symbol) (defaults to: :standard)

    Normalization method (:standard, :minmax, :l2)

Returns:

  • (Array)

    Normalized data

Raises:

  • (ArgumentError)

    If data is not an Array, or if method is not one of :standard, :minmax, or :l2


# File 'lib/clusterkit/preprocessing.rb', line 13

def normalize(data, method: :standard)
  raise ArgumentError, "Unsupported data type: #{data.class}" unless data.is_a?(Array)
  
  case method
  when :standard
    standard_normalize(data)
  when :minmax
    minmax_normalize(data)
  when :l2
    l2_normalize(data)
  else
    raise ArgumentError, "Unknown normalization method: #{method}"
  end
end
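The three method values are dispatched to helpers (standard_normalize, minmax_normalize, l2_normalize) whose source is not shown on this page. As a rough illustration, the self-contained Ruby sketch below implements what these names conventionally mean: :standard as a column-wise z-score, :minmax as a column-wise rescale to [0, 1], and :l2 as scaling each row to unit Euclidean norm. The module NormalizeSketch is hypothetical, and the choice of axis and the handling of constant columns and all-zero rows are assumptions that may differ from ClusterKit's actual helpers.

```ruby
# Hypothetical pure-Ruby sketch of the three normalization modes accepted by
# ClusterKit::Preprocessing.normalize. Per-column vs per-row behaviour and the
# handling of zero spread / zero norm are assumptions, not ClusterKit's code.
module NormalizeSketch
  module_function

  # Mirrors the dispatch in the documented normalize method.
  def normalize(data, method: :standard)
    raise ArgumentError, "Unsupported data type: #{data.class}" unless data.is_a?(Array)

    case method
    when :standard then standard(data)
    when :minmax   then minmax(data)
    when :l2       then l2(data)
    else raise ArgumentError, "Unknown normalization method: #{method}"
    end
  end

  # Column-wise z-score: subtract the column mean and divide by the column's
  # population standard deviation (1.0 is used when the column is constant).
  def standard(data)
    stats = data.transpose.map do |col|
      mean = col.sum.to_f / col.size
      std = Math.sqrt(col.sum { |x| (x - mean)**2 } / col.size)
      [mean, std.zero? ? 1.0 : std]
    end
    data.map { |row| row.each_with_index.map { |x, j| (x - stats[j][0]) / stats[j][1] } }
  end

  # Column-wise rescale to [0, 1]; constant columns map to 0.0.
  def minmax(data)
    ranges = data.transpose.map(&:minmax)
    data.map do |row|
      row.each_with_index.map do |x, j|
        lo, hi = ranges[j]
        hi == lo ? 0.0 : (x - lo).to_f / (hi - lo)
      end
    end
  end

  # Row-wise scaling to unit Euclidean (L2) norm; all-zero rows are left as zeros.
  def l2(data)
    data.map do |row|
      norm = Math.sqrt(row.sum { |x| x * x })
      norm.zero? ? row.map(&:to_f) : row.map { |x| x / norm }
    end
  end
end
```

Under these assumptions, NormalizeSketch.normalize([[1, 10], [3, 30]]) returns [[-1.0, -1.0], [1.0, 1.0]], because each column is centered on its mean and divided by its standard deviation independently.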

.pca_reduce(data, n_components) ⇒ Array

Reduce dimensionality using PCA before embedding

Parameters:

  • data (Array)

    Input data

  • n_components (Integer)

    Number of PCA components

Returns:

  • (Array)

    Reduced data

Raises:

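  • (NotImplementedError)

    Always raised; PCA reduction is not implemented in pure Ruby, and the error message directs callers to use the SVD module directly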


# File 'lib/clusterkit/preprocessing.rb', line 32

def pca_reduce(data, n_components)
  # Note: This would require SVD implementation in pure Ruby
  # For now, raise an error suggesting to use the Rust-based SVD module
  raise NotImplementedError, "PCA reduction requires the SVD module which needs to be called directly"
end
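Since pca_reduce currently only raises, the following sketch shows what a pure-Ruby PCA reduction could look like. It is not ClusterKit's implementation and does not use its SVD module, whose API is not documented here. Instead it swaps in an eigendecomposition of the sample covariance matrix via Ruby's matrix library (a bundled gem since Ruby 3.1), which yields the same principal directions as an SVD of the centered data. The method name pca_reduce_sketch is hypothetical.

```ruby
require 'matrix'

# Hypothetical pure-Ruby PCA sketch; not part of ClusterKit. Uses an
# eigendecomposition of the sample covariance matrix rather than ClusterKit's
# Rust-based SVD module.
def pca_reduce_sketch(data, n_components)
  raise ArgumentError, "data must be a non-empty 2D array" unless data.is_a?(Array) && !data.empty?

  n_rows = data.size
  n_cols = data.first.size
  raise ArgumentError, "need at least 2 rows" if n_rows < 2
  unless n_components.is_a?(Integer) && n_components.between?(1, n_cols)
    raise ArgumentError, "n_components must be between 1 and #{n_cols}"
  end

  # Center each column on its mean.
  means = (0...n_cols).map { |j| data.sum { |row| row[j].to_f } / n_rows }
  centered = data.map { |row| row.each_with_index.map { |x, j| x - means[j] } }

  # Sample covariance (n - 1 denominator). Each entry is computed once and
  # mirrored, so the matrix is exactly symmetric.
  cov = Array.new(n_cols) { Array.new(n_cols, 0.0) }
  (0...n_cols).each do |i|
    (i...n_cols).each do |j|
      value = centered.sum { |row| row[i] * row[j] } / (n_rows - 1)
      cov[i][j] = value
      cov[j][i] = value
    end
  end

  # Eigendecomposition; keep the eigenvectors with the largest eigenvalues,
  # i.e. the directions of greatest variance.
  eig = Matrix[*cov].eigen
  vectors = eig.eigenvectors
  order = eig.eigenvalues.each_with_index.sort_by { |value, _| -value }.map(&:last)
  components = order.first(n_components).map { |k| vectors[k].normalize }

  # Project each centered row onto the selected components.
  centered.map { |row| components.map { |v| Vector[*row].inner_product(v) } }
end
```

For example, pca_reduce_sketch([[1, 2], [2, 4], [3, 6]], 1) projects these collinear points onto one axis: the outer points land at roughly -2.236 and 2.236 (plus or minus the square root of 5) and the middle point at 0. The overall sign depends on the orientation of the returned eigenvector.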