Module: ClusterKit::Utils

Defined in:
lib/clusterkit/utils.rb

Overview

Utility functions for data analysis

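Class Method Summary

  • .estimate_hubness(data) ⇒ Hash

  • .estimate_intrinsic_dimension(data, k_neighbors: 10) ⇒ Float

  • .neighborhood_stability(original_data, embedded_data, k: 15) ⇒ Float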

Class Method Details

.estimate_hubness(data) ⇒ Hash

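Estimate hubness in the data: the tendency of a few points ("hubs") to appear in the k-nearest-neighbor lists of many other points, an effect that becomes more pronounced as dimensionality grows.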

Parameters:

  • data (Array, Numo::NArray)

    Input data

Returns:

  • (Hash)

    Hubness statistics, with symbol keys (the native result is passed through symbolize_keys)

Raises:

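  • (ArgumentError)

    If data is not an Array. The parameter list also names Numo::NArray, but the implementation below rejects it; convert such input with #to_a first.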


# File 'lib/clusterkit/utils.rb', line 22

def estimate_hubness(data)
  raise ArgumentError, "Unsupported data type: #{data.class}" unless data.is_a?(Array)
  
  result = estimate_hubness_rust(data)
  symbolize_keys(result)
end
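The hubness computation itself is delegated to the native estimate_hubness_rust function, and the keys of the returned Hash are not documented on this page. As an illustration only, here is a minimal brute-force sketch of one common way to quantify hubness: count how often each point appears in other points' k-nearest-neighbor lists (its k-occurrence, N_k) and take the skewness of those counts. The method names sketch_hubness and knn_indices, the k: parameter, and the returned keys are hypothetical and not part of ClusterKit's API.

```ruby
# Hypothetical, brute-force sketch of a k-occurrence hubness measure.
# It is NOT ClusterKit's implementation, which delegates to native Rust code.

# Euclidean distance between two equal-length numeric arrays.
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Indices of the k nearest neighbors of row i, excluding i itself.
def knn_indices(data, i, k)
  (0...data.size)
    .reject { |j| j == i }
    .min_by(k) { |j| euclidean(data[i], data[j]) }
end

# N_k(x): how often each point appears in other points' k-NN lists.
# A large positive skewness of N_k indicates that a few hubs dominate.
def sketch_hubness(data, k: 5)
  raise ArgumentError, "need 1 <= k < data.size" unless k >= 1 && k < data.size

  counts = Array.new(data.size, 0)
  data.each_index do |i|
    knn_indices(data, i, k).each { |j| counts[j] += 1 }
  end

  n = counts.size.to_f
  mean = counts.sum / n # always equals k: every point contributes k entries
  var = counts.sum { |c| (c - mean)**2 } / n
  third = counts.sum { |c| (c - mean)**3 } / n
  skewness = var.zero? ? 0.0 : third / var**1.5

  { k_occurrence: counts, mean: mean, skewness: skewness, max: counts.max }
end
```

For a plus-shaped set of five points with k: 1, the center is every other point's nearest neighbor, so its k-occurrence is 4 and the skewness is positive. Computing all pairwise distances makes the sketch O(n²), so it only suits small inputs.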

.estimate_intrinsic_dimension(data, k_neighbors: 10) ⇒ Float

Estimate the intrinsic dimension of data

Parameters:

  • data (Array, Numo::NArray)

    Input data

  • k_neighbors (Integer) (defaults to: 10)

    Number of neighbors to consider

Returns:

  • (Float)

    Estimated intrinsic dimension

Raises:

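  • (ArgumentError)

    If data is not an Array. The parameter list also names Numo::NArray, but the implementation below rejects it; convert such input with #to_a first.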


# File 'lib/clusterkit/utils.rb', line 13

def estimate_intrinsic_dimension(data, k_neighbors: 10)
  raise ArgumentError, "Unsupported data type: #{data.class}" unless data.is_a?(Array)
  
  estimate_intrinsic_dimension_rust(data, k_neighbors)
end
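The estimator used by estimate_intrinsic_dimension_rust is not documented on this page. To give intuition for what k_neighbors controls, here is a minimal sketch of one well-known technique, the Levina-Bickel maximum-likelihood estimator, which derives a local dimension for each point from the ratios of its k nearest-neighbor distances and then averages those values. This is an assumption for illustration, not necessarily ClusterKit's method; sketch_intrinsic_dimension is a hypothetical name, and the sketch assumes all points are distinct, since a zero distance would break the logarithm.

```ruby
# Hypothetical sketch of the Levina-Bickel maximum-likelihood estimator of
# intrinsic dimension. Not necessarily the method ClusterKit's Rust code uses.

# Euclidean distance between two equal-length numeric arrays.
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# data: Array of equal-length numeric rows; all points assumed distinct.
def sketch_intrinsic_dimension(data, k_neighbors: 10)
  k = k_neighbors
  raise ArgumentError, "need 2 <= k < data.size" unless k >= 2 && k < data.size

  estimates = data.each_with_index.map do |x, i|
    # Sorted distances T_1 <= ... <= T_k to the k nearest neighbors of x.
    dists = data.each_with_index
                .reject { |_, j| j == i }
                .map { |y, _| euclidean(x, y) }
                .sort
                .first(k)
    t_k = dists.last
    # Local estimate: m(x) = [ (1 / (k - 1)) * sum_{j < k} log(T_k / T_j) ]^-1
    mean_log = dists[0...-1].sum { |t| Math.log(t_k / t) } / (k - 1)
    1.0 / mean_log
  end

  # Average the per-point estimates.
  estimates.sum / estimates.size
end
```

Points spaced along a straight line should give an estimate close to 1 and a planar grid an estimate close to 2, although small or regularly spaced samples bias the value. If every one of a point's k neighbor distances is identical, mean_log is zero and that point's estimate becomes Infinity, so larger k is more robust on gridded data.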

.neighborhood_stability(original_data, embedded_data, k: 15) ⇒ Float

Measure how well the local neighborhoods of the original data are preserved in the embedding

Parameters:

  • original_data (Array, Numo::NArray)

    Original high-dimensional data

  • embedded_data (Array, Numo::NArray)

    Embedded low-dimensional data

  • k (Integer) (defaults to: 15)

    Number of neighbors to check

Returns:

  • (Float)

    Stability score (0-1, higher is better)

Raises:

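  • (ArgumentError)

    If original_data or embedded_data is not an Array. The parameter list also names Numo::NArray, but the implementation below rejects it.

  • (NotImplementedError)

    Always, in the current version: the stability calculation is not implemented yet, so no score is returned.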


# File 'lib/clusterkit/utils.rb', line 34

def neighborhood_stability(original_data, embedded_data, k: 15)
  raise ArgumentError, "Unsupported data type: #{original_data.class}" unless original_data.is_a?(Array)
  raise ArgumentError, "Unsupported data type: #{embedded_data.class}" unless embedded_data.is_a?(Array)
  
  # TODO: Implement neighborhood stability calculation
  raise NotImplementedError, "Neighborhood stability not implemented yet"
end
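neighborhood_stability currently raises NotImplementedError. Here is a minimal sketch of one plausible definition consistent with the documented return value (a score from 0 to 1, higher is better): for each point, take the fraction of its k nearest neighbors in original_data that are also among its k nearest neighbors in embedded_data, then average over all points. This k-nearest-neighbor overlap is an assumption about the intended metric, not ClusterKit's implementation, and sketch_neighborhood_stability and knn_set are hypothetical names.

```ruby
# Hypothetical sketch of a k-NN overlap score for an embedding.
# ClusterKit does not implement neighborhood_stability yet; this is an assumed
# definition, not the library's.

# Euclidean distance between two equal-length numeric arrays.
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Indices of the k nearest neighbors of row i (excluding i), ties broken by index.
def knn_set(data, i, k)
  (0...data.size)
    .reject { |j| j == i }
    .sort_by { |j| [euclidean(data[i], data[j]), j] }
    .first(k)
end

def sketch_neighborhood_stability(original_data, embedded_data, k: 15)
  n = original_data.size
  raise ArgumentError, "row counts differ" unless embedded_data.size == n
  raise ArgumentError, "need 1 <= k < number of rows" unless k >= 1 && k < n

  # Fraction of each point's original k-NN that survive in the embedding.
  overlaps = (0...n).map do |i|
    shared = knn_set(original_data, i, k) & knn_set(embedded_data, i, k)
    shared.size.to_f / k
  end

  overlaps.sum / n
end
```

Breaking distance ties by row index keeps the result deterministic. An embedding that leaves every point's k-nearest-neighbor set unchanged scores exactly 1.0, and the score falls toward 0 as neighborhoods are scrambled.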