Module: KnnBall

Defined in:
lib/knnball.rb,
lib/knnball/ball.rb,
lib/knnball/stat.rb,
lib/knnball/kdtree.rb,
lib/knnball/result_set.rb

Overview

Copyright © 2011 Olivier Amblet <olivier.amblet.net>

knnball is freely distributable under the terms of an MIT license. See LICENSE or www.opensource.org/licenses/mit-license.php.

Defined Under Namespace

Modules: Stat Classes: Ball, KDTree, ResultSet

Class Method Summary collapse

Class Method Details

.build(data) ⇒ Object

Retrieve a new BallTree given an array of input values.

Each data entry in the array is a Hash containing keys :value and :point, an array of position (one per dimension) [ => 1, :point => [1.23, 2.34, -1.23, -22.3], => 2, :point => [-2.33, 4.2, 1.23, 332.2] ]

Parameters:

  • data

    an array of Hash containing :value and :point key

See Also:



27
28
29
30
31
32
33
34
35
# File 'lib/knnball.rb', line 27

def self.build(data)
  if(data.nil? || data.empty?)
    raise ArgumentError.new("data argument must be a not empty Array")
  end
  max_dimension = data.first[:point].size
  kdtree = KDTree.new(max_dimension)
  kdtree.root = generate(data, max_dimension)
  return kdtree
end

.find_knn(ball_tree, position, k = 1, options = Hash.new) ⇒ Object

Retrieve the k nearest neighbor of the given position.



75
76
77
# File 'lib/knnball.rb', line 75

def self.find_knn(ball_tree, position, k = 1, options = Hash.new)
  return []
end

.generate(data, max_dimension, actual_dimension = 1) ⇒ Object

Generate the KD-Tree hyperrectangle.

Parameters:

  • actual_dimension (defaults to: 1)

    the dimension to base comparison on

  • max_dimension

    the number of dimension of each points

  • data

    the list of all points

  • left

    the first data index to look for

  • right

    the last data index to look for



44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
# File 'lib/knnball.rb', line 44

def self.generate(data, max_dimension, actual_dimension = 1)
  return nil if data.nil?
  return Ball.new(data.first) if data.size == 1
  
  # Order the array such as the middle point is the median and 
  # that every point on the left are of lesser value than median
  # and that every point on the right are of greater value
  # than the median. They are not more sorted than that.
  median_idx = Stat.median_index(data)
  value = Stat.median!(data) {|v1, v2| v1[:point][actual_dimension-1] <=> v2[:point][actual_dimension-1]}
  ball = Ball.new(value)
  
  actual_dimension = (max_dimension == actual_dimension ? 1 : actual_dimension)
  
  ball.left = generate(data[0..(median_idx-1)], max_dimension, actual_dimension) if median_idx > 0
  ball.right = generate(data[(median_idx+1)..-1], max_dimension, actual_dimension) if median_idx < (data.count - 1)
  return ball
end

.marshall(ball_tree) ⇒ Object

Retrieve an internal string representation of the index that can then be persisted.



65
66
67
# File 'lib/knnball.rb', line 65

def self.marshall(ball_tree)
  return ""
end

.unmarshall(marshalled_content) ⇒ Object

Retrieve a BallTree instance from a previously marshalled instance.



70
71
72
# File 'lib/knnball.rb', line 70

def self.unmarshall(marshalled_content)
  return KDTree.new
end