Module: Ai4r::Data::Proximity
- Defined in:
- lib/ai4r/data/proximity.rb
Overview
This module provides classical distance functions.
Class Method Summary
-
.cosine_distance(a, b) ⇒ Object
Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them (en.wikipedia.org/wiki/Cosine_similarity).
-
.euclidean_distance(a, b) ⇒ Object
Euclidean distance, or L2 norm.
-
.hamming_distance(a, b) ⇒ Object
The Hamming distance between two attribute vectors of equal length is the number of positions at which the corresponding attributes differ. This distance function is frequently used with binary attributes, though it can be used with other discrete attributes.
-
.manhattan_distance(a, b) ⇒ Object
City block, Manhattan distance, or L1 norm.
-
.simple_matching_distance(a, b) ⇒ Object
The “Simple matching” distance between two attribute sets is derived from the number of values present in both sets.
-
.squared_euclidean_distance(a, b) ⇒ Object
This is a computationally faster replacement for Euclidean distance.
-
.sup_distance(a, b) ⇒ Object
Sup distance, or L-infinity norm. Parameters a and b are vectors with continuous attributes.
Class Method Details
.cosine_distance(a, b) ⇒ Object
Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them (en.wikipedia.org/wiki/Cosine_similarity).
Parameters a and b are vectors with continuous attributes.
D = 1 - sum(a * b) / (sqrt(sum(a**2)) * sqrt(sum(b**2)))
# File 'lib/ai4r/data/proximity.rb', line 102
def self.cosine_distance(a,b)
  dot_product = 0.0
  norm_a = 0.0
  norm_b = 0.0
  magnitude = 0.0
  a.each_index do |i|
    dot_product += a[i] * b[i]
    norm_a += a[i] ** 2
    norm_b += b[i] ** 2
  end
  magnitude = Math.sqrt(norm_a) * Math.sqrt(norm_b)
  return 1 - (dot_product / magnitude)
end
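As a quick check of the formula, the sketch below restates the computation as a standalone top-level method and evaluates it on three hand-picked vector pairs. The top-level definition is an assumption made so the snippet runs without the ai4r gem; with the gem loaded you would call Ai4r::Data::Proximity.cosine_distance instead.

```ruby
# Standalone restatement for illustration; not the library's own code.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x**2 })
  norm_b = Math.sqrt(b.sum { |x| x**2 })
  1 - dot.to_f / (norm_a * norm_b)
end

puts cosine_distance([1, 0], [0, 1])   # orthogonal vectors: 1.0
puts cosine_distance([1, 2], [2, 4])   # parallel vectors: approximately 0
puts cosine_distance([1, 0], [-1, 0])  # opposite vectors: 2.0
```

The result ranges from 0 (same direction) to 2 (opposite direction). If either vector is all zeros the magnitude is 0 and the floating-point division yields NaN, so callers may want to guard against that case.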
.euclidean_distance(a, b) ⇒ Object
Euclidean distance, or L2 norm. Parameters a and b are vectors with continuous attributes. Euclidean distance tends to form hyperspherical clusters (Clustering, Xu and Wunsch, 2009). Translations and rotations do not distort the distance relation (Duda et al., 2001). If attributes are measured in different units, attributes with larger values and variance will dominate the metric.
# File 'lib/ai4r/data/proximity.rb', line 36
def self.euclidean_distance(a, b)
  Math.sqrt(squared_euclidean_distance(a, b))
end
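The remark about units can be made concrete with a small sketch. The records, the constant names, and the rescale helper below are all hypothetical, and the distance is restated as a top-level method so the snippet runs without the gem.

```ruby
# Standalone restatement for illustration; not the library's own code.
def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Hypothetical records: [height in metres, salary in dollars].
ALICE = [1.60, 50_000.0].freeze
BOB   = [1.95, 50_010.0].freeze
CAROL = [1.61, 50_100.0].freeze

# Hypothetical helper: express the salary in thousands of dollars.
def rescale(record)
  [record[0], record[1] / 1000.0]
end

# Raw units: the salary gap dominates, so Bob looks closer to Alice than Carol does.
raw_bob_closer = euclidean_distance(ALICE, BOB) < euclidean_distance(ALICE, CAROL)
puts raw_bob_closer  # true

# Rescaled: the height gap now matters and Carol becomes the closer record.
scaled_alice = rescale(ALICE)
scaled_carol_closer = euclidean_distance(scaled_alice, rescale(CAROL)) <
                      euclidean_distance(scaled_alice, rescale(BOB))
puts scaled_carol_closer  # true
```

Changing only the unit of one attribute reverses which record is nearest, which is why attributes are often brought to comparable scales before Euclidean distance is applied.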
.hamming_distance(a, b) ⇒ Object
The Hamming distance between two attribute vectors of equal length is the number of positions at which the corresponding attributes differ. This distance function is frequently used with binary attributes, though it can be used with other discrete attributes.
# File 'lib/ai4r/data/proximity.rb', line 69
def self.hamming_distance(a,b)
  count = 0
  a.each_index do |i|
    count += 1 if a[i] != b[i]
  end
  return count
end
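A short sketch of the counting rule on binary and on discrete attributes follows. The method is restated at top level (an assumption, so the snippet runs without the gem), and the last two calls illustrate why the vectors should have equal length: the loop only walks the indices of a.

```ruby
# Standalone restatement for illustration; not the library's own code.
def hamming_distance(a, b)
  a.each_index.count { |i| a[i] != b[i] }
end

puts hamming_distance([1, 0, 1, 1], [1, 1, 1, 0])                      # 2
puts hamming_distance(%w[red small round], %w[red large round])       # 1

# Unequal lengths are not rejected, and the result depends on argument order.
puts hamming_distance([1, 0], [1, 0, 1])   # 0: the extra item in b is never visited
puts hamming_distance([1, 0, 1], [1, 0])   # 1: the missing item in b reads as nil
```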
.manhattan_distance(a, b) ⇒ Object
City block, Manhattan distance, or L1 norm. Parameters a and b are vectors with continuous attributes.
# File 'lib/ai4r/data/proximity.rb', line 43
def self.manhattan_distance(a, b)
  sum = 0.0
  a.each_with_index do |item_a, i|
    item_b = b[i]
    sum += (item_a - item_b).abs
  end
  return sum
end
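For a worked example, the sketch below restates the sum of absolute differences as a top-level method (an assumption, so the snippet runs without the gem) and applies it to one pair of vectors.

```ruby
# Standalone restatement for illustration; not the library's own code.
def manhattan_distance(a, b)
  # Start from 0.0 so the result is a Float, as in the listing above.
  a.zip(b).sum(0.0) { |x, y| (x - y).abs }
end

puts manhattan_distance([1, 2, 3], [4, 0, 3])  # |1-4| + |2-0| + |3-3| = 5.0
```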
.simple_matching_distance(a, b) ⇒ Object
The “Simple matching” distance between two attribute sets is derived from the number of values present in both sets. If sets a and b have lengths da and db then:
S = 2/(da + db) * Number of values present on both sets
D = 1.0/S - 1
Some considerations:
-
a and b must not include repeated items
-
all attributes are treated equally
# File 'lib/ai4r/data/proximity.rb', line 88
def self.simple_matching_distance(a,b)
  similarity = 0.0
  a.each {|item| similarity += 2 if b.include?(item)}
  similarity /= (a.length + b.length)
  return 1.0/similarity - 1
end
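The two formulas can be traced by hand on small sets. The sketch below restates them as a top-level method (an assumption, so the snippet runs without the gem) and covers a partial overlap, identical sets, and disjoint sets.

```ruby
# Standalone restatement for illustration; not the library's own code.
def simple_matching_distance(a, b)
  shared = a.count { |item| b.include?(item) }
  similarity = 2.0 * shared / (a.length + b.length)  # S
  1.0 / similarity - 1                               # D
end

# Two shared values, da = 3, db = 4: S = 4/7, so D is approximately 0.75.
puts simple_matching_distance(%w[a b c], %w[b c d e])

# Identical sets: S = 1, D = 0.0.
puts simple_matching_distance(%w[a b c], %w[a b c])

# Disjoint sets: S = 0, and the floating-point division gives Infinity.
puts simple_matching_distance(%w[a b], %w[c d])
```

When the two sets share no values the result is Infinity rather than an error, so code that averages or sums these distances may need to handle that case explicitly.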
.squared_euclidean_distance(a, b) ⇒ Object
This is a computationally faster replacement for Euclidean distance: it skips the square root while preserving the ordering of distances. Parameters a and b are vectors with continuous attributes.
# File 'lib/ai4r/data/proximity.rb', line 18
def self.squared_euclidean_distance(a, b)
  sum = 0.0
  a.each_with_index do |item_a, i|
    item_b = b[i]
    sum += (item_a - item_b)**2
  end
  return sum
end
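Because the square root is monotonic, comparing squared distances picks the same nearest point as comparing Euclidean distances. The sketch below illustrates this with a hypothetical nearest-point search; the method names and the sample points are assumptions for illustration, defined at top level so the snippet runs without the gem.

```ruby
# Standalone restatement for illustration; not the library's own code.
def squared_euclidean_distance(a, b)
  a.zip(b).sum(0.0) { |x, y| (x - y)**2 }
end

# Hypothetical helpers: find the point closest to query under each measure.
def nearest_by_squared(query, points)
  points.min_by { |point| squared_euclidean_distance(query, point) }
end

def nearest_by_euclidean(query, points)
  points.min_by { |point| Math.sqrt(squared_euclidean_distance(query, point)) }
end

QUERY  = [0.0, 0.0].freeze
POINTS = [[3.0, 4.0], [1.0, 1.0], [6.0, 8.0]].freeze

puts squared_euclidean_distance(QUERY, POINTS[0])                                  # 25.0
puts nearest_by_squared(QUERY, POINTS) == nearest_by_euclidean(QUERY, POINTS)      # true
```

The squared value is not itself a metric (it does not satisfy the triangle inequality), so it suits comparisons and rankings rather than cases where the actual distance is needed.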
.sup_distance(a, b) ⇒ Object
Sup distance, or L-infinity norm. Parameters a and b are vectors with continuous attributes.
# File 'lib/ai4r/data/proximity.rb', line 54
def self.sup_distance(a, b)
  distance = 0.0
  a.each_with_index do |item_a, i|
    item_b = b[i]
    diff = (item_a - item_b).abs
    distance = diff if diff > distance
  end
  return distance
end
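To see how the sup distance relates to the L1 and L2 norms, the sketch below evaluates all three on the same pair of vectors. The three methods are restated at top level and the sample vectors are hypothetical, so the snippet runs without the gem; the sup restatement assumes non-empty vectors.

```ruby
# Standalone restatements for illustration; not the library's own code.
def manhattan_distance(a, b)
  a.zip(b).sum(0.0) { |x, y| (x - y).abs }
end

def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum(0.0) { |x, y| (x - y)**2 })
end

# Largest absolute difference over all attributes (assumes non-empty vectors).
def sup_distance(a, b)
  a.zip(b).map { |x, y| (x - y).abs }.max
end

POINT_A = [1.0, 5.0, 2.0].freeze
POINT_B = [4.0, 1.0, 2.0].freeze

# Per-attribute differences are 3, 4 and 0.
puts sup_distance(POINT_A, POINT_B)        # 4.0
puts euclidean_distance(POINT_A, POINT_B)  # 5.0
puts manhattan_distance(POINT_A, POINT_B)  # 7.0
```

For any pair of vectors the sup distance is at most the Euclidean distance, which is at most the Manhattan distance. The sup distance reflects only the single attribute that differs most.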