Module: Measurable
- Extended by:
- Measurable
- Included in:
- Measurable
- Defined in:
- lib/measurable.rb,
lib/measurable/cosine.rb,
lib/measurable/maxmin.rb,
lib/measurable/hamming.rb,
lib/measurable/jaccard.rb,
lib/measurable/version.rb,
lib/measurable/tanimoto.rb,
lib/measurable/chebyshev.rb,
lib/measurable/euclidean.rb,
lib/measurable/haversine.rb,
lib/measurable/minkowski.rb
Constant Summary
- RAD_PER_DEG =
Radians per degree (PI / 180).
Math::PI / 180
- VERSION =
"0.0.6"
- EARTH_RADIUS_IN_MILES =
Earth radius in miles.
3956
- EARTH_RADIUS_IN_KILOMETERS =
Earth radius in kilometers. Some algorithms use 6367.
6371
- EARTH_RADIUS =
The great circle distance returned will be in whatever units R is in.
{ :miles => EARTH_RADIUS_IN_MILES, :km => EARTH_RADIUS_IN_KILOMETERS, :feet => EARTH_RADIUS_IN_MILES * 5280, :meters => EARTH_RADIUS_IN_KILOMETERS * 1000 }
Instance Method Summary
-
#chebyshev(u, v) ⇒ Object
call-seq: chebyshev(u, v) -> Float.
-
#cosine(u, v) ⇒ Object
call-seq: cosine(u, v) -> Float.
-
#euclidean(u, v = nil) ⇒ Object
call-seq: euclidean(u) -> Float euclidean(u, v) -> Float.
-
#euclidean_squared(u, v = nil) ⇒ Object
call-seq: euclidean_squared(u) -> Float euclidean_squared(u, v) -> Float.
-
#hamming(s1, s2) ⇒ Object
call-seq: hamming(s1, s2) -> Integer.
-
#haversine(u, v, unit = :meters) ⇒ Object
call-seq: haversine(u, v) -> Float.
-
#jaccard(u, v) ⇒ Object
(also: #tanimoto_similarity)
call-seq: jaccard(u, v) -> Float.
-
#jaccard_index(u, v) ⇒ Object
call-seq: jaccard_index(u, v) -> Float.
-
#maxmin(u, v) ⇒ Object
call-seq: maxmin(u, v) -> Float.
-
#minkowski(u, v) ⇒ Object
(also: #cityblock, #manhattan)
call-seq: minkowski(u, v) -> Numeric.
-
#tanimoto(u, v) ⇒ Object
call-seq: tanimoto(u, v) -> Float.
Instance Method Details
#chebyshev(u, v) ⇒ Object
call-seq:
chebyshev(u, v) -> Float
- Arguments :
  - u -> An array of Numeric objects.
  - v -> An array of Numeric objects.
- Returns :
  - The L-infinity (Chebyshev) distance between u and v.
- Raises :
  - ArgumentError -> The sizes of u and v don’t match.
    # File 'lib/measurable/chebyshev.rb', line 16
    def chebyshev(u, v)
      # TODO: Change this to a more specific, custom-made exception.
      raise ArgumentError if u.size != v.size

      abs_differences = u.zip(v).map { |a| (a[0] - a[1]).abs }
      abs_differences.max
    end
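As a quick illustration of the listing above, the following standalone sketch mirrors its logic so that it runs without the gem installed. The name `chebyshev_sketch` and the sample vectors are assumptions made for this example; they are not part of Measurable.

```ruby
# Hypothetical standalone copy of the logic above; not the gem's own method.
def chebyshev_sketch(u, v)
  raise ArgumentError, "size mismatch" if u.size != v.size

  # The distance is the largest absolute coordinate-wise difference.
  u.zip(v).map { |a, b| (a - b).abs }.max
end

# The coordinate-wise differences are 3, 4 and 0, so the largest one wins.
puts chebyshev_sketch([1, 5, 2], [4, 1, 2]) # → 4
```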
#cosine(u, v) ⇒ Object
call-seq:
cosine(u, v) -> Float
Calculate the similarity between the orientation of two vectors.
See: en.wikipedia.org/wiki/Cosine_similarity
- Arguments :
  - u -> An array of Numeric objects.
  - v -> An array of Numeric objects.
- Returns :
  - The normalized dot product of u and v, that is, the cosine of the angle between them in the n-dimensional space.
- Raises :
  - ArgumentError -> The sizes of u and v don’t match.
    # File 'lib/measurable/cosine.rb', line 19
    def cosine(u, v)
      # TODO: Change this to a more specific, custom-made exception.
      raise ArgumentError if u.size != v.size

      dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }

      dot_product / (euclidean(u) * euclidean(v))
    end
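To make the range of the result concrete, here is a minimal standalone sketch of the same computation. The name `cosine_sketch` and the inline norm lambda are assumptions for this example; the gem itself delegates the norms to `euclidean`.

```ruby
# Hypothetical standalone sketch of cosine similarity; not the gem's own method.
def cosine_sketch(u, v)
  raise ArgumentError, "size mismatch" if u.size != v.size

  dot = u.zip(v).sum { |a, b| a * b }
  # Euclidean norm of a single vector.
  norm = ->(w) { Math.sqrt(w.sum { |x| x * x }) }
  dot / (norm.call(u) * norm.call(v))
end

puts cosine_sketch([1, 0], [0, 1])   # orthogonal vectors
puts cosine_sketch([1, 2], [2, 4])   # parallel vectors
puts cosine_sketch([1, 0], [-1, 0])  # opposite vectors
```

Orthogonal vectors give a value of 0, parallel vectors a value close to 1 (up to floating-point rounding), and opposite vectors a value of -1.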
#euclidean(u, v = nil) ⇒ Object
call-seq:
euclidean(u) -> Float
euclidean(u, v) -> Float
Calculate the ordinary distance between arrays u and v.
If v isn’t given, calculate the Euclidean norm of u.
See: en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
- Arguments :
  - u -> An array of Numeric objects.
  - v -> (Optional) An array of Numeric objects.
- Returns :
  - The Euclidean norm of u or the Euclidean distance between u and v.
- Raises :
  - ArgumentError -> The sizes of u and v don’t match.
    # File 'lib/measurable/euclidean.rb', line 22
    def euclidean(u, v = nil)
      # If the second argument is nil, the method should return the norm of
      # vector u. For this, we need the distance between u and the origin.
      if v.nil?
        v = Array.new(u.size, 0)
      end

      # TODO: Change this to a more specific, custom-made exception.
      raise ArgumentError if u.size != v.size

      sum = u.zip(v).reduce(0.0) do |acc, ary|
        acc += (ary[0] - ary[-1]) ** 2
      end

      Math.sqrt(sum)
    end
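The two call forms can be compared with a small standalone sketch. The name `euclidean_sketch` is an assumption for this example, and the body is a condensed rewrite of the listing above rather than the gem's code.

```ruby
# Hypothetical standalone sketch; condensed from the listing above.
def euclidean_sketch(u, v = nil)
  # With no second vector, measure the distance from u to the origin.
  v ||= Array.new(u.size, 0)
  raise ArgumentError, "size mismatch" if u.size != v.size

  Math.sqrt(u.zip(v).sum { |a, b| (a - b)**2 })
end

puts euclidean_sketch([3, 4])          # norm of [3, 4] → 5.0
puts euclidean_sketch([1, 1], [4, 5])  # distance between two points → 5.0
```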
#euclidean_squared(u, v = nil) ⇒ Object
call-seq:
euclidean_squared(u) -> Float
euclidean_squared(u, v) -> Float
Calculate the same value as euclidean(u, v), but don’t take the square root of it.
This isn’t a metric in the strict sense, i.e. it doesn’t respect the triangle inequality. However, the squared Euclidean distance is very useful whenever only the relative values of distances are important, for example in optimization problems.
See: en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
- Arguments :
  - u -> An array of Numeric objects.
  - v -> (Optional) An array of Numeric objects.
- Returns :
  - The squared value of the Euclidean norm of u or of the Euclidean distance between u and v.
- Raises :
  - ArgumentError -> The sizes of u and v don’t match.
    # File 'lib/measurable/euclidean.rb', line 62
    def euclidean_squared(u, v = nil)
      # If the second argument is nil, the method should return the squared
      # norm of vector u. For this, we need the distance between u and the origin.
      if v.nil?
        v = Array.new(u.size, 0)
      end

      # TODO: Change this to a more specific, custom-made exception.
      raise ArgumentError if u.size != v.size

      u.zip(v).reduce(0.0) do |acc, ary|
        acc += (ary[0] - ary[-1]) ** 2
      end
    end
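The claim that the squared distance breaks the triangle inequality can be checked with three points on a line. This is a minimal sketch; `euclidean_squared_sketch` and the sample points are assumptions for the example, not part of the gem.

```ruby
# Hypothetical standalone sketch of the squared Euclidean distance.
def euclidean_squared_sketch(u, v)
  raise ArgumentError, "size mismatch" if u.size != v.size

  u.zip(v).sum(0.0) { |a, b| (a - b)**2 }
end

a = [0]
b = [1]
c = [2]

direct = euclidean_squared_sketch(a, c)                                  # 2**2
detour = euclidean_squared_sketch(a, b) + euclidean_squared_sketch(b, c) # 1 + 1

# The direct value is larger than the detour through b,
# which a true metric would never allow.
puts direct > detour # → true
```

The ordering of distances is still preserved, which is why the squared form remains useful when only relative comparisons matter.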
#hamming(s1, s2) ⇒ Object
call-seq:
hamming(s1, s2) -> Integer
See: en.wikipedia.org/wiki/Hamming_distance
- Arguments :
  - s1 -> A String.
  - s2 -> A String with the same size as s1.
- Returns :
  - The number of characters in which s1 and s2 differ.
- Raises :
  - ArgumentError -> The sizes of s1 and s2 don’t match.
    # File 'lib/measurable/hamming.rb', line 18
    def hamming(s1, s2)
      # TODO: Change this to a more specific, custom-made exception.
      raise ArgumentError if s1.size != s2.size

      s1.chars.zip(s2.chars).reduce(0) do |acc, c|
        acc += 1 if c[0] != c[1]
        acc
      end
    end
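A short worked example may help here. The sketch below counts mismatching positions with `count` instead of `reduce`; the name `hamming_sketch` and the sample strings are assumptions for this example.

```ruby
# Hypothetical standalone sketch; same result as the listing above,
# written with Enumerable#count.
def hamming_sketch(s1, s2)
  raise ArgumentError, "size mismatch" if s1.size != s2.size

  # Count the positions where the two characters differ.
  s1.chars.zip(s2.chars).count { |a, b| a != b }
end

# The strings differ at positions 2, 3 and 4 (r/t, o/h, l/r).
puts hamming_sketch("karolin", "kathrin") # → 3
```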
#haversine(u, v, unit = :meters) ⇒ Object
call-seq:
haversine(u, v) -> Float
Compute accurate distances between two points given their latitudes and longitudes, even for short distances. This isn’t a distance measure in the same sense as the other methods in Measurable.
The distance returned is the great circle (or orthodromic) distance between u and v, which is the shortest distance between them on the surface of a sphere. Thus, this implementation considers the Earth to be a sphere.
The input vectors are of the form [latitude, longitude] in degrees, so if you have the coordinates [23° 32’ S, 46° 37’ W] (from São Paulo), the corresponding vector is [-23.53333, -46.61667].
References:
- Arguments :
  - u -> An array of Numeric objects.
  - v -> An array of Numeric objects.
  - unit -> (Optional) A Symbol representing the unit of measure. Available options are :miles, :feet, :km and :meters.
- Returns :
  - The great circle distance between u and v.
- Raises :
  - ArgumentError -> The size of u or v isn’t 2.
  - ArgumentError -> unit isn’t a Symbol.
    # File 'lib/measurable/haversine.rb', line 49
    def haversine(u, v, unit = :meters)
      # TODO: Create better exceptions.
      raise ArgumentError if u.size != 2 || v.size != 2
      raise ArgumentError if unit.class != Symbol

      dlat = u[0] - v[0]
      dlon = u[1] - v[1]

      dlon_rad = dlon * RAD_PER_DEG
      dlat_rad = dlat * RAD_PER_DEG

      lat1_rad = v[0] * RAD_PER_DEG
      lon1_rad = v[1] * RAD_PER_DEG

      lat2_rad = u[0] * RAD_PER_DEG
      lon2_rad = u[1] * RAD_PER_DEG

      a = (Math.sin(dlat_rad / 2)) ** 2 + Math.cos(lat1_rad) *
          Math.cos(lat2_rad) * (Math.sin(dlon_rad / 2)) ** 2
      c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))

      EARTH_RADIUS[unit] * c
    end
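One way to sanity-check the formula is a case with a known answer: two points on the equator 90 degrees of longitude apart are a quarter of a great circle apart, so the distance should be the radius times PI / 2. The sketch below is a standalone, kilometers-only rewrite; `haversine_km_sketch` is an assumed name, and the radius of 6371 km is taken from the constants listed earlier.

```ruby
# Hypothetical standalone sketch, kilometers only; not the gem's own method.
def haversine_km_sketch(u, v)
  raise ArgumentError, "expected [latitude, longitude]" if u.size != 2 || v.size != 2

  rad_per_deg = Math::PI / 180
  earth_radius_km = 6371 # same value as EARTH_RADIUS_IN_KILOMETERS

  lat1, lon1 = u.map { |deg| deg * rad_per_deg }
  lat2, lon2 = v.map { |deg| deg * rad_per_deg }

  # Haversine of the central angle between the two points.
  a = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2)**2
  c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))

  earth_radius_km * c
end

quarter = haversine_km_sketch([0, 0], [0, 90])
expected = 6371 * Math::PI / 2
close_enough = (quarter - expected).abs < 1e-6
puts close_enough # → true
```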
#jaccard(u, v) ⇒ Object Also known as: tanimoto_similarity
call-seq:
jaccard(u, v) -> Float
The jaccard distance is a measure of dissimilarity between two sets. It is calculated as:
jaccard_distance = 1 - jaccard_index
This is a proper metric, i.e. the following conditions hold:
- Symmetry: jaccard(u, v) == jaccard(v, u)
- Non-negative: jaccard(u, v) >= 0
- Coincidence axiom: jaccard(u, v) == 0 if u == v
- Triangular inequality: jaccard(u, v) <= jaccard(u, w) + jaccard(w, v)
- Arguments :
  - u -> Array of 1s and 0s.
  - v -> Array of 1s and 0s.
- Returns :
  - Float value representing the dissimilarity between u and v.
- Raises :
  - ArgumentError -> The size of the input arrays doesn’t match.
    # File 'lib/measurable/jaccard.rb', line 66
    def jaccard(u, v)
      1 - jaccard_index(u, v)
    end
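The metric properties listed above can be spot-checked on a few binary vectors. This is only a sketch on sample data, not a proof; `jaccard_index_sketch`, `jaccard_sketch` and the vectors are assumptions for this example.

```ruby
# Hypothetical standalone sketches; not the gem's own methods.
def jaccard_index_sketch(u, v)
  raise ArgumentError, "size mismatch" if u.size != v.size

  pairs = u.zip(v)
  intersection = pairs.count { |a, b| a + b == 2 } # present in both
  union = pairs.count { |a, b| a + b >= 1 }        # present in at least one
  intersection.to_f / union
end

def jaccard_sketch(u, v)
  1 - jaccard_index_sketch(u, v)
end

u = [1, 0, 1, 0]
v = [1, 1, 0, 0]
w = [0, 1, 1, 1]

puts jaccard_sketch(u, v) == jaccard_sketch(v, u)                        # symmetry
puts jaccard_sketch(u, v) >= 0                                           # non-negativity
puts jaccard_sketch(u, u).zero?                                          # coincidence
puts jaccard_sketch(u, v) <= jaccard_sketch(u, w) + jaccard_sketch(w, v) # triangle inequality
```

All four checks print `true` for these vectors.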
#jaccard_index(u, v) ⇒ Object
call-seq:
jaccard_index(u, v) -> Float
Give the similarity between two binary vectors u and v. Calculated as:
jaccard_index = |intersection| / |union|
In which intersection and union refer to u and v and |x| is the cardinality of set x.
For example:
jaccard_index([1, 0, 1], [1, 1, 1]) == 0.666...
Because |intersection| = |(1, 0, 1)| = 2 and |union| = |(1, 1, 1)| = 3.
See: en.wikipedia.org/wiki/Jaccard_coefficient
- Arguments :
  - u -> Array of 1s and 0s.
  - v -> Array of 1s and 0s.
- Returns :
  - Float value representing the Jaccard similarity coefficient between u and v.
- Raises :
  - ArgumentError -> The size of the input arrays doesn’t match.
    # File 'lib/measurable/jaccard.rb', line 28
    def jaccard_index(u, v)
      # TODO: Change this to a more specific, custom-made exception.
      raise ArgumentError if u.size != v.size

      intersection = u.zip(v).reduce(0) do |acc, elem|
        # Both u and v must have this element.
        elem[0] + elem[1] == 2 ? (acc + 1) : acc
      end

      union = u.zip(v).reduce(0) do |acc, elem|
        # One of u and v must have this element.
        elem[0] + elem[1] >= 1 ? (acc + 1) : acc
      end

      intersection.to_f / union
    end
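The set interpretation behind the formula can be made explicit by turning each binary vector into the set of positions that hold a 1. The sketch below does this with Ruby's array intersection and union operators; `jaccard_index_from_sets_sketch` is an assumed name, and this is a different route to the same value rather than the gem's implementation.

```ruby
# Hypothetical standalone sketch using explicit index sets.
def jaccard_index_from_sets_sketch(u, v)
  raise ArgumentError, "size mismatch" if u.size != v.size

  # Positions holding a 1 are the members of each set.
  set_u = u.each_index.select { |i| u[i] == 1 }
  set_v = v.each_index.select { |i| v[i] == 1 }

  (set_u & set_v).size.to_f / (set_u | set_v).size
end

# Sets are {0, 2} and {0, 1, 2}: intersection has 2 members, union has 3.
puts jaccard_index_from_sets_sketch([1, 0, 1], [1, 1, 1])
```

Note that when both vectors contain only zeros the union is empty, so the division yields NaN in this sketch.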
#maxmin(u, v) ⇒ Object
call-seq:
maxmin(u, v) -> Float
The “Max-min distance” is used to measure similarity between two vectors.
When used in k-means clustering, this similarity measure can give better results in some datasets, as pointed out in the paper “K-means clustering using Max-min distance measure” — Visalakshi, N. K.; Suguna, J.
See: ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05156398
- Arguments :
  - u -> An array of Numeric objects.
  - v -> An array of Numeric objects.
- Returns :
  - Similarity between u and v.
- Raises :
  - ArgumentError -> The sizes of u and v don’t match.
    # File 'lib/measurable/maxmin.rb', line 22
    def maxmin(u, v)
      # TODO: Change this to a more specific, custom-made exception.
      raise ArgumentError if u.size != v.size

      sum_min, sum_max = u.zip(v).reduce([0.0, 0.0]) do |acc, attributes|
        acc[0] += attributes.min
        acc[1] += attributes.max
        acc
      end

      sum_min / sum_max
    end
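In other words, the value is the sum of coordinate-wise minima divided by the sum of coordinate-wise maxima. A minimal standalone sketch follows, with `maxmin_sketch` and the sample vectors as assumptions for this example.

```ruby
# Hypothetical standalone sketch; not the gem's own method.
def maxmin_sketch(u, v)
  raise ArgumentError, "size mismatch" if u.size != v.size

  pairs = u.zip(v)
  sum_min = pairs.sum { |pair| pair.min }.to_f
  sum_max = pairs.sum { |pair| pair.max }
  sum_min / sum_max
end

puts maxmin_sketch([1, 2, 3], [1, 2, 3]) # identical vectors → 1.0
puts maxmin_sketch([1, 2, 3], [3, 2, 1]) # minima 1, 2, 1 over maxima 3, 2, 3 → 0.5
```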
#minkowski(u, v) ⇒ Object Also known as: cityblock, manhattan
call-seq:
minkowski(u, v) -> Numeric
Calculate the sum of the absolute values of the differences between each coordinate of u and v.
- Arguments :
  - u -> An array of Numeric objects.
  - v -> An array of Numeric objects.
- Returns :
  - The Minkowski distance of order 1 (the L1, city block or Manhattan distance) between u and v.
- Raises :
  - ArgumentError -> The sizes of u and v don’t match.
    # File 'lib/measurable/minkowski.rb', line 17
    def minkowski(u, v)
      # TODO: Change this to a more specific, custom-made exception.
      raise ArgumentError if u.size != v.size

      u.zip(v).reduce(0) do |acc, elem|
        acc += (elem[0] - elem[1]).abs
      end
    end
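The listing above computes only the order-1 case. For context, the sketch below shows that case next to a general order-p Minkowski distance. Both `manhattan_sketch` and `minkowski_order_sketch` are hypothetical helpers written for this example; the gem's `minkowski` takes no order argument.

```ruby
# Hypothetical standalone sketch of the order-1 case shown above.
def manhattan_sketch(u, v)
  raise ArgumentError, "size mismatch" if u.size != v.size

  u.zip(v).sum { |a, b| (a - b).abs }
end

# Hypothetical generalization (not in the gem): Minkowski distance of a given order.
def minkowski_order_sketch(u, v, order)
  raise ArgumentError, "size mismatch" if u.size != v.size

  total = u.zip(v).sum { |a, b| (a - b).abs**order }
  total**(1.0 / order)
end

puts manhattan_sketch([1, 2, 3], [4, 0, 3])          # |1-4| + |2-0| + |3-3| → 5
puts minkowski_order_sketch([1, 2, 3], [4, 0, 3], 1) # same value as a Float
puts minkowski_order_sketch([0, 0], [3, 4], 2)       # order 2 is the Euclidean distance
```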
#tanimoto(u, v) ⇒ Object
call-seq:
tanimoto(u, v) -> Float
The Tanimoto distance is a coefficient explicitly chosen to allow two dissimilar specimens to both be similar to a third one. This breaks the triangle inequality, so it isn’t a metric.
More information and references on this are needed. It’s left here mostly as a piece of curiosity.
See: en.wikipedia.org/wiki/Jaccard_index#Tanimoto.27s_Definitions_of_Similarity_and_Distance
- Arguments :
  - u -> An array of Numeric objects.
  - v -> An array of Numeric objects.
- Returns :
  - A measure of the similarity between u and v.
- Raises :
  - ArgumentError -> The sizes of u and v don’t match.
    # File 'lib/measurable/tanimoto.rb', line 26
    def tanimoto(u, v)
      # TODO: Change this to a more specific, custom-made exception.
      raise ArgumentError if u.size != v.size

      -Math.log2(jaccard_index(u, v))
    end
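The triangle inequality violation described above can be reproduced with three small binary vectors. This is a standalone sketch under stated assumptions: `tanimoto_sketch` inlines the Jaccard index computation, and the vectors are chosen for illustration.

```ruby
# Hypothetical standalone sketch; inlines the Jaccard index for self-containment.
def tanimoto_sketch(u, v)
  raise ArgumentError, "size mismatch" if u.size != v.size

  pairs = u.zip(v)
  intersection = pairs.count { |a, b| a + b == 2 }
  union = pairs.count { |a, b| a + b >= 1 }
  -Math.log2(intersection.to_f / union)
end

u = [1, 0]
v = [0, 1]
w = [1, 1] # shares one element with each of u and v

puts tanimoto_sketch(u, w) # -log2(1/2)
puts tanimoto_sketch(w, v) # -log2(1/2)
puts tanimoto_sketch(u, v) # no shared elements: -log2(0)

# u and v are each close to w, yet infinitely far from each other.
puts tanimoto_sketch(u, v) > tanimoto_sketch(u, w) + tanimoto_sketch(w, v) # → true
```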