Module: Measurable

Extended by: Measurable

Included in: Measurable

Defined in: lib/measurable.rb,
lib/measurable/cosine.rb,
lib/measurable/maxmin.rb,
lib/measurable/hamming.rb,
lib/measurable/jaccard.rb,
lib/measurable/version.rb,
lib/measurable/tanimoto.rb,
lib/measurable/chebyshev.rb,
lib/measurable/euclidean.rb,
lib/measurable/haversine.rb,
lib/measurable/minkowski.rb

Constant Summary

RAD_PER_DEG = Radians per degree (PI / 180).

Math::PI / 180

VERSION = Version of the Measurable gem.

"0.0.6"

EARTH_RADIUS_IN_MILES = Earth radius in miles.

EARTH_RADIUS_IN_KILOMETERS = Earth radius in kilometers. Some algorithms use 6367.

EARTH_RADIUS = Earth radius indexed by unit Symbol. The great circle distance returned by haversine is in whatever units the radius R is in, so this hash provides R for each supported unit.

{
  :miles => EARTH_RADIUS_IN_MILES,
  :km => EARTH_RADIUS_IN_KILOMETERS,
  :feet => EARTH_RADIUS_IN_MILES * 5280,
  :meters => EARTH_RADIUS_IN_KILOMETERS * 1000
}

Instance Method Summary

#chebyshev(u, v) ⇒ Object

call-seq: chebyshev(u, v) -> Float.

#cosine(u, v) ⇒ Object

call-seq: cosine(u, v) -> Float.

#euclidean(u, v = nil) ⇒ Object

call-seq: euclidean(u) -> Float; euclidean(u, v) -> Float.

#euclidean_squared(u, v = nil) ⇒ Object

call-seq: euclidean_squared(u) -> Float; euclidean_squared(u, v) -> Float.

#hamming(s1, s2) ⇒ Object

call-seq: hamming(s1, s2) -> Integer.

#haversine(u, v, unit = :meters) ⇒ Object

call-seq: haversine(u, v, unit = :meters) -> Float.

#jaccard(u, v) ⇒ Object (also: #tanimoto_similarity)

call-seq: jaccard(u, v) -> Float.

#jaccard_index(u, v) ⇒ Object

call-seq: jaccard_index(u, v) -> Float.

#maxmin(u, v) ⇒ Object

call-seq: maxmin(u, v) -> Float.

#minkowski(u, v) ⇒ Object (also: #cityblock, #manhattan)

call-seq: minkowski(u, v) -> Numeric.

#tanimoto(u, v) ⇒ Object

call-seq: tanimoto(u, v) -> Float.

Instance Method Details

#chebyshev(u, v) ⇒ `Object`

call-seq:

chebyshev(u, v) -> Float

Arguments:
- u -> An array of Numeric objects.
- v -> An array of Numeric objects.
Returns:
- The L-infinity (Chebyshev) distance between u and v, i.e. the largest absolute difference between their coordinates.
Raises:
- ArgumentError -> The sizes of u and v don’t match.

Raises:

(ArgumentError)

# File 'lib/measurable/chebyshev.rb', line 16

def chebyshev(u, v)
  # TODO: Change this to a more specific, custom-made exception.
  raise ArgumentError if u.size != v.size

  abs_differences = u.zip(v).map { |a| (a[0] - a[1]).abs }
  abs_differences.max
end
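To check the result by hand, the standalone sketch below mirrors chebyshev without requiring the gem. The method name chebyshev_sketch is hypothetical and used only for illustration.

```ruby
# Hypothetical standalone helper mirroring Measurable#chebyshev:
# the L-infinity distance is the largest absolute coordinate difference.
def chebyshev_sketch(u, v)
  raise ArgumentError, "sizes of u and v don't match" if u.size != v.size

  u.zip(v).map { |a, b| (a - b).abs }.max
end

# Coordinate differences are |1 - 4| = 3, |5 - 1| = 4 and |3 - 3| = 0.
puts chebyshev_sketch([1, 5, 3], [4, 1, 3]) # prints 4
```

With Integer inputs, as here, the result is an Integer rather than the Float listed in the call-seq, because max simply returns the largest difference.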

#cosine(u, v) ⇒ `Object`

call-seq:

cosine(u, v) -> Float

Calculate the similarity between the orientations of two vectors.

See: en.wikipedia.org/wiki/Cosine_similarity

Arguments:
- u -> An array of Numeric objects.
- v -> An array of Numeric objects.
Returns:
- The normalized dot product of u and v, that is, the cosine of the angle between them in n-dimensional space.
Raises:
- ArgumentError -> The sizes of u and v don’t match.

Raises:

(ArgumentError)

# File 'lib/measurable/cosine.rb', line 19

def cosine(u, v)
  # TODO: Change this to a more specific, custom-made exception.
  raise ArgumentError if u.size != v.size

  dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }

  dot_product / (euclidean(u) * euclidean(v))
end
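A minimal standalone sketch of the same formula, cos(theta) = (u . v) / (|u| |v|), may help show the range of values. The helper names norm_sketch and cosine_sketch are hypothetical and independent of the gem.

```ruby
# Hypothetical standalone helpers mirroring Measurable#cosine.
def norm_sketch(u)
  Math.sqrt(u.sum { |x| x * x })
end

def cosine_sketch(u, v)
  raise ArgumentError, "sizes of u and v don't match" if u.size != v.size

  # Dot product of u and v, divided by the product of their norms.
  dot_product = u.zip(v).sum(0.0) { |a, b| a * b }
  dot_product / (norm_sketch(u) * norm_sketch(v))
end

puts cosine_sketch([1, 0], [0, 1])  # orthogonal vectors: 0.0
puts cosine_sketch([1, 2], [2, 4])  # same direction: close to 1.0
puts cosine_sketch([1, 0], [-1, 0]) # opposite directions: -1.0
```

The result lies in [-1, 1]; Math.acos turns it into the angle in radians. Floating-point error can push the ratio slightly outside that interval, where Math.acos raises Math::DomainError, so clamping with clamp(-1.0, 1.0) first is a sensible precaution.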

#euclidean(u, v = nil) ⇒ `Object`

call-seq:

euclidean(u) -> Float
euclidean(u, v) -> Float

Calculate the ordinary distance between arrays u and v.

If v isn’t given, calculate the Euclidean norm of u.

See: en.wikipedia.org/wiki/Euclidean_distance#N_dimensions

Arguments:
- u -> An array of Numeric objects.
- v -> (Optional) An array of Numeric objects.
Returns:
- The Euclidean norm of u or the Euclidean distance between u and v.
Raises:
- ArgumentError -> The sizes of u and v don’t match.

Raises:

(ArgumentError)

# File 'lib/measurable/euclidean.rb', line 22

def euclidean(u, v = nil)
  # If the second argument is nil, the method should return the norm of
  # vector u. For this, we need the distance between u and the origin.
  if v.nil?
    v = Array.new(u.size, 0)
  end

  # TODO: Change this to a more specific, custom-made exception.
  raise ArgumentError if u.size != v.size

  sum = u.zip(v).reduce(0.0) do |acc, ary|
    acc += (ary[0] - ary[-1]) ** 2
  end

  Math.sqrt(sum)
end
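Both call forms can be checked with the classic 3-4-5 triangle. The sketch below is a hypothetical standalone reimplementation (euclidean_sketch is not part of the gem) that follows the same approach of measuring the distance to the origin when v is omitted.

```ruby
# Hypothetical standalone helper mirroring Measurable#euclidean.
# With v omitted, the distance to the origin (the norm of u) is returned.
def euclidean_sketch(u, v = nil)
  v ||= Array.new(u.size, 0)
  raise ArgumentError, "sizes of u and v don't match" if u.size != v.size

  Math.sqrt(u.zip(v).sum(0.0) { |a, b| (a - b)**2 })
end

puts euclidean_sketch([0, 0], [3, 4]) # prints 5.0
puts euclidean_sketch([3, 4])         # prints 5.0
```

For two-dimensional inputs, Ruby's Math.hypot(3, 4) returns the same value.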

#euclidean_squared(u, v = nil) ⇒ `Object`

call-seq:

euclidean_squared(u) -> Float
euclidean_squared(u, v) -> Float

Calculate the same value as euclidean(u, v), but don’t take the square root of it.

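This isn’t a metric in the strict sense, i.e. it doesn’t satisfy the triangle inequality. However, squaring preserves the order of non-negative numbers, so the squared Euclidean distance ranks pairs of points exactly as euclidean does while skipping the square root. This makes it very useful whenever only the relative values of distances matter, for example in optimization problems.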

See: en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance

Arguments:
- u -> An array of Numeric objects.
- v -> (Optional) An array of Numeric objects.
Returns:
- The squared value of the Euclidean norm of u or of the Euclidean distance between u and v.
Raises:
- ArgumentError -> The sizes of u and v don’t match.

Raises:

(ArgumentError)

# File 'lib/measurable/euclidean.rb', line 62

def euclidean_squared(u, v = nil)
  # If the second argument is nil, the method should return the norm of
  # vector u. For this, we need the distance between u and the origin.
  if v.nil?
    v = Array.new(u.size, 0)
  end

  # TODO: Change this to a more specific, custom-made exception.
  raise ArgumentError if u.size != v.size

  u.zip(v).reduce(0.0) do |acc, ary|
    acc += (ary[0] - ary[-1]) ** 2
  end
end
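Both claims above, the broken triangle inequality and the preserved ranking, can be seen with a few one- and two-dimensional points. The helper euclidean_squared_sketch is a hypothetical standalone stand-in for the gem's method.

```ruby
# Hypothetical standalone helper mirroring Measurable#euclidean_squared.
def euclidean_squared_sketch(u, v)
  raise ArgumentError, "sizes of u and v don't match" if u.size != v.size

  u.zip(v).sum(0.0) { |a, b| (a - b)**2 }
end

x = [0]
y = [1]
z = [2]

direct = euclidean_squared_sketch(x, z)                                  # 4.0
via_y  = euclidean_squared_sketch(x, y) + euclidean_squared_sketch(y, z) # 2.0
puts direct <= via_y # prints false: the triangle inequality fails

# Ranking by squared distance gives the same nearest point as ranking by distance.
points = [[3, 4], [1, 1], [0, 2]]
nearest = points.min_by { |point| euclidean_squared_sketch([0, 0], point) }
puts nearest.inspect # prints [1, 1]
```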

#hamming(s1, s2) ⇒ `Object`

call-seq:

hamming(s1, s2) -> Integer

Count the positions at which two strings of equal length differ.

See: en.wikipedia.org/wiki/Hamming_distance

Arguments:
- s1 -> A String.
- s2 -> A String with the same length as s1.
Returns:
- The number of positions at which the characters of s1 and s2 differ.
Raises:
- ArgumentError -> The lengths of s1 and s2 don’t match.

Raises:

(ArgumentError)

# File 'lib/measurable/hamming.rb', line 18

def hamming(s1, s2)
  # TODO: Change this to a more specific, custom-made exception.
  raise ArgumentError if s1.size != s2.size

  s1.chars.zip(s2.chars).reduce(0) do |acc, c|
    acc += 1 if c[0] != c[1]
    acc
  end
end
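The sketch below is a hypothetical standalone version (hamming_sketch is not a gem method) that uses count with a block in place of the reduce above; the result is the same.

```ruby
# Hypothetical standalone helper mirroring Measurable#hamming:
# count the positions at which two equal-length strings differ.
def hamming_sketch(s1, s2)
  raise ArgumentError, "sizes of s1 and s2 don't match" if s1.size != s2.size

  s1.chars.zip(s2.chars).count { |a, b| a != b }
end

puts hamming_sketch("karolin", "kathrin") # prints 3
puts hamming_sketch("1011101", "1001001") # prints 2
```

String#size and String#chars both operate on characters, not bytes, so multibyte strings such as "São" are compared character by character.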

#haversine(u, v, unit = :meters) ⇒ `Object`

call-seq:

haversine(u, v, unit = :meters) -> Float

Compute accurate distances between two points given their latitudes and longitudes, even for short distances. This isn’t a distance measure in the same sense as the other methods in Measurable.

The distance returned is the great circle (or orthodromic) distance between u and v, which is the shortest distance between them on the surface of a sphere. Thus, this implementation treats the Earth as a sphere.

Note that the input vectors are of the form [latitude, longitude] in degrees, with south latitudes and west longitudes negative. For example, the coordinates [23° 32′ S, 46° 37′ W] of São Paulo correspond to the vector [-23.53333, -46.61667].

References:

www.movable-type.co.uk/scripts/latlong.html
en.wikipedia.org/wiki/Haversine_formula
en.wikipedia.org/wiki/Great-circle_distance

Arguments:
- u -> An array of two Numeric objects: [latitude, longitude] in degrees.
- v -> An array of two Numeric objects: [latitude, longitude] in degrees.
- unit -> (Optional) A Symbol representing the unit of measure. Available options are :miles, :feet, :km and :meters (the default).
Returns:
- The great circle distance between u and v, in the chosen unit.
Raises:
- ArgumentError -> u or v doesn’t have exactly 2 elements.
- ArgumentError -> unit isn’t a Symbol.

Raises:

(ArgumentError)

# File 'lib/measurable/haversine.rb', line 49

def haversine(u, v, unit = :meters)
  # TODO: Create better exceptions.
  raise ArgumentError if u.size != 2 || v.size != 2
  raise ArgumentError if unit.class != Symbol

  dlat = u[0] - v[0]
  dlon = u[1] - v[1]

  dlon_rad = dlon * RAD_PER_DEG
  dlat_rad = dlat * RAD_PER_DEG

  lat1_rad = v[0] * RAD_PER_DEG
  lon1_rad = v[1] * RAD_PER_DEG

  lat2_rad = u[0] * RAD_PER_DEG
  lon2_rad = u[1] * RAD_PER_DEG

  a = (Math.sin(dlat_rad / 2)) ** 2 + Math.cos(lat1_rad) * Math.cos(lat2_rad) * (Math.sin(dlon_rad / 2)) ** 2
  c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))

  EARTH_RADIUS[unit] * c
end
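A quick sanity check of the formula: one degree of longitude along the equator spans 1/360 of the Earth's circumference, i.e. R * PI / 180. The sketch below is a hypothetical standalone version returning kilometers; the radius of 6371 km is an assumed common mean value, not necessarily the gem's EARTH_RADIUS_IN_KILOMETERS.

```ruby
# Hypothetical standalone sketch of the haversine formula used by
# Measurable#haversine. ASSUMPTION: a mean Earth radius of 6371 km.
RADIUS_KM = 6371.0
RAD_PER_DEGREE = Math::PI / 180

# u and v are [latitude, longitude] pairs in degrees.
def haversine_km_sketch(u, v)
  raise ArgumentError, "u and v must have 2 elements" if u.size != 2 || v.size != 2

  lat1, lon1 = v.map { |deg| deg * RAD_PER_DEGREE }
  lat2, lon2 = u.map { |deg| deg * RAD_PER_DEGREE }
  dlat = lat2 - lat1
  dlon = lon2 - lon1

  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlon / 2)**2
  c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))
  RADIUS_KM * c
end

# One degree of longitude along the equator.
one_degree = haversine_km_sketch([0, 0], [0, 1])
puts one_degree.round(2) # prints 111.19
```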

#jaccard(u, v) ⇒ `Object` Also known as: tanimoto_similarity

call-seq:

jaccard(u, v) -> Float

The Jaccard distance is a measure of dissimilarity between two sets. It is calculated as:

jaccard_distance = 1 - jaccard_index

This is a proper metric, i.e. the following conditions hold:

- Symmetry:            jaccard(u, v) == jaccard(v, u)
- Non-negativity:      jaccard(u, v) >= 0
- Coincidence axiom:   jaccard(u, v) == 0 if u == v
- Triangle inequality: jaccard(u, v) <= jaccard(u, w) + jaccard(w, v)

Arguments:
- u -> Array of 1s and 0s.
- v -> Array of 1s and 0s.
Returns:
- Float value representing the dissimilarity between u and v.
Raises:
- ArgumentError -> The sizes of the input arrays don’t match.

# File 'lib/measurable/jaccard.rb', line 66

def jaccard(u, v)
  1 - jaccard_index(u, v)
end
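The two extreme cases make the definition concrete: identical vectors are at distance 0 and vectors with no 1s in common are at distance 1. Both helpers below are hypothetical standalone stand-ins for jaccard_index and jaccard.

```ruby
# Hypothetical standalone helpers mirroring Measurable#jaccard_index and
# Measurable#jaccard for binary vectors (arrays of 1s and 0s).
def jaccard_index_sketch(u, v)
  raise ArgumentError, "sizes of u and v don't match" if u.size != v.size

  pairs = u.zip(v)
  intersection = pairs.count { |a, b| a + b == 2 } # both hold a 1
  union = pairs.count { |a, b| a + b >= 1 }        # at least one holds a 1
  intersection.to_f / union
end

def jaccard_sketch(u, v)
  1 - jaccard_index_sketch(u, v)
end

puts jaccard_sketch([1, 0, 1], [1, 0, 1]) # prints 0.0: identical vectors
puts jaccard_sketch([1, 0, 0], [0, 1, 0]) # prints 1.0: nothing in common
```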

#jaccard_index(u, v) ⇒ `Object`

call-seq:

jaccard_index(u, v) -> Float

Calculate the similarity between two binary vectors u and v as:

jaccard_index = |intersection| / |union|

where intersection and union are taken over the positions holding a 1 in u and v, and |x| is the cardinality of set x.

For example:

jaccard_index([1, 0, 1], [1, 1, 1]) == 0.666...

because |intersection| = |(1, 0, 1)| = 2 and |union| = |(1, 1, 1)| = 3.

See: en.wikipedia.org/wiki/Jaccard_coefficient

Arguments:
- u -> Array of 1s and 0s.
- v -> Array of 1s and 0s.
Returns:
- Float value representing the Jaccard similarity coefficient between u and v.
Raises:
- ArgumentError -> The sizes of the input arrays don’t match.

Raises:

(ArgumentError)

# File 'lib/measurable/jaccard.rb', line 28

def jaccard_index(u, v)
  # TODO: Change this to a more specific, custom-made exception.
  raise ArgumentError if u.size != v.size

  intersection = u.zip(v).reduce(0) do |acc, elem|
    # Both u and v must have this element.
    elem[0] + elem[1] == 2 ? (acc + 1) : acc
  end

  union = u.zip(v).reduce(0) do |acc, elem|
    # One of u and v must have this element.
    elem[0] + elem[1] >= 1 ? (acc + 1) : acc
  end

  intersection.to_f / union
end
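The same coefficient can be computed with an explicit set intersection and union, treating each binary vector as the set of indices that hold a 1. This is a hypothetical sketch using Ruby's Set; ones_indices and jaccard_index_via_sets are illustrative names, not gem methods.

```ruby
require "set"

# Hypothetical helper: the set of indices at which a binary vector holds a 1.
def ones_indices(vector)
  vector.each_index.select { |i| vector[i] == 1 }.to_set
end

def jaccard_index_via_sets(u, v)
  raise ArgumentError, "sizes of u and v don't match" if u.size != v.size

  a = ones_indices(u)
  b = ones_indices(v)
  (a & b).size.to_f / (a | b).size
end

puts jaccard_index_via_sets([1, 0, 1], [1, 1, 1]).round(4) # prints 0.6667
```

If neither vector contains a 1, the union is empty and both this sketch and jaccard_index above compute 0.0 / 0, which is NaN in Ruby.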

#maxmin(u, v) ⇒ `Object`

call-seq:

maxmin(u, v) -> Float

The “Max-min distance” is used to measure similarity between two vectors.

When used in k-means clustering, this similarity measure can give better results in some datasets, as pointed out in the paper “K-means clustering using Max-min distance measure” — Visalakshi, N. K.; Suguna, J.

See: ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05156398

Arguments:
- u -> An array of Numeric objects.
- v -> An array of Numeric objects.
Returns:
- Similarity between u and v.
Raises:
- ArgumentError -> The sizes of u and v don’t match.

Raises:

(ArgumentError)

# File 'lib/measurable/maxmin.rb', line 22

def maxmin(u, v)
  # TODO: Change this to a more specific, custom-made exception.
  raise ArgumentError if u.size != v.size

  sum_min, sum_max = u.zip(v).reduce([0.0, 0.0]) do |acc, attributes|
    acc[0] += attributes.min
    acc[1] += attributes.max
    acc
  end

  sum_min / sum_max
end
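The measure is the sum of coordinate-wise minimums divided by the sum of coordinate-wise maximums, which the hypothetical standalone helper maxmin_sketch reproduces below.

```ruby
# Hypothetical standalone helper mirroring Measurable#maxmin.
def maxmin_sketch(u, v)
  raise ArgumentError, "sizes of u and v don't match" if u.size != v.size

  pairs = u.zip(v)
  sum_min = pairs.sum(0.0) { |pair| pair.min }
  sum_max = pairs.sum(0.0) { |pair| pair.max }
  sum_min / sum_max
end

# Minimums: 1, 2, 1 (sum 4); maximums: 3, 2, 3 (sum 8).
puts maxmin_sketch([1, 2, 3], [3, 2, 1]) # prints 0.5
puts maxmin_sketch([4, 5], [4, 5])       # prints 1.0
```

For non-negative inputs that are not all zero, each minimum is at most the matching maximum, so the value lies between 0 and 1 and reaches 1 only when u and v are equal.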

#minkowski(u, v) ⇒ `Object` Also known as: cityblock, manhattan

call-seq:

minkowski(u, v) -> Numeric

Calculate the sum of the absolute values of the differences between each coordinate of u and v.

Arguments:
- u -> An array of Numeric objects.
- v -> An array of Numeric objects.
Returns:
- The Minkowski distance of order 1 between u and v, also known as the L1, city block or Manhattan distance.
Raises:
- ArgumentError -> The sizes of u and v don’t match.

Raises:

(ArgumentError)

# File 'lib/measurable/minkowski.rb', line 17

def minkowski(u, v)
  # TODO: Change this to a more specific, custom-made exception.
  raise ArgumentError if u.size != v.size

  u.zip(v).reduce(0) do |acc, elem|
    acc += (elem[0] - elem[1]).abs
  end
end
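The general Minkowski distance has an order parameter p: (sum |u_i - v_i|^p)^(1/p). The method above fixes p = 1. The sketch below is a hypothetical generalization for illustration only (minkowski_p and its order argument are not part of the gem).

```ruby
# Hypothetical generalization: the Minkowski distance of a given order.
# order = 1 matches Measurable#minkowski (cityblock / manhattan) and
# order = 2 gives the Euclidean distance.
def minkowski_p(u, v, order = 1)
  raise ArgumentError, "sizes of u and v don't match" if u.size != v.size
  raise ArgumentError, "order must be >= 1" if order < 1

  total = u.zip(v).sum(0.0) { |a, b| (a - b).abs**order }
  total**(1.0 / order)
end

puts minkowski_p([0, 0], [3, 4], 1) # prints 7.0
puts minkowski_p([0, 0], [3, 4], 2) # Euclidean distance, 5.0 up to rounding
```

As the order grows, the result approaches the Chebyshev distance (4 for this example), since the largest difference dominates the sum.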

#tanimoto(u, v) ⇒ `Object`

call-seq:

tanimoto(u, v) -> Float

Tanimoto distance is a coefficient explicitly chosen so as to allow two dissimilar specimens to both be similar to a third one. This breaks the triangle inequality, so it isn’t a metric.

More information and references on this are needed. It’s left here mostly as a curiosity.

See: en.wikipedia.org/wiki/Jaccard_index#Tanimoto.27s_Definitions_of_Similarity_and_Distance

Arguments:
- u -> Array of 1s and 0s.
- v -> Array of 1s and 0s.
Returns:
- A measure of the dissimilarity between u and v, computed as -log2(jaccard_index(u, v)).
Raises:
- ArgumentError -> The sizes of u and v don’t match.

Raises:

(ArgumentError)

# File 'lib/measurable/tanimoto.rb', line 26

def tanimoto(u, v)
  # TODO: Change this to a more specific, custom-made exception.
  raise ArgumentError if u.size != v.size

  -Math.log2(jaccard_index(u, v))
end
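The broken triangle inequality shows up with three short binary vectors: [1, 0] and [0, 1] share nothing, yet each is fairly similar to [1, 1]. The helpers binary_jaccard_index and tanimoto_sketch below are hypothetical standalone stand-ins for the gem's methods.

```ruby
# Hypothetical standalone helpers mirroring Measurable#tanimoto:
# T(u, v) = -log2(jaccard_index(u, v)) for binary vectors.
def binary_jaccard_index(u, v)
  pairs = u.zip(v)
  pairs.count { |a, b| a + b == 2 }.to_f / pairs.count { |a, b| a + b >= 1 }
end

def tanimoto_sketch(u, v)
  raise ArgumentError, "sizes of u and v don't match" if u.size != v.size

  -Math.log2(binary_jaccard_index(u, v))
end

a = [1, 0]
b = [1, 1]
c = [0, 1]

puts tanimoto_sketch(a, b) # index 0.5, so prints 1.0
puts tanimoto_sketch(b, c) # index 0.5, so prints 1.0
puts tanimoto_sketch(a, c) # disjoint: index 0.0, so prints Infinity
```

Here T(a, c) exceeds T(a, b) + T(b, c) = 2.0. Identical vectors give -Math.log2(1.0), which is -0.0 and compares equal to 0.0.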
