Class: Treat::Workers::Extractors::Distance::Levenshtein

Inherits:
Object
Defined in:
lib/treat/workers/extractors/distance/levenshtein.rb

Overview

The C extension works on char* strings, i.e. byte by byte, so multibyte Unicode strings will give incorrect distances. A pure implementation still needs to be provided for that case (FIX).
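To illustrate the problem, the sketch below runs the same unit-cost Levenshtein routine over byte arrays (roughly what a char*-based extension sees) and over character arrays. The method name `edit_distance` is hypothetical and not part of Treat; it is only one possible shape for the pure fallback the note above calls for.

```ruby
# Hypothetical helper, not part of Treat: classic unit-cost Levenshtein
# distance between two arrays (of bytes or of characters).
def edit_distance(a, b)
  # prev[j] holds the distance between the first i items of a
  # and the first j items of b (starting with i = 0).
  prev = (0..b.size).to_a
  a.each_with_index do |x, i|
    curr = [i + 1]
    b.each_with_index do |y, j|
      curr << [
        prev[j + 1] + 1,             # delete x
        curr[j] + 1,                 # insert y
        prev[j] + (x == y ? 0 : 1)   # substitute x with y (free on a match)
      ].min
    end
    prev = curr
  end
  prev.last
end

a = "café"
b = "cafe"

# Byte-wise, "é" is two UTF-8 bytes, so the distance is inflated.
puts edit_distance(a.bytes, b.bytes)   # => 2
# Character-wise, it is a single substitution.
puts edit_distance(a.chars, b.chars)   # => 1
```

Working on `String#chars` rather than raw bytes is what makes the second result match the intuitive answer for multibyte text.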

Constant Summary

DefaultOptions = {
  ins_cost: 1,
  del_cost: 1,
  sub_cost: 1
}

@@matcher = nil

Class Method Summary

Class Method Details

.distance(entity, options) ⇒ Object

Return the Levenshtein distance between the entity and the string or entity given in the :to option, taking into account the costs of insertion, deletion, and substitution.

# File 'lib/treat/workers/extractors/distance/levenshtein.rb', line 19

def self.distance(entity, options)

  options = DefaultOptions.merge(options)

  unless options[:to]
    raise Treat::Exception, "Must supply " +
    "a string/entity to compare to using " +
    "the option :to for this worker."
  end
  
  a, b = entity.to_s, options[:to].to_s

  Levenshtein.distance(a, b)

end
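The listing above merges the `ins_cost`, `del_cost` and `sub_cost` entries of `DefaultOptions` into `options`, but then calls `Levenshtein.distance(a, b)` with only the two strings, so as shown the costs do not appear to reach the computation. The sketch below is a hypothetical pure-Ruby stand-in: the module name `WeightedLevenshtein` is invented here, and `ArgumentError` replaces `Treat::Exception` so it runs without Treat. It keeps the worker's option handling, compares character by character, and applies the three costs in the dynamic-programming recurrence.

```ruby
# Hypothetical stand-in for the Treat worker; runs without Treat or the
# levenshtein C extension.
module WeightedLevenshtein
  # Same defaults as the worker's DefaultOptions.
  DefaultOptions = { ins_cost: 1, del_cost: 1, sub_cost: 1 }.freeze

  def self.distance(entity, options = {})
    options = DefaultOptions.merge(options)

    # Mirrors the worker's check; ArgumentError stands in for Treat::Exception.
    unless options[:to]
      raise ArgumentError, "Must supply a string/entity to compare to " \
                           "using the option :to for this worker."
    end

    # Compare characters, not bytes, so multibyte text is handled.
    a = entity.to_s.chars
    b = options[:to].to_s.chars
    ins, del, sub = options.values_at(:ins_cost, :del_cost, :sub_cost)

    # prev[j]: cost of turning the first i chars of a into the first j of b.
    prev = (0..b.size).map { |j| j * ins }
    a.each_with_index do |x, i|
      curr = [(i + 1) * del]
      b.each_with_index do |y, j|
        curr << [
          prev[j + 1] + del,             # delete x from a
          curr[j] + ins,                 # insert y into a
          prev[j] + (x == y ? 0 : sub)   # substitute x with y, free on a match
        ].min
      end
      prev = curr
    end
    prev.last
  end
end

p WeightedLevenshtein.distance("kitten", to: "sitting")               # => 3
p WeightedLevenshtein.distance("kitten", to: "sitting", sub_cost: 2)  # => 5
```

With unit costs this gives the textbook distance of 3 for kitten/sitting; raising the substitution cost to 2 (the price of a deletion plus an insertion) makes substitutions no cheaper than their two-step equivalent, and the distance rises to 5.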