Module: VER::Levenshtein
- Defined in:
- lib/ver/vendor/levenshtein.rb
Overview
Levenshtein distance algorithm implementation for Ruby, with UTF-8 support.
The Levenshtein distance is a measure of how similar two strings s and t are, calculated as the number of deletions/insertions/substitutions needed to transform s into t. The greater the distance, the more the strings differ.
The Levenshtein distance is also sometimes referred to as the easier-to-pronounce-and-spell ‘edit distance’.
The original author is Paul Battley ([email protected]) Modified by Michael Fellinger ([email protected])
The original implementation is part of the ‘english’ gem, distributed under the Ruby license.
Class Method Summary collapse
-
.distance(str1, str2) ⇒ Object
Calculate the Levenshtein distance between two strings
str1
andstr2
.
Class Method Details
.distance(str1, str2) ⇒ Object
Calculate the Levenshtein distance between two strings str1
and str2
. str1
and str2
will be encoded as UTF8.
Be aware that this algorithm doesn’t perform normalization. If there is a possibility of different normalized forms being used, normalization should be performed beforehand.
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 |
# File 'lib/ver/vendor/levenshtein.rb', line 26 def distance(str1, str2) s = str1.encode(Encoding::UTF_8).unpack('U*') t = str2.encode(Encoding::UTF_8).unpack('U*') n = s.length m = t.length return m if (0 == n) return n if (0 == m) d = (0..m).to_a x = nil (0...n).each do |i| e = i+1 (0...m).each do |j| cost = (s[i] == t[j]) ? 0 : 1 x = [ d[j+1] + 1, # insertion e + 1, # deletion d[j] + cost # substitution ].min d[j] = e e = x end d[m] = x end return x end |