Class: IndexedSearch::Match::AmericanSoundex

Inherits:
Base
  • Object
Defined in:
lib/indexed_search/match/american_soundex.rb

Overview

Implements the “American Soundex” variant of the Soundex algorithm to find words that sound similar. It only works well for English.

It also supports keys longer than 4 characters and is more tolerant of Unicode characters, in a way somewhat similar to MySQL’s SOUNDEX() function.

Uses an american_soundex column to store a Soundex value with each entry in the IndexedSearch::Word model. TODO: ideally, non-ASCII letters should be normalized to similar ASCII ones where possible.

Constant Summary

MAP =
{
  'a' => '0', 'e' => '0', 'i' => '0', 'o' => '0', 'u' => '0',
  'b' => '1', 'f' => '1', 'p' => '1', 'v' => '1',
  'c' => '2', 'g' => '2', 'j' => '2', 'k' => '2', 'q' => '2', 's' => '2', 'x' => '2', 'z' => '2',
  'd' => '3', 't' => '3',
  'l' => '4',
  'm' => '5', 'n' => '5',
  'r' => '6'
}
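
As a rough illustration of how this table is read (a sketch, not part of the library): the hypothetical LETTER_CODES constant below rebuilds the same mapping from digit groups, and raw_codes is an invented helper that looks up each character. Letters absent from the table, such as h, w, and y, yield nil.

```ruby
# Hypothetical rebuild of the mapping above, grouped by digit.
CODE_GROUPS = {
  '0' => 'aeiou',
  '1' => 'bfpv',
  '2' => 'cgjkqsxz',
  '3' => 'dt',
  '4' => 'l',
  '5' => 'mn',
  '6' => 'r'
}.freeze

LETTER_CODES = CODE_GROUPS.each_with_object({}) do |(digit, letters), map|
  letters.each_char { |letter| map[letter] = digit }
end.freeze

# Illustrative helper (not in the library): raw per-letter codes,
# before repeated codes are collapsed or vowels are dropped.
def raw_codes(word)
  word.each_char.map { |char| LETTER_CODES[char] }
end

p raw_codes('pfister')  # => ["1", "1", "0", "2", "3", "0", "6"]
p raw_codes('why')      # => [nil, nil, nil]
```

The '0' code for vowels never appears in a finished key; it only marks a boundary between consonants that share a digit.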

Class Method Summary

.make_index_value(term) ⇒ Object

Instance Method Summary

#scope ⇒ Object

#term_map ⇒ Object

Methods inherited from Base

#find, find_attributes, #initialize, match_against_term?, #results, #term_matches, #term_non_matches

Constructor Details

This class inherits a constructor from IndexedSearch::Match::Base

Class Method Details

.make_index_value(term) ⇒ Object

See en.wikipedia.org/wiki/Soundex#Rules. Our exceptions are the key length and some limited Unicode tolerance.



# File 'lib/indexed_search/match/american_soundex.rb', line 56

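def self.make_index_value(term)
  # Skip leading characters that are not letters (digits, punctuation, ...).
  idx = 0
  idx += 1 until term[idx] =~ /\A\p{Alpha}\z/ || idx >= term.size
  return nil if idx >= term.size
  # The key starts with the first letter, upcased.
  value = UnicodeUtils.simple_upcase(term[idx])
  return value if max_length == 1
  last_code = MAP[term[idx]]
  while idx < term.size do
    idx += 1
    code = MAP[term[idx]]
    # Unmapped characters (nil) and repeats of the previous code are skipped.
    if ! code.nil? && code != last_code
      # Vowels ('0') add no digit but still separate repeated codes.
      value += code if code != '0'
      return value if value.size >= max_length
      last_code = code
    end
  end
  # Pad short keys with zeros to the classic four-character width.
  value.ljust(4, '0')
end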
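
To try the encoding outside the library, the method above can be approximated by a standalone sketch. Several things here are assumptions: soundex_sketch and SKETCH_MAP are invented names, max_length is passed as an argument (in the class it is a setting whose definition is not shown on this page), String#upcase stands in for UnicodeUtils.simple_upcase, and the term is assumed to be lowercase already.

```ruby
# Copy of the MAP constant so the sketch runs on its own.
SKETCH_MAP = {
  'a' => '0', 'e' => '0', 'i' => '0', 'o' => '0', 'u' => '0',
  'b' => '1', 'f' => '1', 'p' => '1', 'v' => '1',
  'c' => '2', 'g' => '2', 'j' => '2', 'k' => '2', 'q' => '2', 's' => '2', 'x' => '2', 'z' => '2',
  'd' => '3', 't' => '3',
  'l' => '4',
  'm' => '5', 'n' => '5',
  'r' => '6'
}.freeze

# Hypothetical standalone variant of make_index_value.
def soundex_sketch(term, max_length = 4)
  # Find the first letter; give up if the term has none.
  idx = 0
  idx += 1 until idx >= term.size || term[idx].match?(/\A\p{Alpha}\z/)
  return nil if idx >= term.size

  value = term[idx].upcase
  return value if max_length == 1

  last_code = SKETCH_MAP[term[idx]]
  while idx < term.size
    idx += 1
    code = SKETCH_MAP[term[idx]] # nil for unmapped characters and past the end
    next if code.nil? || code == last_code

    value += code unless code == '0'
    return value if value.size >= max_length

    last_code = code
  end
  value.ljust(4, '0')
end

p soundex_sketch('robert')         # => "R163"
p soundex_sketch('rupert')         # => "R163"
p soundex_sketch('ashcraft')       # => "A261"
p soundex_sketch('washington', 8)  # => "W25235"
p soundex_sketch('123')            # => nil
```

Because h and w have no entry in the table, they do not reset the previous code, which is why the s and c in "ashcraft" collapse into a single 2. With a larger max_length the key simply keeps growing until the term runs out, and short keys are still padded to four characters.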

Instance Method Details

#scope ⇒ Object



# File 'lib/indexed_search/match/american_soundex.rb', line 34

def scope
  @scope.where(self.class.matcher_attribute => term_map.keys)
end

#term_map ⇒ Object



# File 'lib/indexed_search/match/american_soundex.rb', line 38

def term_map
  @term_map ||= Hash.new { |hash,key| hash[key] = [] }.tap do |map|
    term_matches.each { |term| map[self.class.make_index_value(term)] << term }
  end
end
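
The grouping that term_map performs can be sketched in isolation. This is only an illustration under stated assumptions: group_terms and INDEX_VALUES are hypothetical names, and the lookup table stands in for make_index_value so the block runs without the library.

```ruby
# Hypothetical precomputed index values, standing in for make_index_value.
INDEX_VALUES = {
  'robert' => 'R163',
  'rupert' => 'R163',
  'smith'  => 'S530'
}.freeze

# Groups terms under the key returned by the given block. The Hash default
# block creates an empty array the first time a key is seen.
def group_terms(terms)
  Hash.new { |hash, key| hash[key] = [] }.tap do |map|
    terms.each { |term| map[yield(term)] << term }
  end
end

grouped = group_terms(%w[robert rupert smith]) { |term| INDEX_VALUES.fetch(term) }

p grouped['R163']  # => ["robert", "rupert"]
p grouped.keys     # => ["R163", "S530"]
```

In #scope, the keys of this hash are what get passed to where; given an array, ActiveRecord builds an IN condition, so a single query covers every distinct Soundex value among the search terms.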