Class: Namor::Comparator
- Inherits: Object
  - Object
    - Namor::Comparator
- Defined in: lib/namor/comparator.rb
Overview
MULTI-MATCHING via components:
- Go through all users and group them by distinct sets of components.
- Pick a (small) subset of component-keys, say fewer than 10. Maybe a random sample?
- Build a set of matching rules.
- Run the subset × the full corpus × the matching rules.
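The plan above could be sketched roughly as follows. This is only an illustration: `multi_match`, the `rules` hash, and the `same_long_names` rule are hypothetical names that do not appear in Namor, and it assumes each user is an array of name-component strings and that a sorted component list is an acceptable grouping key.

```ruby
# Hypothetical sketch of the multi-matching plan; not part of Namor.
def multi_match(users, rules, sample_size: 10, rng: Random.new)
  # Group users by their distinct set of components (sorted, so order is ignored).
  groups = users.group_by { |components| components.sort }

  # Pick a small subset of component-keys by random sample.
  subset = groups.keys.sample(sample_size, random: rng)

  # Run subset x full corpus x rules, keeping [key, candidate, rule_name] hits.
  subset.product(groups.keys, rules.keys).select do |key, candidate, rule_name|
    key != candidate && rules[rule_name].call(key, candidate)
  end
end

# One illustrative rule: the long (multi-character) components must be equal.
rules = {
  same_long_names: lambda do |a, b|
    a.select { |s| s.length > 1 } == b.select { |s| s.length > 1 }
  end
}

users = [%w[JOHN Q SMITH], %w[JOHN SMITH], %w[JANE DOE], %w[JOHN SMITH]]
hits = multi_match(users, rules, sample_size: 3, rng: Random.new(1))
```

With only three distinct keys and `sample_size: 3`, every key is sampled, so the result does not depend on the seed apart from ordering.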
Instance Attribute Summary
- #corpus ⇒ Object (readonly): Returns the value of attribute corpus.
Instance Method Summary
- #crunch(record) ⇒ Object
- #evaluate(record, candidate) ⇒ Object
- #initialize(corpus) ⇒ Comparator (constructor): A new instance of Comparator.
- #matching_all_but_one(a, b) ⇒ Object: Ignores any initials.
- #matching_initials(a, b) ⇒ Object: Each name must have at least 1 long (non-initial-only) component; those long parts must be identical; all initials should correspond to non-matched long names in the other input.
- #missing_initials(a, b) ⇒ Object: Each name must have at least 2 long (non-initial-only) components; those long parts must be identical; only one of the names can have any initials.
- #prep_missing_initials ⇒ Object
Constructor Details
#initialize(corpus) ⇒ Comparator
Returns a new instance of Comparator.
# File 'lib/namor/comparator.rb', line 11
def initialize(corpus)
  @corpus = corpus
  prep_missing_initials
end
Instance Attribute Details
#corpus ⇒ Object (readonly)
Returns the value of attribute corpus.
# File 'lib/namor/comparator.rb', line 9
def corpus
  @corpus
end
Instance Method Details
#crunch(record) ⇒ Object
# File 'lib/namor/comparator.rb', line 17
def crunch(record)
  # Compare the record against every other corpus entry and collect the matches.
  (@corpus - [record]).each_with_object([]) do |candidate, matches|
    if evaluate(record, candidate)
      matches << candidate
    end
  end
end
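To illustrate how #crunch and #evaluate fit together, here is a minimal stand-in that mirrors the logic listed on this page without requiring the gem. `SketchComparator` is a hypothetical name, and the sketch assumes each record is an array of name-component strings, which is what the `select { |s| s.length > 1 }` calls suggest.

```ruby
# Hypothetical stand-in mirroring crunch / evaluate / missing_initials as listed here.
class SketchComparator
  attr_reader :corpus

  def initialize(corpus)
    @corpus = corpus
  end

  # Every other corpus record that at least one rule accepts.
  def crunch(record)
    (@corpus - [record]).select { |candidate| evaluate(record, candidate) }
  end

  # True if any enabled rule matches; only :missing_initials is enabled.
  def evaluate(record, candidate)
    [:missing_initials].any? { |rule| send(rule, record, candidate) }
  end

  def missing_initials(a, b)
    long_a = a.select { |s| s.length > 1 }
    long_b = b.select { |s| s.length > 1 }
    inits_a = a.select { |s| s.length == 1 }
    inits_b = b.select { |s| s.length == 1 }
    long_a.count >= 2 && long_b.count >= 2 && long_a == long_b &&
      (inits_a.empty? || inits_b.empty?)
  end
end

corpus = [%w[JOHN Q SMITH], %w[JOHN SMITH], %w[JANE SMITH]]
matches = SketchComparator.new(corpus).crunch(%w[JOHN Q SMITH])
# matches contains only ["JOHN", "SMITH"]
```

Note that `@corpus - [record]` removes every entry equal to the record, so exact duplicates of the input are never reported as matches.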
#evaluate(record, candidate) ⇒ Object
# File 'lib/namor/comparator.rb', line 25
def evaluate(record, candidate)
  # Return true as soon as any enabled rule matches; only :missing_initials is enabled.
  [:missing_initials].each do |rule|
    return true if send(rule, record, candidate)
  end
  false
end
#matching_all_but_one(a, b) ⇒ Object
Ignores any initials. Looks for cases where there is exactly one name component that differs between the inputs. As written, the check passes when exactly one long component appears in one input but not in the other.
# File 'lib/namor/comparator.rb', line 78
def matching_all_but_one(a,b)
  longnames_a = a.select {|s| s.length > 1}
  longnames_b = b.select {|s| s.length > 1}
  # Symmetric difference of the long components must contain exactly one element.
  ((longnames_a | longnames_b) - (longnames_a & longnames_b)).count == 1
end
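A few sample inputs may make the symmetric-difference check easier to picture. The method body below is copied from the listing so the snippet runs on its own; the sample names are made up for illustration.

```ruby
# Copy of the listed logic so the example is self-contained.
def matching_all_but_one(a, b)
  longnames_a = a.select { |s| s.length > 1 }
  longnames_b = b.select { |s| s.length > 1 }
  ((longnames_a | longnames_b) - (longnames_a & longnames_b)).count == 1
end

extra_name   = matching_all_but_one(%w[JOHN SMITH], %w[JOHN PAUL SMITH])   # one extra long name
with_initial = matching_all_but_one(%w[JOHN Q SMITH], %w[JOHN PAUL SMITH]) # the initial Q is ignored
substitution = matching_all_but_one(%w[JOHN SMITH], %w[JON SMITH])         # JOHN and JON both differ
identical    = matching_all_but_one(%w[JOHN SMITH], %w[JOHN SMITH])        # nothing differs
```

Here `extra_name` and `with_initial` are true, while `substitution` is false because a swapped component contributes two elements to the symmetric difference, and `identical` is false because it contributes none.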
#matching_initials(a, b) ⇒ Object
- Must have at least 1 long (non-initial-only) component in each.
- Those long parts must be identical.
- All initials should correspond to non-matched long names in the other input.
# File 'lib/namor/comparator.rb', line 61
def matching_initials(a,b)
  longnames_a = a.select {|s| s.length > 1}
  longnames_b = b.select {|s| s.length > 1}
  inits_a = a.select {|s| s.length == 1}
  inits_b = b.select {|s| s.length == 1}

  return false unless longnames_a.count >= 1 && longnames_b.count >= 1

  unmatched_longnames_a = longnames_a - longnames_b
  unmatched_longnames_b = longnames_b - longnames_a

  unmatched_inits_a = unmatched_longnames_a.map {|s| s[0]}
  unmatched_inits_b = unmatched_longnames_b.map {|s| s[0]}

  inits_a == unmatched_inits_b && inits_b == unmatched_inits_a
end
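The following sketch exercises the rule on a few made-up names. The method body is copied from the listing so the snippet is self-contained. Two behaviours of the listed code are worth noticing: unmatched long names are accepted as long as their first letters line up with the other side's initials, and the comparison uses array equality, so the order of initials matters.

```ruby
# Copy of the listed logic so the example is self-contained.
def matching_initials(a, b)
  longnames_a = a.select { |s| s.length > 1 }
  longnames_b = b.select { |s| s.length > 1 }
  inits_a = a.select { |s| s.length == 1 }
  inits_b = b.select { |s| s.length == 1 }

  return false unless longnames_a.count >= 1 && longnames_b.count >= 1

  # First letters of the long names that have no exact counterpart on the other side.
  unmatched_inits_a = (longnames_a - longnames_b).map { |s| s[0] }
  unmatched_inits_b = (longnames_b - longnames_a).map { |s| s[0] }

  inits_a == unmatched_inits_b && inits_b == unmatched_inits_a
end

expands    = matching_initials(%w[JOHN Q SMITH], %w[JOHN QUINCY SMITH]) # Q stands for QUINCY
wrong_init = matching_initials(%w[JOHN Q SMITH], %w[JOHN PAUL SMITH])   # Q does not stand for PAUL
reordered  = matching_initials(%w[A B SMITH], %w[BOB ALAN SMITH])       # same letters, different order
no_long    = matching_initials(%w[J Q], %w[JOHN QUINCY])                # first input has no long name
```

Only `expands` is true; the other three are false.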
#missing_initials(a, b) ⇒ Object
- Must have at least 2 long (non-initial-only) components in each.
- Those long parts must be identical.
- Only one of the names can have any initials.
# File 'lib/namor/comparator.rb', line 40
def missing_initials(a,b)
  longnames_a = a.select {|s| s.length > 1}
  longnames_b = b.select {|s| s.length > 1}
  inits_a = a.select {|s| s.length == 1}
  inits_b = b.select {|s| s.length == 1}

  longnames_a.count >= 2 && longnames_b.count >= 2 && longnames_a == longnames_b && (inits_a.empty? || inits_b.empty?)
end
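The three conditions can be checked against a handful of made-up names. The method body is copied from the listing so the snippet runs on its own. Because the long components are compared with array equality, their order has to agree as well.

```ruby
# Copy of the listed logic so the example is self-contained.
def missing_initials(a, b)
  longnames_a = a.select { |s| s.length > 1 }
  longnames_b = b.select { |s| s.length > 1 }
  inits_a = a.select { |s| s.length == 1 }
  inits_b = b.select { |s| s.length == 1 }

  longnames_a.count >= 2 && longnames_b.count >= 2 &&
    longnames_a == longnames_b && (inits_a.empty? || inits_b.empty?)
end

dropped_initial = missing_initials(%w[JOHN Q SMITH], %w[JOHN SMITH])   # only one side has an initial
both_initials   = missing_initials(%w[JOHN Q SMITH], %w[JOHN R SMITH]) # both sides have initials
too_short       = missing_initials(%w[J SMITH], %w[SMITH])             # fewer than 2 long components
reordered       = missing_initials(%w[SMITH JOHN], %w[JOHN SMITH])     # same long names, other order
```

Only `dropped_initial` is true. Two identical names without initials would also pass, since neither side has any initials.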
#prep_missing_initials ⇒ Object
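# File 'lib/namor/comparator.rb', line 49
def prep_missing_initials
  # Index every corpus record with its initials stripped, keeping only records
  # that still have at least two long components.
  # Assumes Set is loaded (require 'set' on Ruby versions older than 3.2).
  @corpus_missing_initials = corpus.each_with_object(Set.new) do |rec,set|
    without_initials = rec.select {|s| s.length > 1}
    if without_initials.count >= 2
      set << without_initials
    end
  end
end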
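The set built here can be pictured with a small corpus. None of the methods listed on this page reads `@corpus_missing_initials`, so the lookup at the end is only an assumption about how such an index might be used; `index_without_initials` is a hypothetical free-standing version of the method.

```ruby
require 'set'

# Hypothetical free-standing version of prep_missing_initials.
def index_without_initials(corpus)
  corpus.each_with_object(Set.new) do |rec, set|
    without_initials = rec.select { |s| s.length > 1 }
    # Keep only records that still have at least two long components.
    set << without_initials if without_initials.count >= 2
  end
end

corpus = [%w[JOHN Q SMITH], %w[JOHN SMITH], %w[CHER], %w[J SMITH]]
index = index_without_initials(corpus)

# Assumed usage: a constant-time membership test on the initial-free form of a name.
known = index.include?(%w[JOHN SMITH])
```

The first two records collapse to the same entry, and the last two are skipped for having fewer than two long components, so the index holds a single element.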