Class: Linkage::Matcher
- Inherits:
- Object
- Includes:
- Observable
- Defined in:
- lib/linkage/matcher.rb
Overview
Matcher is responsible for combining scores from a ScoreSet and deciding
which pairs of records match. Two parameters determine how Matcher does
this: algorithm and threshold.
There are currently two algorithm options: :mean and :sum. The mean
algorithm computes a mean score for each pair of records. The sum
algorithm computes a total score for each pair of records.
The threshold parameter determines what is considered a match. If the
combined score for a pair of records (as computed by the chosen algorithm)
is greater than or equal to the threshold, the pair is considered a match.
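As a rough illustration of how the two algorithms can disagree for the same threshold, here is a minimal sketch. The helpers total_score, mean_score and match? are hypothetical and not part of Linkage; the sketch ignores comparator weights and assumes every comparator produced a score for the pair.

```ruby
# Hypothetical helpers (not part of Linkage) illustrating :sum and :mean.
# Scores are keyed by a 1-based comparator index, as in the source listing.

# Total of all scores for one pair of records (the :sum idea, unweighted).
def total_score(scores)
  scores.values.sum
end

# Total divided by the number of scores (the :mean idea, unweighted).
def mean_score(scores)
  total_score(scores) / scores.length.to_f
end

# A pair matches when its combined score reaches the threshold.
def match?(combined, threshold)
  combined >= threshold
end

scores = { 1 => 1.0, 2 => 0.5, 3 => 0.0 }
threshold = 1.0

puts match?(mean_score(scores), threshold)   # mean is 0.5, below the threshold
puts match?(total_score(scores), threshold)  # sum is 1.5, at or above the threshold
```

Because a sum grows with the number of comparators while a mean does not, a threshold chosen for one algorithm generally needs to be rescaled for the other.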
Whenever Matcher finds a match, it uses the observer pattern to notify other objects that a match has been found. Usually the only observer is a MatchRecorder.
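The notification flow can be sketched with Ruby's standard Observable module. SketchMatcher and SketchRecorder below are hypothetical stand-ins, not the real Matcher or MatchRecorder; the sketch assumes the observer receives the two record ids and the combined score through Observable's default update callback.

```ruby
require 'observer'

# Hypothetical stand-in for Matcher: notifies observers about each match.
class SketchMatcher
  include Observable

  # pair_scores maps [id_1, id_2] to an already combined score.
  def initialize(pair_scores, threshold)
    @pair_scores = pair_scores
    @threshold = threshold
  end

  def run
    @pair_scores.each do |(id_1, id_2), score|
      next unless score >= @threshold
      changed                             # mark state as changed, or nothing is sent
      notify_observers(id_1, id_2, score) # calls #update on every observer
    end
  end
end

# Hypothetical stand-in for a recorder; Observable calls #update by default.
class SketchRecorder
  attr_reader :matches

  def initialize
    @matches = []
  end

  def update(id_1, id_2, score)
    @matches << [id_1, id_2, score]
  end
end

matcher = SketchMatcher.new({ [1, 2] => 0.9, [1, 3] => 0.2 }, 0.5)
recorder = SketchRecorder.new
matcher.add_observer(recorder)
matcher.run
p recorder.matches
```

Note that changed must be called before each notify_observers call: Observable resets its changed flag after notifying, so a notification without a preceding changed is silently skipped.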
Instance Attribute Summary
- #algorithm ⇒ Object (readonly): Returns the value of attribute algorithm.
- #comparators ⇒ Object (readonly): Returns the value of attribute comparators.
- #score_set ⇒ Object (readonly): Returns the value of attribute score_set.
- #threshold ⇒ Object (readonly): Returns the value of attribute threshold.
Instance Method Summary
- #initialize(comparators, score_set, algorithm, threshold) ⇒ Matcher (constructor): A new instance of Matcher.
- #mean ⇒ Object: Combine scores for each pair of records via mean, then compare the combined score to the threshold.
- #run ⇒ Object: Find matches.
- #sum ⇒ Object: Combine scores for each pair of records via sum, then compare the combined score to the threshold.
Constructor Details
#initialize(comparators, score_set, algorithm, threshold) ⇒ Matcher
Returns a new instance of Matcher.
    # File 'lib/linkage/matcher.rb', line 27
    def initialize(comparators, score_set, algorithm, threshold)
      @comparators = comparators
      @score_set = score_set
      @algorithm = algorithm
      @threshold = threshold
    end
Instance Attribute Details
#algorithm ⇒ Object (readonly)
Returns the value of attribute algorithm.
    # File 'lib/linkage/matcher.rb', line 21
    def algorithm
      @algorithm
    end
#comparators ⇒ Object (readonly)
Returns the value of attribute comparators.
    # File 'lib/linkage/matcher.rb', line 21
    def comparators
      @comparators
    end
#score_set ⇒ Object (readonly)
Returns the value of attribute score_set.
    # File 'lib/linkage/matcher.rb', line 21
    def score_set
      @score_set
    end
#threshold ⇒ Object (readonly)
Returns the value of attribute threshold.
    # File 'lib/linkage/matcher.rb', line 21
    def threshold
      @threshold
    end
Instance Method Details
#mean ⇒ Object
Combine scores for each pair of records via mean, then compare the combined score to the threshold. Notify observers if there's a match.
    # File 'lib/linkage/matcher.rb', line 41
    def mean
      w = @comparators.collect { |comparator| comparator.weight || 1 }
      @score_set.open_for_reading
      @score_set.each_pair do |id_1, id_2, scores|
        sum = 0
        scores.each do |key, value|
          sum += value * w[key-1]
        end
        mean = sum / @comparators.length.to_f
        if mean >= @threshold
          changed
          notify_observers(id_1, id_2, mean)
        end
      end
      @score_set.close
    end
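To make the weighting step concrete, the following sketch reproduces the arithmetic of the listing above on plain data. The helpers weighted_total and weighted_mean are hypothetical and not part of Linkage; the scores and weights are made-up values. As in the listing, a missing (nil) weight defaults to 1, and the mean divides by the number of comparators rather than by the sum of the weights.

```ruby
# Hypothetical helpers mirroring the arithmetic in the listing above.

# Multiply each score by the weight of its comparator (keys are 1-based).
def weighted_total(scores, weights)
  scores.inject(0) { |total, (key, value)| total + value * weights[key - 1] }
end

# Divide by the number of comparators, not by the sum of the weights.
def weighted_mean(scores, weights)
  weighted_total(scores, weights) / weights.length.to_f
end

scores = { 1 => 1.0, 2 => 0.5 }
weights = [2, nil].map { |weight| weight || 1 } # nil weight falls back to 1

puts weighted_total(scores, weights) # 2 * 1.0 + 1 * 0.5
puts weighted_mean(scores, weights)  # the total divided by 2 comparators
```

One consequence of dividing by the comparator count is that weights greater than 1 can push the mean above the range of the individual scores, which is worth keeping in mind when choosing a threshold.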
#run ⇒ Object
Find matches.
    # File 'lib/linkage/matcher.rb', line 35
    def run
      send(@algorithm)
    end
#sum ⇒ Object
Combine scores for each pair of records via sum, then compare the combined score to the threshold. Notify observers if there's a match.
    # File 'lib/linkage/matcher.rb', line 60
    def sum
      w = @comparators.collect { |comparator| comparator.weight || 1 }
      @score_set.open_for_reading
      @score_set.each_pair do |id_1, id_2, scores|
        sum = 0
        scores.each do |key, value|
          sum += value * w[key-1]
        end
        if sum >= @threshold
          changed
          notify_observers(id_1, id_2, sum)
        end
      end
      @score_set.close
    end