Class: Linkage::Matcher

Inherits:
Object
Includes:
Observable
Defined in:
lib/linkage/matcher.rb

Overview

Matcher is responsible for combining scores from a ScoreSet and deciding which pairs of records match. There are two parameters you can use to determine how Matcher does this: algorithm and threshold.

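There are currently two algorithm options: :mean and :sum. Both first multiply each comparator's score by that comparator's weight (1 if no weight is set) and add up the results for each pair of records. The sum algorithm uses that weighted total as the pair's score. The mean algorithm divides the weighted total by the number of comparators.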

The threshold parameter determines what is considered a match. If the result score for a pair of records (depending on the algorithm used) is greater than or equal to the threshold, then the pair is considered to be a match.
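The arithmetic behind both algorithms can be sketched for a single pair of records. This is an illustration only: the weights, scores, and threshold are made-up values, and the scores hash is assumed to be keyed by 1-based comparator position, mirroring the w[key-1] lookup in the method source.

```ruby
# Hypothetical inputs (not taken from the library):
weights   = [1, 2]                  # one weight per comparator
scores    = { 1 => 1.0, 2 => 0.5 }  # comparator position => score
threshold = 1.5

# Weighted total, as both algorithms compute it.
weighted_sum = scores.sum { |key, value| value * weights[key - 1] }

# The mean divides by the number of comparators, not by the sum of weights.
weighted_mean = weighted_sum / weights.length.to_f

sum_match  = weighted_sum >= threshold
mean_match = weighted_mean >= threshold

puts "sum=#{weighted_sum} match=#{sum_match}"
puts "mean=#{weighted_mean} match=#{mean_match}"
```

With these values the same pair matches under :sum but not under :mean, so a threshold usually needs to be chosen with the algorithm in mind.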

Whenever Matcher finds a match, it uses the observer pattern to notify other objects that a match has been found. Usually the only observer is a MatchRecorder.
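The notification mechanism is Ruby's standard Observable module, so any object that responds to update can listen for matches. The sketch below uses two hypothetical stand-in classes, NotifierSketch and RecorderSketch, instead of the real Matcher and MatchRecorder. It assumes the observer library is available (it ships with Ruby, as a bundled gem in recent versions).

```ruby
require 'observer'

# Hypothetical stand-in for Matcher: it only shows the notification step.
class NotifierSketch
  include Observable

  def report(id_1, id_2, score)
    changed                              # mark state as changed, otherwise
    notify_observers(id_1, id_2, score)  # notify_observers does nothing
  end
end

# Hypothetical stand-in for a recorder. Observable calls #update with the
# same arguments that were passed to notify_observers.
class RecorderSketch
  attr_reader :matches

  def initialize
    @matches = []
  end

  def update(id_1, id_2, score)
    @matches << [id_1, id_2, score]
  end
end

notifier = NotifierSketch.new
recorder = RecorderSketch.new
notifier.add_observer(recorder)
notifier.report(1, 2, 0.9)
```

After the call to report, recorder.matches holds one entry with the two record ids and the score.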

Constructor Details

#initialize(comparators, score_set, algorithm, threshold) ⇒ Matcher

Returns a new instance of Matcher.

Parameters:

  • comparators (Array<Comparator>)
  • score_set (ScoreSet)
  • algorithm (Symbol)

    :mean or :sum

  • threshold (Numeric)


# File 'lib/linkage/matcher.rb', line 27

def initialize(comparators, score_set, algorithm, threshold)
  @comparators = comparators
  @score_set = score_set
  @algorithm = algorithm
  @threshold = threshold
end

Instance Attribute Details

#algorithm ⇒ Object (readonly)

Returns the value of attribute algorithm.



# File 'lib/linkage/matcher.rb', line 21

def algorithm
  @algorithm
end

#comparators ⇒ Object (readonly)

Returns the value of attribute comparators.



# File 'lib/linkage/matcher.rb', line 21

def comparators
  @comparators
end

#score_set ⇒ Object (readonly)

Returns the value of attribute score_set.



# File 'lib/linkage/matcher.rb', line 21

def score_set
  @score_set
end

#threshold ⇒ Object (readonly)

Returns the value of attribute threshold.



# File 'lib/linkage/matcher.rb', line 21

def threshold
  @threshold
end

Instance Method Details

#mean ⇒ Object

Combine scores for each pair of records via mean, then compare the combined score to the threshold. Notify observers if there's a match.



# File 'lib/linkage/matcher.rb', line 41

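def mean
  # Per-comparator weights; a comparator without a weight counts as 1.
  w = @comparators.collect { |comparator| comparator.weight || 1 }
  @score_set.open_for_reading
  @score_set.each_pair do |id_1, id_2, scores|
    # Weighted total; key - 1 maps each score's key to a position in w.
    sum = 0
    scores.each do |key, value|
      sum += value * w[key-1]
    end
    # Divide by the number of comparators, not by the sum of the weights.
    mean = sum / @comparators.length.to_f
    if mean >= @threshold
      # Observable only notifies after changed has been called.
      changed
      notify_observers(id_1, id_2, mean)
    end
  end
  @score_set.close
end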

#run ⇒ Object

Find matches by calling the instance method named by algorithm (mean or sum).



# File 'lib/linkage/matcher.rb', line 35

def run
  send(@algorithm)
end
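Because run passes the algorithm symbol straight to send, a symbol other than :mean or :sum would call whatever method has that name, or raise NoMethodError if none exists. The following is a sketch of the same dispatch with an up-front check. DispatchSketch and the ALGORITHMS constant are hypothetical and not part of the library, and the mean and sum bodies are placeholders.

```ruby
# Hypothetical list of accepted algorithm names.
ALGORITHMS = %i[mean sum].freeze

class DispatchSketch
  def initialize(algorithm)
    # Validation added for illustration; the documented constructor
    # stores the symbol without checking it.
    unless ALGORITHMS.include?(algorithm)
      raise ArgumentError, "unknown algorithm: #{algorithm.inspect}"
    end
    @algorithm = algorithm
  end

  # Same dispatch as the documented #run: call the method named by the symbol.
  def run
    send(@algorithm)
  end

  # Placeholder bodies so the dispatch can be observed.
  def mean
    :mean_called
  end

  def sum
    :sum_called
  end
end

puts DispatchSketch.new(:sum).run.inspect
```

Rejecting unknown symbols in the constructor is one possible design choice. It moves the failure to construction time instead of the later call to run.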

#sum ⇒ Object

Combine scores for each pair of records via sum, then compare the combined score to the threshold. Notify observers if there's a match.



# File 'lib/linkage/matcher.rb', line 60

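def sum
  # Per-comparator weights; a comparator without a weight counts as 1.
  w = @comparators.collect { |comparator| comparator.weight || 1 }
  @score_set.open_for_reading
  @score_set.each_pair do |id_1, id_2, scores|
    # Weighted total; key - 1 maps each score's key to a position in w.
    sum = 0
    scores.each do |key, value|
      sum += value * w[key-1]
    end
    if sum >= @threshold
      # Observable only notifies after changed has been called.
      changed
      notify_observers(id_1, id_2, sum)
    end
  end
  @score_set.close
end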
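To see the whole flow without a real ScoreSet, the sum logic can be exercised against in-memory stand-ins. Everything named below (MemoryScoreSet, WeightedComparator, SumSketch, Collector) is hypothetical. The sketch assumes only what the source above shows: a score set that responds to open_for_reading, each_pair, and close, and comparators that respond to weight.

```ruby
require 'observer'

# Hypothetical in-memory score set yielding (id_1, id_2, scores) triples.
class MemoryScoreSet
  def initialize(rows)
    @rows = rows
  end

  def open_for_reading; end

  def each_pair
    @rows.each { |id_1, id_2, scores| yield id_1, id_2, scores }
  end

  def close; end
end

# Hypothetical comparator stand-in; only #weight is needed here.
WeightedComparator = Struct.new(:weight)

# Re-creation of the documented sum logic, for illustration only.
class SumSketch
  include Observable

  def initialize(comparators, score_set, threshold)
    @comparators = comparators
    @score_set = score_set
    @threshold = threshold
  end

  def sum
    w = @comparators.collect { |comparator| comparator.weight || 1 }
    @score_set.open_for_reading
    @score_set.each_pair do |id_1, id_2, scores|
      total = scores.sum { |key, value| value * w[key - 1] }
      next unless total >= @threshold
      changed
      notify_observers(id_1, id_2, total)
    end
    @score_set.close
  end
end

# Hypothetical observer that collects every reported match.
class Collector
  attr_reader :matches

  def initialize
    @matches = []
  end

  def update(id_1, id_2, score)
    @matches << [id_1, id_2, score]
  end
end

comparators = [WeightedComparator.new(nil), WeightedComparator.new(2)]
rows = [
  [1, 10, { 1 => 1.0, 2 => 1.0 }], # 1.0 * 1 + 1.0 * 2 = 3.0
  [2, 20, { 1 => 1.0, 2 => 0.0 }]  # 1.0 * 1 + 0.0 * 2 = 1.0
]
sketch = SumSketch.new(comparators, MemoryScoreSet.new(rows), 2)
collector = Collector.new
sketch.add_observer(collector)
sketch.sum
```

Only the first pair reaches the threshold of 2, so the collector receives a single match. The first comparator has no weight and therefore counts as 1.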