Class: Linkage::ScoreSet Abstract

Inherits:
Object
Defined in:
lib/linkage/score_set.rb

Overview

This class is abstract.

A ScoreSet is responsible for keeping track of scores. During the record linkage process, one or more Comparators generate scores. These scores are handled by a ScoreRecorder, which uses a ScoreSet to actually save the scores. ScoreSet is also used to fetch the linkage scores so that a Matcher can create matches.

ScoreSet is the superclass of implementations for different storage formats. Currently there are two formats for storing scores; see the documentation for the score set you're interested in for more information.

If you want to implement a custom ScoreSet, create a class that inherits ScoreSet and defines at least #add_score and #each_pair. You can then register that class via ScoreSet.register.
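
The steps above can be sketched roughly as follows. This is a minimal illustration, not the library's own code: the `Linkage::ScoreSet` definition is a stand-in that mirrors the `register` and `klass_for` source listed on this page so the example runs without the gem, and `MemoryScoreSet` and the name "memory" are hypothetical.

```ruby
# Stand-in for Linkage::ScoreSet (assumption: mirrors the source on this page).
module Linkage
  class ScoreSet
    class << self
      def register(name, klass)
        methods = klass.instance_methods(false)
        missing = []
        missing.push("#add_score") unless methods.include?(:add_score)
        missing.push("#each_pair") unless methods.include?(:each_pair)
        unless missing.empty?
          raise ArgumentError, "class must define #{missing.join(" and ")}"
        end
        @score_sets ||= {}
        @score_sets[name] = klass
      end

      def klass_for(name)
        @score_sets ? @score_sets[name] : nil
      end
      alias_method :[], :klass_for
    end
  end
end

# Hypothetical score set that keeps everything in a Hash keyed by record pair.
class MemoryScoreSet < Linkage::ScoreSet
  def initialize
    @scores = Hash.new { |hash, pair| hash[pair] = {} }
  end

  # Store one comparator's score for the pair (id_1, id_2).
  def add_score(comparator_id, id_1, id_2, value)
    @scores[[id_1, id_2]][comparator_id] = value
  end

  # Yield id_1, id_2 and a Hash of comparator id => score for each pair.
  def each_pair
    return enum_for(:each_pair) unless block_given?
    @scores.each { |(id_1, id_2), scores| yield id_1, id_2, scores }
  end
end

Linkage::ScoreSet.register("memory", MemoryScoreSet)

score_set = Linkage::ScoreSet["memory"].new
score_set.add_score(1, "a1", "b7", 0.5)
score_set.add_score(2, "a1", "b7", 1)
score_set.each_pair do |id_1, id_2, scores|
  puts "#{id_1} / #{id_2}: #{scores.inspect}"
end
```

Keying the Hash by the `[id_1, id_2]` pair keeps all comparator scores for a pair together, which is the shape #each_pair is expected to yield.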

Class Method Details

.klass_for(name) ⇒ Class? Also known as: []

Return a registered ScoreSet subclass or nil if it doesn't exist.

Parameters:

  • name (String)

    name of the registered score set

Returns:

  • (Class, nil)


# File 'lib/linkage/score_set.rb', line 51

def klass_for(name)
  @score_sets ? @score_sets[name] : nil
end

.register(name, klass) ⇒ Object

Register a new score set. The class must itself define at least #add_score and #each_pair; because the check uses instance_methods(false), inherited definitions are not counted. If either method is missing, register raises an ArgumentError.

Parameters:

  • name (String)

    Score set name used in klass_for

  • klass (Class)

    ScoreSet subclass



# File 'lib/linkage/score_set.rb', line 30

def register(name, klass)
  methods = klass.instance_methods(false)
  missing = []
  unless methods.include?(:add_score)
    missing.push("#add_score")
  end
  unless methods.include?(:each_pair)
    missing.push("#each_pair")
  end
  unless missing.empty?
    raise ArgumentError, "class must define #{missing.join(" and ")}"
  end

  @score_sets ||= {}
  @score_sets[name] = klass
end
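
To illustrate the validation in the listing above, the following sketch registers three hypothetical classes against a stand-in `Linkage::ScoreSet` that copies that listing (an assumption made so the example runs without the gem). `HalfDone`, `Complete`, `Inheriting`, and the `registration_error` helper are invented for this example.

```ruby
# Stand-in for Linkage::ScoreSet (assumption: copies the register source above).
module Linkage
  class ScoreSet
    class << self
      def register(name, klass)
        methods = klass.instance_methods(false)
        missing = []
        missing.push("#add_score") unless methods.include?(:add_score)
        missing.push("#each_pair") unless methods.include?(:each_pair)
        unless missing.empty?
          raise ArgumentError, "class must define #{missing.join(" and ")}"
        end
        @score_sets ||= {}
        @score_sets[name] = klass
      end

      def klass_for(name)
        @score_sets ? @score_sets[name] : nil
      end
    end
  end
end

# Defines only one of the two required methods.
class HalfDone < Linkage::ScoreSet
  def add_score(comparator_id, id_1, id_2, value); end
end

# Defines both required methods directly.
class Complete < Linkage::ScoreSet
  def add_score(comparator_id, id_1, id_2, value); end
  def each_pair; end
end

# Inherits both methods but defines neither itself.
class Inheriting < Complete; end

# Hypothetical helper: returns the error message, or nil if registration worked.
def registration_error(name, klass)
  Linkage::ScoreSet.register(name, klass)
  nil
rescue ArgumentError => e
  e.message
end

half_done_error  = registration_error("half", HalfDone)
inheriting_error = registration_error("child", Inheriting)
complete_error   = registration_error("complete", Complete)

puts half_done_error    # class must define #each_pair
puts inheriting_error   # class must define #add_score and #each_pair
puts complete_error.inspect # nil
```

The `Inheriting` case fails because `instance_methods(false)` lists only methods defined in the class itself, so a subclass of an existing score set has to redefine both methods before it can be registered under its own name.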

Instance Method Details

#add_score(comparator_id, id_1, id_2, value) ⇒ Object

This method is abstract.

Add a score to the ScoreSet. Subclasses must redefine this.

Parameters:

  • comparator_id (Fixnum)

    1-indexed comparator index

  • id_1 (Object)

    record id from first dataset

  • id_2 (Object)

    record id from second dataset

  • value (Fixnum, Float)

    score value

Raises:

  • (NotImplementedError)


# File 'lib/linkage/score_set.rb', line 76

def add_score(comparator_id, id_1, id_2, value)
  raise NotImplementedError
end

#close ⇒ Object

This is called by Linkage::ScoreRecorder#stop, after all scores have been added. Subclasses can redefine this to perform any teardown needed.



# File 'lib/linkage/score_set.rb', line 101

def close
end

#each_pair(&block) ⇒ Object

This method is abstract.

Yield scores for each pair of records. Subclasses must redefine this. This method is called by Matcher#run with a block with three parameters:

score_set.each_pair do |id_1, id_2, scores|
end

scores should be a Hash where comparator ids are keys and scores are values. For example: { 1 => 0.5, 2 => 0.75, 3 => 1 }. Note that some comparators (including Comparators::Compare) do not create a score for every pair. A missing score means that the pair was given a score of 0.

Raises:

  • (NotImplementedError)


# File 'lib/linkage/score_set.rb', line 95

def each_pair(&block)
  raise NotImplementedError
end
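
As a rough sketch of how a caller might consume #each_pair while treating missing scores as 0, the example below uses a hypothetical `FixedScores` class and a hypothetical list of comparator ids. It does not show how Matcher itself combines scores; summing is only one possible choice.

```ruby
# Hypothetical comparator ids; real ids depend on how the linkage is configured.
COMPARATOR_IDS = [1, 2, 3].freeze

# Hypothetical stand-in that yields the three documented block parameters.
class FixedScores
  def each_pair
    yield "a1", "b7", { 1 => 0.5, 2 => 0.75, 3 => 1 }
    yield "a2", "b9", { 2 => 1 } # comparators 1 and 3 recorded no score
  end
end

totals = {}
FixedScores.new.each_pair do |id_1, id_2, scores|
  # fetch(id, 0) applies the rule that a missing score counts as 0.
  totals[[id_1, id_2]] = COMPARATOR_IDS.sum { |id| scores.fetch(id, 0) }
end

totals.each { |(id_1, id_2), total| puts "#{id_1} / #{id_2}: #{total}" }
```

Using `fetch` with a default avoids the `nil` that `scores[id]` would return for a comparator that recorded nothing for the pair.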

#open_for_reading ⇒ Object

This is called by Matcher#run, before any scores are read via #each_pair. Subclasses can redefine this to perform any setup needed for reading scores.



# File 'lib/linkage/score_set.rb', line 60

def open_for_reading
end

#open_for_writing ⇒ Object

This is called by Linkage::ScoreRecorder#start, before any scores are added via #add_score. Subclasses can redefine this to perform any setup needed for saving scores.



# File 'lib/linkage/score_set.rb', line 66

def open_for_writing
end
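
The hooks documented above imply a write phase followed by a read phase. The sketch below records the order in which a hypothetical `TracingScoreSet` sees each call. The calls are made by hand here; in the library they would come from ScoreRecorder#start, ScoreRecorder#stop, and Matcher#run as described above. The `Linkage::ScoreSet` stand-in, with its empty hooks, is an assumption made so the example runs without the gem.

```ruby
# Stand-in for Linkage::ScoreSet with the no-op hooks listed on this page.
module Linkage
  class ScoreSet
    def open_for_reading; end
    def open_for_writing; end
    def close; end
  end
end

# Hypothetical subclass that logs every lifecycle call it receives.
class TracingScoreSet < Linkage::ScoreSet
  attr_reader :events

  def initialize
    @events = []
    @scores = {}
  end

  def open_for_writing
    @events << :open_for_writing
  end

  def add_score(comparator_id, id_1, id_2, value)
    @events << :add_score
    (@scores[[id_1, id_2]] ||= {})[comparator_id] = value
  end

  def close
    @events << :close
  end

  def open_for_reading
    @events << :open_for_reading
  end

  def each_pair
    @events << :each_pair
    @scores.each { |(id_1, id_2), scores| yield id_1, id_2, scores }
  end
end

score_set = TracingScoreSet.new

# Write phase (driven by ScoreRecorder in the library).
score_set.open_for_writing
score_set.add_score(1, "a1", "b7", 0.5)
score_set.close

# Read phase (driven by Matcher#run in the library).
score_set.open_for_reading
pairs = []
score_set.each_pair { |id_1, id_2, scores| pairs << [id_1, id_2, scores] }

puts score_set.events.inspect
```

A file-backed or database-backed score set would typically acquire its handle in the two open hooks and release it in #close; the in-memory version here only needs the hooks for tracing.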