Class: Linkage::Comparator Abstract

Inherits:
Object
Includes:
Observable
Defined in:
lib/linkage/comparator.rb

Overview

This class is abstract.

Comparator is the superclass for comparators in Linkage. Comparators are used to compare records and compute scores based on how closely the records relate.

Each comparator should inherit from Comparator and declare itself as simple or advanced by overriding #type (the default is :simple). Simple comparators must define #score, which takes data from two records and returns a number (Integer or Float) between 0 and 1, inclusive. Advanced comparators must define both #score_dataset and #score_datasets, which create scores from one Dataset or two Datasets, respectively.

Each comparator can be registered with the Comparator.register method. This gives Linkage::Configuration a way to find a comparator by name via Linkage::Configuration#method_missing. For example, config.compare(...) creates a new Linkage::Comparators::Compare object, because that comparator is registered under the name "compare".
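The name-based lookup can be sketched in plain Ruby. This is a simplified, hypothetical stand-in, not Linkage's actual Configuration: MiniConfiguration, CompareStub, and the comparators array are invented for illustration, and register's method check is omitted. Unknown method names are resolved with klass_for, and names that are not registered fall through to the usual NoMethodError.

```ruby
# Hypothetical, simplified stand-ins; not the real Linkage classes.
class Comparator
  def self.register(name, klass)
    @comparators ||= {}
    @comparators[name] = klass # the real method also validates klass first
  end

  def self.klass_for(name)
    @comparators ? @comparators[name] : nil
  end
end

# Hypothetical comparator that just remembers its constructor arguments.
class CompareStub < Comparator
  attr_reader :args

  def initialize(*args)
    @args = args
  end

  def score(record_1, record_2)
    record_1 == record_2 ? 1 : 0
  end
end

Comparator.register("compare", CompareStub)

# Hypothetical configuration object that resolves comparators by method name.
class MiniConfiguration
  attr_reader :comparators

  def initialize
    @comparators = []
  end

  def method_missing(name, *args, &block)
    klass = Comparator.klass_for(name.to_s)
    return super if klass.nil? # not a registered comparator: NoMethodError

    comparator = klass.new(*args, &block)
    @comparators << comparator
    comparator
  end

  def respond_to_missing?(name, include_private = false)
    !Comparator.klass_for(name.to_s).nil? || super
  end
end

config = MiniConfiguration.new
comparator = config.compare(:first_name, :first_name)
```

Pairing method_missing with respond_to_missing? keeps respond_to? truthful for registered names.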

See documentation for the methods below for more information.


Instance Attribute Details

#weight ⇒ Object (readonly)

Returns the value of attribute weight, which is set by #weigh.



# File 'lib/linkage/comparator.rb', line 25

def weight
  @weight
end

Class Method Details

.klass_for(name) ⇒ Class? Also known as: []

Return a registered Comparator subclass or nil if it doesn't exist.

Parameters:

  • name (String)

    name of a registered comparator

Returns:

  • (Class, nil)


# File 'lib/linkage/comparator.rb', line 59

def klass_for(name)
  @comparators ? @comparators[name] : nil
end
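A small sketch of lookup behavior, using a simplified stand-in that mirrors the registry code shown on this page (the method check in register is omitted, and ExampleCompare is a hypothetical subclass). It shows the [] alias, the nil result for unknown names, and that keys are matched exactly as registered, so a String name is not found by a Symbol.

```ruby
# Simplified stand-in mirroring the registry code above (validation omitted).
class Comparator
  class << self
    def register(name, klass)
      @comparators ||= {}
      @comparators[name] = klass
    end

    def klass_for(name)
      @comparators ? @comparators[name] : nil
    end
    alias_method :[], :klass_for # "Also known as: []"
  end
end

# Before anything is registered, @comparators is nil, so lookups return nil.
before = Comparator.klass_for("compare")

class ExampleCompare < Comparator; end # hypothetical subclass
Comparator.register("compare", ExampleCompare)

found = Comparator["compare"]    # same as Comparator.klass_for("compare")
missing = Comparator["no_such"]  # unregistered name => nil
by_symbol = Comparator[:compare] # nil: the key was registered as a String
```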

.register(name, klass) ⇒ Object

Register a new comparator. The subclass must define #score (simple comparators) or both #score_dataset and #score_datasets (advanced comparators) in its own body; because the check uses klass.instance_methods(false), methods inherited from a superclass do not count. Otherwise, register raises an ArgumentError. The name parameter is used by Linkage::Configuration#method_missing as an easy way for users to select comparators for their linkage.

Parameters:

  • name (String)

    Comparator name used in klass_for

  • klass (Class)

    Comparator subclass



# File 'lib/linkage/comparator.rb', line 45

def register(name, klass)
  methods = klass.instance_methods(false)
  if !methods.include?(:score) && (!methods.include?(:score_datasets) || !methods.include?(:score_dataset))
    raise ArgumentError, "class must define either #score or both #score_datasets and #score_dataset methods"
  end

  @comparators ||= {}
  @comparators[name] = klass
end
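The check in register only looks at methods defined directly on klass. The sketch below reuses that check in a stand-in base class and runs it against four hypothetical subclasses. A subclass that only inherits #score from another comparator is rejected, as is one that defines just one of the two dataset methods.

```ruby
# Stand-in for the base class, reusing the register check shown above.
class Comparator
  def self.register(name, klass)
    methods = klass.instance_methods(false)
    if !methods.include?(:score) && (!methods.include?(:score_datasets) || !methods.include?(:score_dataset))
      raise ArgumentError, "class must define either #score or both #score_datasets and #score_dataset methods"
    end

    @comparators ||= {}
    @comparators[name] = klass
  end

  def score(record_1, record_2) # abstract default, as in Comparator
    raise NotImplementedError
  end
end

# Hypothetical subclasses used only to exercise the check.
class SimpleOne < Comparator
  def score(record_1, record_2)
    record_1 == record_2 ? 1 : 0
  end
end

class AdvancedOne < Comparator
  def score_dataset(dataset); end
  def score_datasets(dataset_1, dataset_2); end
end

class HalfAdvanced < Comparator
  def score_dataset(dataset); end # score_datasets is missing
end

class InheritsScore < SimpleOne; end # #score is inherited, not defined here

results = {}
candidates = { "simple" => SimpleOne, "advanced" => AdvancedOne,
               "half" => HalfAdvanced, "inherits" => InheritsScore }
candidates.each do |name, klass|
  begin
    Comparator.register(name, klass)
    results[name] = :registered
  rescue ArgumentError
    results[name] = :rejected
  end
end
```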

Instance Method Details

#score(record_1, record_2) ⇒ Numeric

This method is abstract.

Override this to return a score for the linkage strength of two records. When #type returns :simple, Runner#score_records scores records with this method (by way of #score_and_notify).

Parameters:

  • record_1 (Hash)

    data from first record

  • record_2 (Hash)

    data from second record

Returns:

  • (Numeric)

    value between 0 and 1 (inclusive)

Raises:

  • (NotImplementedError)


# File 'lib/linkage/comparator.rb', line 86

def score(record_1, record_2)
  raise NotImplementedError
end
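A minimal simple comparator might look like the following sketch. The base class is a stand-in reduced to #type and the abstract #score; TokenOverlap and its field key are hypothetical. It scores Jaccard similarity of the lowercase words in one field, |A ∩ B| / |A ∪ B|, which always stays within the required 0 to 1 range.

```ruby
# Minimal stand-in for the abstract base class (#type and #score only).
class Comparator
  def type
    @type || :simple
  end

  def score(record_1, record_2)
    raise NotImplementedError
  end
end

# Hypothetical simple comparator: Jaccard similarity of the lowercase words in
# one field, |A & B| / |A | B|, which always falls between 0 and 1.
class TokenOverlap < Comparator
  def initialize(field)
    @field = field # hypothetical record key, e.g. :name
  end

  def score(record_1, record_2)
    a = tokens(record_1[@field])
    b = tokens(record_2[@field])
    union = a | b
    return 0 if union.empty? # both values blank: no evidence either way
    (a & b).size.to_f / union.size
  end

  private

  def tokens(value)
    value.to_s.downcase.split.uniq
  end
end

comparator = TokenOverlap.new(:name)
comparator.score({ name: "John Smith" }, { name: "john  SMITH" }) # => 1.0
comparator.score({ name: "John Smith" }, { name: "John Doe" })    # => 0.3333333333333333
```

Because TokenOverlap never assigns @type, #type keeps its :simple default.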

#score_and_notify(record_1, record_2) ⇒ Object

Calls #score with two hashes of record data. The result is then used to notify any observers (typically ScoreRecorder).

This method is used by Runner#score_records when #type returns :simple. Subclasses should override #score to implement the scoring algorithm.

Parameters:

  • record_1 (Hash)

    data from first record

  • record_2 (Hash)

    data from second record



# File 'lib/linkage/comparator.rb', line 166

def score_and_notify(record_1, record_2)
  value = score(record_1, record_2)
  changed
  notify_observers(self, record_1, record_2, value)
end
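To show how a score reaches an observer, here is a sketch built on Ruby's standard Observable module (from the observer library). EmailMatch and ListRecorder are hypothetical; ListRecorder only stands in for ScoreRecorder. An observer needs an #update method that accepts the arguments passed to notify_observers.

```ruby
require "observer"

# Minimal stand-in for Comparator: Observable plus #score_and_notify as above.
class Comparator
  include Observable

  def score(record_1, record_2)
    raise NotImplementedError
  end

  def score_and_notify(record_1, record_2)
    value = score(record_1, record_2)
    changed
    notify_observers(self, record_1, record_2, value)
  end
end

# Hypothetical simple comparator: exact match on an :email field.
class EmailMatch < Comparator
  def score(record_1, record_2)
    record_1[:email] == record_2[:email] ? 1 : 0
  end
end

# Hypothetical observer standing in for ScoreRecorder.
class ListRecorder
  attr_reader :scores

  def initialize
    @scores = []
  end

  def update(_comparator, record_1, record_2, value)
    @scores << [record_1[:id], record_2[:id], value]
  end
end

comparator = EmailMatch.new
recorder = ListRecorder.new
comparator.add_observer(recorder)

comparator.score_and_notify({ id: 1, email: "a@example.com" }, { id: 2, email: "a@example.com" })
comparator.score_and_notify({ id: 1, email: "a@example.com" }, { id: 3, email: "b@example.com" })
recorder.scores # => [[1, 2, 1], [1, 3, 0]]
```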

#score_dataset(dataset) ⇒ Object

This method is abstract.

Override this to score the linkage strength of records in one dataset. This method is used to score records by Runner#score_records when #type returns :advanced and Linkage::Configuration is set up to link a dataset to itself.

Since a Dataset delegates to a Sequel::Dataset, you can use any Sequel::Dataset method to select the records to compare.

To record scores, subclasses must call Observable#changed followed by Observable#notify_observers (notify_observers does nothing unless changed has been called), like so:

changed
notify_observers(self, record_1, record_2, score)

This works by notifying any observers, typically ScoreRecorder, that a new score has been generated. ScoreRecorder#update then calls ScoreSet#add_score with the comparator ID, the primary key of each record, and the score.

Parameters:

  • dataset (Dataset)

    dataset whose records are compared with one another

Raises:

  • (NotImplementedError)




# File 'lib/linkage/comparator.rb', line 153

def score_dataset(dataset)
  raise NotImplementedError
end
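A sketch of an advanced comparator that links a dataset to itself. Assumptions, loudly: a plain Array of Hashes stands in for a Linkage Dataset (a real implementation could use Sequel::Dataset methods), SameZip and Collector are hypothetical, and a comparator meant for Comparator.register would also need #score_datasets, which is omitted here. Each unordered pair is scored once, and no record is compared with itself.

```ruby
require "observer"

class Comparator
  include Observable

  def type
    @type || :simple
  end
end

# Hypothetical advanced comparator; an Array of Hashes stands in for a Dataset.
class SameZip < Comparator
  def initialize
    @type = :advanced # makes #type return :advanced
  end

  def score_dataset(dataset)
    # Each unordered pair exactly once, never a record against itself.
    dataset.to_a.combination(2) do |record_1, record_2|
      value = record_1[:zip] == record_2[:zip] ? 1 : 0
      changed
      notify_observers(self, record_1, record_2, value)
    end
  end
end

class Collector # hypothetical stand-in for ScoreRecorder
  attr_reader :rows

  def initialize
    @rows = []
  end

  def update(_comparator, record_1, record_2, value)
    @rows << [record_1[:id], record_2[:id], value]
  end
end

people = [{ id: 1, zip: "02139" }, { id: 2, zip: "02139" }, { id: 3, zip: "10001" }]
comparator = SameZip.new
collector = Collector.new
comparator.add_observer(collector)
comparator.score_dataset(people)
collector.rows # => [[1, 2, 1], [1, 3, 0], [2, 3, 0]]
```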

#score_datasets(dataset_1, dataset_2) ⇒ Object

This method is abstract.

Override this to score the linkage strength of records in two datasets. This method is used to score records by Runner#score_records when #type returns :advanced and Linkage::Configuration is set up to link two datasets together.

Since each Dataset delegates to a Sequel::Dataset, you can use any Sequel::Dataset method to select the records to compare.

To record scores, subclasses must call Observable#changed followed by Observable#notify_observers (notify_observers does nothing unless changed has been called), like so:

changed
notify_observers(self, record_1, record_2, score)

This works by notifying any observers, typically ScoreRecorder, that a new score has been generated. ScoreRecorder#update then calls ScoreSet#add_score with the comparator ID, the primary key of each record, and the score.

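Parameters:

  • dataset_1 (Dataset)

    first dataset whose records are compared

  • dataset_2 (Dataset)

    second dataset whose records are compared
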
Raises:

  • (NotImplementedError)




# File 'lib/linkage/comparator.rb', line 120

def score_datasets(dataset_1, dataset_2)
  raise NotImplementedError
end
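Since advanced comparators choose which records to compare, one option is blocking: only comparing records that share a key. The sketch below is hypothetical. Arrays of Hashes stand in for the two Datasets, the :zip and :last keys are invented, and pairs in different ZIP codes simply produce no notification. It indexes the second dataset with group_by so each record in the first dataset only meets its candidates.

```ruby
require "observer"

class Comparator
  include Observable

  def type
    @type || :simple
  end
end

# Hypothetical advanced comparator for two datasets. Only records that share a
# :zip are compared; other pairs generate no score at all.
class BlockedLastName < Comparator
  def initialize
    @type = :advanced
  end

  def score_datasets(dataset_1, dataset_2)
    index = dataset_2.to_a.group_by { |record| record[:zip] } # zip => records
    dataset_1.to_a.each do |record_1|
      index.fetch(record_1[:zip], []).each do |record_2|
        value = record_1[:last] == record_2[:last] ? 1 : 0
        changed
        notify_observers(self, record_1, record_2, value)
      end
    end
  end
end

class Collector # hypothetical stand-in for ScoreRecorder
  attr_reader :rows

  def initialize
    @rows = []
  end

  def update(_comparator, record_1, record_2, value)
    @rows << [record_1[:id], record_2[:id], value]
  end
end

left = [{ id: 1, zip: "02139", last: "smith" }, { id: 2, zip: "10001", last: "jones" }]
right = [{ id: "a", zip: "02139", last: "smith" }, { id: "b", zip: "02139", last: "doe" },
         { id: "c", zip: "94105", last: "jones" }]

comparator = BlockedLastName.new
collector = Collector.new
comparator.add_observer(collector)
comparator.score_datasets(left, right)
collector.rows # => [[1, "a", 1], [1, "b", 0]]
```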

#type ⇒ Symbol

Return the type of this comparator. When #type returns :simple, #score_and_notify is called by Runner#score_records with each pair of records in order to create scores. When #type returns :advanced, either #score_dataset or #score_datasets is called by Runner#score_records. In advanced mode, it is left up to the Linkage::Comparator subclass to determine which records to compare and how to compare them.

Returns:

  • (Symbol)

    either :simple or :advanced



# File 'lib/linkage/comparator.rb', line 74

def type
  @type || :simple
end
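The dispatch described above can be sketched as follows. This is a hypothetical model of Runner#score_records, not its actual code: run_comparator, EqualValue, LoggingAdvanced, and Counter are invented, and pairing records with combination(2) for a self-link and product for two datasets is an assumption about how simple comparators are fed. An advanced comparator declares itself by assigning @type = :advanced, which #type returns.

```ruby
require "observer"

# Minimal stand-in for the base class: #type plus #score_and_notify.
class Comparator
  include Observable

  def type
    @type || :simple
  end

  def score_and_notify(record_1, record_2)
    value = score(record_1, record_2)
    changed
    notify_observers(self, record_1, record_2, value)
  end
end

# Hypothetical simple comparator (relies on the :simple default).
class EqualValue < Comparator
  def score(record_1, record_2)
    record_1[:value] == record_2[:value] ? 1 : 0
  end
end

# Hypothetical advanced comparator that only logs which method was called.
class LoggingAdvanced < Comparator
  attr_reader :calls

  def initialize
    @type = :advanced
    @calls = []
  end

  def score_dataset(dataset)
    @calls << [:score_dataset, dataset.size]
  end

  def score_datasets(dataset_1, dataset_2)
    @calls << [:score_datasets, dataset_1.size, dataset_2.size]
  end
end

# Hypothetical dispatcher modeled on the description of Runner#score_records.
def run_comparator(comparator, dataset_1, dataset_2 = nil)
  case comparator.type
  when :simple
    pairs = dataset_2 ? dataset_1.product(dataset_2) : dataset_1.combination(2).to_a
    pairs.each { |record_1, record_2| comparator.score_and_notify(record_1, record_2) }
  when :advanced
    dataset_2 ? comparator.score_datasets(dataset_1, dataset_2) : comparator.score_dataset(dataset_1)
  else
    raise ArgumentError, "unknown comparator type: #{comparator.type.inspect}"
  end
end

class Counter # hypothetical observer that just counts notifications
  attr_reader :count

  def initialize
    @count = 0
  end

  def update(*_args)
    @count += 1
  end
end

counter = Counter.new
simple = EqualValue.new
simple.add_observer(counter)
run_comparator(simple, [{ value: 1 }, { value: 2 }], [{ value: 1 }, { value: 3 }])
counter.count # => 4 (every left record paired with every right record)
```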

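#weigh(weight) ⇒ Object

Set the weight of this comparator (read back with #weight). A nil weight is ignored, leaving any existing weight unchanged; a non-Numeric weight raises a RuntimeError with the message "weight must be numeric type".
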
# File 'lib/linkage/comparator.rb', line 27

def weigh(weight)
  return if weight.nil?
  if not weight.is_a?(Numeric)
    raise "weight must be numeric type"
  end
  @weight = weight
end
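A quick sketch of how #weigh and #weight interact, using a stand-in class that mirrors the two methods above (the guard is written with unless, which behaves the same as the original if not). Because raise is given only a message string, the error class is RuntimeError.

```ruby
# Stand-in mirroring #weight and #weigh above.
class Comparator
  attr_reader :weight

  def weigh(weight)
    return if weight.nil? # nil is ignored; the previous weight is kept
    raise "weight must be numeric type" unless weight.is_a?(Numeric)

    @weight = weight
  end
end

comparator = Comparator.new
comparator.weight       # => nil (no weight set yet)
comparator.weigh(2.5)
comparator.weigh(nil)
comparator.weight       # => 2.5

begin
  comparator.weigh("3") # a String is not Numeric
rescue RuntimeError => e
  e.message             # => "weight must be numeric type"
end
comparator.weight       # => 2.5
```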