Class: Linkage::Comparator Abstract
- Inherits:
- Object
- Linkage::Comparator
- Includes:
- Observable
- Defined in:
- lib/linkage/comparator.rb
Overview
Comparator is the superclass for comparators in Linkage. Comparators are used to compare records and compute scores based on how closely the records relate.
Each comparator should inherit from Comparator and declare itself as
simple or advanced by overriding #type (the default is simple). Simple
comparators must define the #score method that uses data from two records
and returns a number (Integer or Float) between 0 and 1 (inclusive).
Advanced comparators must define both #score_dataset and #score_datasets
that use one or two Datasets respectively to create scores.
Each comparator can be registered via Comparator.register. Registration
gives Linkage::Configuration a way to find a comparator by name through
Linkage::Configuration#method_missing. For example, config.compare(...)
creates a new Linkage::Comparators::Compare object, since that comparator
is registered under the name "compare".
See documentation for the methods below for more information.
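As a rough illustration of the simple-comparator contract described above, the sketch below defines and registers a hypothetical comparator. So that it runs without the gem installed, it uses a minimal stand-in for Linkage::Comparator that copies only the register, klass_for, and type sources listed on this page; with the real library you would require the gem instead. The ExactMatch class, its field argument, and the name "exact_match" are assumptions made for this example.

```ruby
# Minimal stand-in for Linkage::Comparator, copied from the method sources
# listed on this page. With the real gem, requiring it replaces this module.
module Linkage
  class Comparator
    class << self
      def register(name, klass)
        methods = klass.instance_methods(false)
        if !methods.include?(:score) && (!methods.include?(:score_datasets) || !methods.include?(:score_dataset))
          raise ArgumentError, "class must define either #score or both #score_datasets and #score_dataset methods"
        end
        @comparators ||= {}
        @comparators[name] = klass
      end

      def klass_for(name)
        @comparators ? @comparators[name] : nil
      end
      alias_method :[], :klass_for
    end

    def type
      @type || :simple
    end
  end
end

# Hypothetical simple comparator: 1 when both records agree on a field, else 0.
class ExactMatch < Linkage::Comparator
  def initialize(field)
    @field = field
  end

  # Simple comparators return a number between 0 and 1 (inclusive).
  def score(record_1, record_2)
    record_1[@field] == record_2[@field] ? 1 : 0
  end
end

Linkage::Comparator.register("exact_match", ExactMatch)

# Look the class up by its registered name, then score two records.
comparator = Linkage::Comparator["exact_match"].new(:name)
p comparator.type
p comparator.score({ name: "Ann" }, { name: "Ann" })
p comparator.score({ name: "Ann" }, { name: "Bob" })
```

Because ExactMatch does not override #type, it is treated as a simple comparator.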
Direct Known Subclasses
Linkage::Comparators::Compare, Linkage::Comparators::Strcompare, Linkage::Comparators::Within
Instance Attribute Summary
- #weight ⇒ Object (readonly)
  Returns the value of attribute weight.
Class Method Summary
- .klass_for(name) ⇒ Class? (also: [])
  Return a registered Comparator subclass or nil if it doesn't exist.
- .register(name, klass) ⇒ Object
  Register a new comparator.
Instance Method Summary
- #score(record_1, record_2) ⇒ Numeric (abstract)
  Override this to return the score of the linkage strength of two records.
- #score_and_notify(record_1, record_2) ⇒ Object
  Calls #score with two hashes of record data.
- #score_dataset(dataset) ⇒ Object (abstract)
  Override this to score the linkage strength of records in one dataset.
- #score_datasets(dataset_1, dataset_2) ⇒ Object (abstract)
  Override this to score the linkage strength of records in two datasets.
- #type ⇒ Symbol
  Return the type of this comparator.
- #weigh(weight) ⇒ Object
Instance Attribute Details
#weight ⇒ Object (readonly)
Returns the value of attribute weight.
# File 'lib/linkage/comparator.rb', line 25
def weight
  @weight
end
Class Method Details
.klass_for(name) ⇒ Class? Also known as: []
Return a registered Comparator subclass or nil if it doesn't exist.
# File 'lib/linkage/comparator.rb', line 59
def klass_for(name)
  @comparators ? @comparators[name] : nil
end
.register(name, klass) ⇒ Object
Register a new comparator. Subclasses must define at least #score for
simple comparators, or #score_dataset and #score_datasets for
advanced comparators. Otherwise, an ArgumentError is raised when you
call register. The name parameter is used in
Linkage::Configuration#method_missing as an easy way for users to select
comparators for their linkage.
# File 'lib/linkage/comparator.rb', line 45
def register(name, klass)
  methods = klass.instance_methods(false)
  if !methods.include?(:score) && (!methods.include?(:score_datasets) || !methods.include?(:score_dataset))
    raise ArgumentError, "class must define either #score or both #score_datasets and #score_dataset methods"
  end
  @comparators ||= {}
  @comparators[name] = klass
end
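The check in register can be exercised with a small sketch. It reuses the register source shown above inside a stand-in Linkage::Comparator so that it runs without the gem; the Incomplete, Base, and Derived classes and the RegistrationDemo helper are hypothetical. One consequence worth noting, which follows from the use of instance_methods(false) in that source, is that inherited methods are not counted, so a subclass that only inherits #score from another comparator appears to be rejected.

```ruby
# Stand-in for Linkage::Comparator containing only the register source above.
module Linkage
  class Comparator
    def self.register(name, klass)
      methods = klass.instance_methods(false)
      if !methods.include?(:score) && (!methods.include?(:score_datasets) || !methods.include?(:score_dataset))
        raise ArgumentError, "class must define either #score or both #score_datasets and #score_dataset methods"
      end
      @comparators ||= {}
      @comparators[name] = klass
    end
  end
end

# Hypothetical: defines only one of the two advanced methods.
class Incomplete < Linkage::Comparator
  def score_dataset(dataset); end
end

# Hypothetical: defines #score directly, so registration succeeds.
class Base < Linkage::Comparator
  def score(record_1, record_2)
    0
  end
end

# Hypothetical: inherits #score but defines nothing itself.
class Derived < Base; end

# Hypothetical helper: returns the error message, or nil when registration succeeds.
module RegistrationDemo
  def self.error_for(name, klass)
    Linkage::Comparator.register(name, klass)
    nil
  rescue ArgumentError => e
    e.message
  end
end

p RegistrationDemo.error_for("incomplete", Incomplete)
p RegistrationDemo.error_for("base", Base)
p RegistrationDemo.error_for("derived", Derived)
```

Under these assumptions, only Base registers cleanly; the other two calls return the ArgumentError message.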
Instance Method Details
#score(record_1, record_2) ⇒ Numeric
Override this to return the score of the linkage strength of two records.
This method is used to score records by Runner#score_records when
#type returns :simple.
# File 'lib/linkage/comparator.rb', line 86
def score(record_1, record_2)
  raise NotImplementedError
end
#score_and_notify(record_1, record_2) ⇒ Object
Calls #score with two hashes of record data. The result is then used to notify any observers (typically ScoreRecorder).
This method is used by Runner#score_records when #type returns
:simple. Subclasses should override #score to implement the scoring
algorithm.
# File 'lib/linkage/comparator.rb', line 166
def score_and_notify(record_1, record_2)
  value = score(record_1, record_2)
  changed
  notify_observers(self, record_1, record_2, value)
end
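To make the notification flow concrete, here is a minimal sketch using Ruby's Observable module. The Linkage::Comparator stand-in copies only the score and score_and_notify sources shown on this page; SameZip and ListRecorder are hypothetical, and ListRecorder is only a simplified substitute for ScoreRecorder, whose actual interface is not shown here.

```ruby
require 'observer'

# Stand-in for Linkage::Comparator with the two methods shown on this page.
module Linkage
  class Comparator
    include Observable

    def score(record_1, record_2)
      raise NotImplementedError
    end

    def score_and_notify(record_1, record_2)
      value = score(record_1, record_2)
      changed # Observable only notifies when the changed flag is set
      notify_observers(self, record_1, record_2, value)
    end
  end
end

# Hypothetical simple comparator.
class SameZip < Linkage::Comparator
  def score(record_1, record_2)
    record_1[:zip] == record_2[:zip] ? 1 : 0
  end
end

# Hypothetical observer; Observable calls #update with the notify arguments.
class ListRecorder
  attr_reader :scores

  def initialize
    @scores = []
  end

  def update(_comparator, record_1, record_2, score)
    @scores << [record_1[:id], record_2[:id], score]
  end
end

recorder = ListRecorder.new
comparator = SameZip.new
comparator.add_observer(recorder)

comparator.score_and_notify({ id: 1, zip: "37203" }, { id: 2, zip: "37203" })
comparator.score_and_notify({ id: 1, zip: "37203" }, { id: 3, zip: "10001" })
p recorder.scores
```

The call to changed before each notify_observers matters: Observable clears the flag after notifying, so a notification sent without it is silently dropped.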
#score_dataset(dataset) ⇒ Object
Override this to score the linkage strength of records in one dataset.
This method is used to score records by Runner#score_records when
#type returns :advanced and Linkage::Configuration is set up to link a
dataset to itself.
Since a Dataset delegates to a Sequel::Dataset, you can use any
Sequel::Dataset methods that you wish in order to select records to compare.
To record scores, subclasses must call Observable#notify_observers like so:
changed
notify_observers(self, record_1, record_2, score)
This works by notifying any observers, typically ScoreRecorder, that a new score has been generated. ScoreRecorder#update then calls ScoreSet#add_score with the comparator ID, the primary key of each record, and the score.
# File 'lib/linkage/comparator.rb', line 153
def score_dataset(dataset)
  raise NotImplementedError
end
#score_datasets(dataset_1, dataset_2) ⇒ Object
Override this to score the linkage strength of records in two datasets.
This method is used to score records by Runner#score_records when
#type returns :advanced and Linkage::Configuration is set up to link two
datasets together.
Since each Dataset delegates to a Sequel::Dataset, you can use any
Sequel::Dataset methods that you wish in order to select records to compare.
To record scores, subclasses must call Observable#notify_observers like so:
changed
notify_observers(self, record_1, record_2, score)
This works by notifying any observers, typically ScoreRecorder, that a new score has been generated. ScoreRecorder#update then calls ScoreSet#add_score with the comparator ID, the primary key of each record, and the score.
# File 'lib/linkage/comparator.rb', line 120
def score_datasets(dataset_1, dataset_2)
  raise NotImplementedError
end
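An advanced comparator might be sketched as follows. This is only an outline under several assumptions: the Linkage::Comparator stand-in includes nothing but Observable, plain arrays of hashes stand in for Dataset objects (a real comparator would use Sequel::Dataset methods to select records), and the YearWithin and PairCollector classes are hypothetical.

```ruby
require 'observer'

# Stand-in for Linkage::Comparator; only Observable is needed for this sketch.
module Linkage
  class Comparator
    include Observable
  end
end

# Hypothetical advanced comparator: scores 1 when two records' :year values
# differ by at most `tolerance`, otherwise 0.
class YearWithin < Linkage::Comparator
  def initialize(tolerance)
    @tolerance = tolerance
  end

  def type
    :advanced
  end

  # Link a dataset to itself: compare each unordered pair once.
  def score_dataset(dataset)
    dataset.to_a.combination(2) { |record_1, record_2| emit(record_1, record_2) }
  end

  # Link two datasets: compare every record in one with every record in the other.
  def score_datasets(dataset_1, dataset_2)
    dataset_1.each do |record_1|
      dataset_2.each { |record_2| emit(record_1, record_2) }
    end
  end

  private

  def emit(record_1, record_2)
    score = (record_1[:year] - record_2[:year]).abs <= @tolerance ? 1 : 0
    changed
    notify_observers(self, record_1, record_2, score)
  end
end

# Hypothetical observer that collects [id, id, score] triples.
class PairCollector
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  def update(_comparator, record_1, record_2, score)
    @pairs << [record_1[:id], record_2[:id], score]
  end
end

births = [{ id: 1, year: 1980 }, { id: 2, year: 1981 }, { id: 3, year: 1990 }]
deaths = [{ id: 10, year: 1982 }]

comparator = YearWithin.new(1)
collector = PairCollector.new
comparator.add_observer(collector)
comparator.score_dataset(births)
comparator.score_datasets(births, deaths)
p collector.pairs
```

Comparing every pair is the simplest possible strategy; since an advanced comparator decides for itself which records to compare, a real implementation would likely narrow the candidates with dataset queries first.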
#type ⇒ Symbol
Return the type of this comparator. When #type returns :simple,
#score_and_notify is called by Runner#score_records with each pair of
records in order to create scores. When #type returns :advanced,
either #score_dataset or #score_datasets is called by
Runner#score_records. In advanced mode, it is left up to the
Linkage::Comparator subclass to determine which records to compare and how to
compare them.
# File 'lib/linkage/comparator.rb', line 74
def type
  @type || :simple
end
#weigh(weight) ⇒ Object
Sets the value of attribute weight. A nil weight is ignored, and a
non-numeric weight raises a RuntimeError.
# File 'lib/linkage/comparator.rb', line 27
def weigh(weight)
  return if weight.nil?
  if not weight.is_a?(Numeric)
    raise "weight must be numeric type"
  end
  @weight = weight
end
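The behavior of #weigh can be traced with a short sketch. It copies the weight and weigh sources shown on this page into a stand-in Linkage::Comparator so that it runs without the gem; instantiating the stand-in directly is an assumption made for brevity, and how the library uses the weight afterwards is not covered here.

```ruby
# Stand-in for Linkage::Comparator with the weight reader and weigh source above.
module Linkage
  class Comparator
    attr_reader :weight

    def weigh(weight)
      return if weight.nil?
      if not weight.is_a?(Numeric)
        raise "weight must be numeric type"
      end
      @weight = weight
    end
  end
end

comparator = Linkage::Comparator.new

comparator.weigh(nil)   # nil is ignored, so no weight is stored
p comparator.weight

comparator.weigh(0.5)   # any Numeric is accepted
p comparator.weight

begin
  comparator.weigh("heavy")   # a String is not Numeric
rescue RuntimeError => e
  puts e.message
end
p comparator.weight           # the earlier weight is left unchanged
```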