Class: Linkage::Configuration
- Inherits: Object
- Defined in: lib/linkage/configuration.rb
Overview
Configuration keeps track of everything needed to run a record linkage: which datasets you want to link, how you want to link them, and where you want to store the results. Once you have created a Configuration, you can pass it to Runner#initialize and run it with Runner#execute.
The usual way to create a configuration is Dataset#link_with, but you can also create one directly (see #initialize), like so:
dataset_1 = Linkage::Dataset.new('mysql://example.com/database_name', 'foo')
dataset_2 = Linkage::Dataset.new('postgres://example.com/other_name', 'bar')
result_set = Linkage::ResultSet['csv'].new('/home/foo/linkage')
config = Linkage::Configuration.new(dataset_1, dataset_2, result_set)
To add comparators to a Configuration, call the method with the same name as a registered comparator. The built-in comparators are:
Name | Class |
---|---|
compare | Linkage::Comparators::Compare |
strcompare | Linkage::Comparators::Strcompare |
within | Linkage::Comparators::Within |
For example, if you want to add a Linkage::Comparators::Compare comparator to your configuration, run this:
config.compare([:foo], [:bar], :equal_to)
This works via #method_missing. First, the comparator class is fetched via Linkage::Comparator.[]. Then fields are looked up in the FieldSet of the Dataset. Those Fields along with any other arguments you specify are passed to the constructor of the comparator you chose.
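The dispatch described above can be sketched in isolation. This is a minimal illustration, not the library's code: `Sketch`, `REGISTRY`, `EqualComparator`, and `Config` are hypothetical stand-ins, and the field lookup step is left out. Unlike the source shown later on this page, which raises a plain error for unknown names, the sketch defers to `super` and defines `respond_to_missing?` so that `respond_to?` stays consistent.

```ruby
# Hypothetical stand-ins; none of these are the real Linkage classes.
module Sketch
  # Registry mapping comparator names (strings) to comparator classes.
  REGISTRY = {}

  class EqualComparator
    attr_reader :left, :right, :options

    def initialize(left, right, *options)
      @left = left
      @right = right
      @options = options
    end
  end
  REGISTRY['compare'] = EqualComparator

  class Config
    attr_reader :comparators

    def initialize
      @comparators = []
    end

    # Resolve an unknown method name against the registry, build the
    # comparator from the remaining arguments, and remember it.
    def method_missing(name, *args, &block)
      klass = REGISTRY[name.to_s]
      return super if klass.nil?

      comparator = klass.new(*args, &block)
      @comparators << comparator
      comparator
    end

    # Keep respond_to? in step with method_missing.
    def respond_to_missing?(name, include_private = false)
      REGISTRY.key?(name.to_s) || super
    end
  end
end

config = Sketch::Config.new
comparator = config.compare([:foo], [:bar], :equal_to)
```

Each call both returns the new comparator and appends it to `config.comparators`, mirroring the return value and side effect of the real `#method_missing`.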
Configuration also contains information about how records are matched. Once scores are computed, the scores for each pair of records are averaged and compared against a threshold value. Record pairs that have an average score greater than or equal to the threshold value are considered matches.
The threshold value is 0.5 by default, but you can change it by setting #threshold like so:
config.threshold = 0.75
Since scores range between 0 and 1 (inclusive), be sure to set a threshold value within the same range. The actual matching work is done by the Matcher class.
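As a rough illustration of the matching rule, the sketch below averages a pair's scores and compares the mean against the threshold. The `match?` helper is hypothetical and is not part of Linkage; it assumes the default `:mean` algorithm and only mirrors the rule as described here, not the Matcher class itself.

```ruby
# Hypothetical helper, not part of Linkage: decide whether a record pair
# matches by averaging its comparator scores (assumes the :mean algorithm).
def match?(scores, threshold = 0.5)
  unless (0.0..1.0).cover?(threshold)
    raise ArgumentError, 'threshold must be between 0 and 1'
  end
  raise ArgumentError, 'scores must not be empty' if scores.empty?

  mean = scores.sum.to_f / scores.size
  mean >= threshold
end

match?([1.0, 0.5])        # mean is 0.75, compared against the default 0.5
match?([1.0, 0.5], 0.75)  # a mean equal to the threshold still matches
match?([0.0, 0.5])        # mean is 0.25, below the default threshold
```

The comparison is inclusive, so a pair whose mean score exactly equals the threshold counts as a match.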
Instance Attribute Summary
- #algorithm ⇒ Object
  Returns the value of attribute algorithm.
- #comparators ⇒ Object (readonly)
  Returns the value of attribute comparators.
- #dataset_1 ⇒ Object (readonly)
  Returns the value of attribute dataset_1.
- #dataset_2 ⇒ Object (readonly)
  Returns the value of attribute dataset_2.
- #result_set ⇒ Object (readonly)
  Returns the value of attribute result_set.
- #threshold ⇒ Object
  Returns the value of attribute threshold.
Instance Method Summary
- #initialize(*args) ⇒ Configuration (constructor)
  Create a new instance of Configuration.
- #match_recorder(matcher) ⇒ Object
- #matcher ⇒ Object
- #method_missing(name, *args, &block) ⇒ Object
- #score_recorder ⇒ Object
Constructor Details
#initialize(dataset_1, dataset_2, result_set) ⇒ Configuration
#initialize(dataset, result_set) ⇒ Configuration
#initialize(dataset_1, dataset_2, score_set, match_set) ⇒ Configuration
#initialize(dataset, score_set, match_set) ⇒ Configuration
Create a new instance of Linkage::Configuration.
# File 'lib/linkage/configuration.rb', line 93
def initialize(*args)
  if args.length < 2 || args.length > 4
    raise ArgumentError, "wrong number of arguments (#{args.length} for 2..4)"
  end

  @dataset_1 = args[0]
  case args.length
  when 2
    # dataset and result set
    @result_set = args[1]
  when 3
    # dataset 1, dataset 2, and result set
    # dataset, score set, and match set
    case args[1]
    when Dataset, nil
      @dataset_2 = args[1]
      @result_set = args[2]
    when ScoreSet
      @result_set = ResultSet.new(args[1], args[2])
    end
  when 4
    # dataset 1, dataset 2, score set, and match set
    @dataset_2 = args[1]
    @result_set = ResultSet.new(args[2], args[3])
  end

  @comparators = []
  @algorithm = :mean
  @threshold = 0.5
end
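To make the four call forms easier to follow, here is a small stand-alone sketch of the same argument dispatch. Everything in `ArgSketch` is a hypothetical stand-in (simple Structs rather than the real Dataset, ScoreSet, MatchSet, and ResultSet classes), and it simplifies the three-argument case by treating any second argument that is not a score set as the second dataset.

```ruby
# Stand-in types so the sketch runs without the linkage gem.
module ArgSketch
  Dataset   = Struct.new(:name)
  ScoreSet  = Struct.new(:path)
  MatchSet  = Struct.new(:path)
  ResultSet = Struct.new(:score_set, :match_set)

  # Map 2..4 positional arguments onto dataset_1, dataset_2, and result_set.
  def self.parse(*args)
    unless (2..4).cover?(args.length)
      raise ArgumentError, "wrong number of arguments (#{args.length} for 2..4)"
    end

    dataset_1 = args[0]
    dataset_2 = nil
    result_set =
      case args.length
      when 2
        # dataset and result set
        args[1]
      when 3
        if args[1].is_a?(ScoreSet)
          # dataset, score set, and match set
          ResultSet.new(args[1], args[2])
        else
          # dataset 1, dataset 2, and result set
          dataset_2 = args[1]
          args[2]
        end
      when 4
        # dataset 1, dataset 2, score set, and match set
        dataset_2 = args[1]
        ResultSet.new(args[2], args[3])
      end

    { dataset_1: dataset_1, dataset_2: dataset_2, result_set: result_set }
  end
end

people  = ArgSketch::Dataset.new('people')
others  = ArgSketch::Dataset.new('others')
scores  = ArgSketch::ScoreSet.new('scores.csv')
matches = ArgSketch::MatchSet.new('matches.csv')

self_link = ArgSketch.parse(people, scores, matches)
two_way   = ArgSketch.parse(people, others, scores, matches)
```

In the self-linkage form `dataset_2` stays `nil`, which is why other methods on this page fall back to `dataset_1` when no second dataset is present.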
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args, &block) ⇒ Object
# File 'lib/linkage/configuration.rb', line 142
def method_missing(name, *args, &block)
  klass = Comparator[name.to_s]
  if klass.nil?
    raise "unknown comparator: #{name}"
  end

  set_1 = args[0]
  if set_1.is_a?(Array)
    set_1 = fields_for(dataset_1, *set_1)
  else
    set_1 = fields_for(dataset_1, set_1).first
  end
  args[0] = set_1

  set_2 = args[1]
  if set_2.is_a?(Array)
    set_2 = fields_for(dataset_2 || dataset_1, *set_2)
  else
    set_2 = fields_for(dataset_2 || dataset_1, set_2).first
  end
  args[1] = set_2

  comparator = klass.new(*args, &block)
  @comparators << comparator
  return comparator
end
Instance Attribute Details
#algorithm ⇒ Object
Returns the value of attribute algorithm.
# File 'lib/linkage/configuration.rb', line 61
def algorithm
  @algorithm
end
#comparators ⇒ Object (readonly)
Returns the value of attribute comparators.
# File 'lib/linkage/configuration.rb', line 60
def comparators
  @comparators
end
#dataset_1 ⇒ Object (readonly)
Returns the value of attribute dataset_1.
# File 'lib/linkage/configuration.rb', line 60
def dataset_1
  @dataset_1
end
#dataset_2 ⇒ Object (readonly)
Returns the value of attribute dataset_2.
# File 'lib/linkage/configuration.rb', line 60
def dataset_2
  @dataset_2
end
#result_set ⇒ Object (readonly)
Returns the value of attribute result_set.
# File 'lib/linkage/configuration.rb', line 60
def result_set
  @result_set
end
#threshold ⇒ Object
Returns the value of attribute threshold.
# File 'lib/linkage/configuration.rb', line 60
def threshold
  @threshold
end
Instance Method Details
#match_recorder(matcher) ⇒ Object
# File 'lib/linkage/configuration.rb', line 138
def match_recorder(matcher)
  MatchRecorder.new(matcher, @result_set.match_set)
end
#matcher ⇒ Object
# File 'lib/linkage/configuration.rb', line 134
def matcher
  Matcher.new(@comparators, @result_set.score_set, @algorithm, @threshold)
end
#score_recorder ⇒ Object
# File 'lib/linkage/configuration.rb', line 124
def score_recorder
  pk_1 = @dataset_1.field_set.primary_key.name
  if @dataset_2
    pk_2 = @dataset_2.field_set.primary_key.name
  else
    pk_2 = pk_1
  end
  ScoreRecorder.new(@comparators, @result_set.score_set, [pk_1, pk_2])
end