Class: Linkage::Dataset
- Inherits: Object
- Inheritance chain: Object, Linkage::Dataset
- Defined in: lib/linkage/dataset.rb
Overview
Dataset is a representation of a database table. It is a thin wrapper around a Sequel::Dataset.
There are three ways to create a Dataset:
- Pass in a Sequel::Dataset:
    Linkage::Dataset.new(db[:foo])
- Pass in a Sequel::Database and a table name:
    Linkage::Dataset.new(db, :foo)
- Pass in a Sequel-style connection URI, a table name, and any options you want to pass to Sequel.connect:
    Linkage::Dataset.new("mysql2://example.com/foo", :bar, :user => 'viking', :password => 'secret')
Once you've made a Dataset, you can call any Sequel::Dataset method on it. For example, to limit the dataset to records for people born after January 1, 1985 (assuming date of birth is stored as a date type):
filtered_dataset = dataset.where('dob > :date', :date => Date.new(1985, 1, 1))
Note that Sequel::Dataset methods return a clone of the dataset, so you must assign the return value to a variable.
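The clone-returning behavior can be illustrated without a database. The following is a minimal sketch using a hypothetical stand-in class, CloningDataset (not part of Linkage or Sequel), whose where method returns a modified copy instead of changing the receiver. It only mimics the pattern; it does not build or run SQL.

```ruby
# CloningDataset is a hypothetical stand-in, not part of Linkage or Sequel.
# It only mimics the "return a modified clone" pattern described above.
class CloningDataset
  attr_reader :filters

  def initialize(filters = [])
    @filters = filters.freeze
  end

  # Returns a new object with the extra filter; the receiver is unchanged.
  def where(condition)
    self.class.new(filters + [condition])
  end
end

dataset = CloningDataset.new
dataset.where("dob > '1985-01-01'")            # return value discarded: no effect
filtered = dataset.where("dob > '1985-01-01'") # return value kept

puts dataset.filters.length   # prints 0
puts filtered.filters.length  # prints 1
```

Discarding the return value leaves the original dataset unfiltered, which is why the result has to be assigned.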
Once you have your Dataset how you want it, you can use the #link_with method to create a Configuration for record linkage. The #link_with method takes another Dataset object and a ResultSet and returns a Configuration.
config = dataset.link_with(other_dataset, result_set)
config.compare([:foo], [:bar], :equal_to)
You can pass in a ScoreSet and MatchSet instead of a ResultSet if you wish:
config = dataset.link_with(other_dataset, score_set, match_set)
Note that a dataset can be linked with itself the same way, like so:
config = dataset.link_with(dataset, result_set)
config.compare([:foo], [:bar], :equal_to)
If you give #link_with a block, it will yield the same Configuration object to the block that it returns.
config = dataset.link_with(other_dataset, result_set) do |c|
c.compare([:foo], [:bar], :equal_to)
end
Once that's done, use a Runner to run the record linkage:
runner = Linkage::Runner.new(config)
runner.execute
Instance Attribute Summary
- #field_set ⇒ FieldSet (readonly)
  Returns this dataset's FieldSet.
- #table_name ⇒ Symbol (readonly)
  Returns this dataset's table name.
Instance Method Summary
- #initialize(*args) ⇒ Dataset (constructor)
  Returns a new instance of Dataset.
- #link_with(dataset, result_set) ⇒ Configuration
  Create a Configuration for record linkage.
- #obj ⇒ Sequel::Dataset
  Returns the underlying Sequel::Dataset.
- #obj=(value) ⇒ Object (private)
  Set the underlying Sequel::Dataset.
- #primary_key ⇒ Field
  Returns FieldSet#primary_key.
- #schema ⇒ Array
  Return the dataset's schema.
Constructor Details
#initialize(dataset) ⇒ Dataset
#initialize(database, table_name) ⇒ Dataset
#initialize(uri, table_name, options = {}) ⇒ Dataset
Returns a new instance of Linkage::Dataset.
# File 'lib/linkage/dataset.rb', line 109
def initialize(*args)
  if args.length == 0 || args.length > 3
    raise ArgumentError, "wrong number of arguments (#{args.length} for 1..3)"
  end

  if args.length == 1
    unless args[0].kind_of?(Sequel::Dataset)
      raise ArgumentError, "expected Sequel::Dataset, got #{args[0].class}"
    end

    @dataset = args[0]
    @db = @dataset.db
    @table_name = @dataset.first_source_table
  elsif args.length == 2 && args[0].kind_of?(Sequel::Database)
    @db = args[0]
    @table_name = args[1].to_sym
    @dataset = @db[@table_name]
  else
    uri, table_name, options = args
    options ||= {}

    @db = Sequel.connect(uri, options)
    @table_name = table_name.to_sym
    @dataset = @db[@table_name]
  end
  @field_set = FieldSet.new(self)
end
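To make the branch selection in initialize easier to follow, here is a small sketch that repeats only its argument checks. StubDataset, StubDatabase, and constructor_form are hypothetical names introduced for illustration; they stand in for Sequel::Dataset and Sequel::Database so the example runs without a database connection.

```ruby
# StubDataset and StubDatabase are hypothetical stand-ins for
# Sequel::Dataset and Sequel::Database.
StubDataset = Struct.new(:table)
StubDatabase = Struct.new(:name)

# Reports which constructor form a given argument list would select,
# following the same checks as the initialize method shown above.
def constructor_form(*args)
  if args.length == 0 || args.length > 3
    raise ArgumentError, "wrong number of arguments (#{args.length} for 1..3)"
  end

  if args.length == 1
    unless args[0].kind_of?(StubDataset)
      raise ArgumentError, "expected StubDataset, got #{args[0].class}"
    end
    :dataset
  elsif args.length == 2 && args[0].kind_of?(StubDatabase)
    :database_and_table
  else
    :uri_and_table
  end
end

puts constructor_form(StubDataset.new(:foo))         # prints dataset
puts constructor_form(StubDatabase.new("db"), :foo)  # prints database_and_table
puts constructor_form("sqlite://test.db", :foo)      # prints uri_and_table
puts constructor_form("sqlite://test.db", :foo, {})  # prints uri_and_table
```

Any two- or three-argument call that does not match the database-and-table form falls through to the URI branch, so in the real constructor the first argument is then handed to Sequel.connect.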
Dynamic Method Handling
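This class handles dynamic methods through the method_missing method.
#method_missing(name, *args, &block) ⇒ Object (protected)
Delegate methods to the underlying Sequel::Dataset. If the delegated call returns a Sequel::Dataset, the result is wrapped in a clone of this Dataset; any other result is returned unchanged.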
# File 'lib/linkage/dataset.rb', line 183
def method_missing(name, *args, &block)
  result = @dataset.send(name, *args, &block)
  if result.kind_of?(Sequel::Dataset)
    new_object = clone
    new_object.send(:obj=, result)
    new_object
  else
    result
  end
end
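The delegate-and-rewrap pattern used by method_missing can be tried in isolation. This is a sketch under stated assumptions: Inner and Wrapper are hypothetical stand-ins for Sequel::Dataset and Linkage::Dataset, and the respond_to_missing? hook is an addition of this sketch that the source above does not define.

```ruby
# Inner and Wrapper are hypothetical stand-ins for Sequel::Dataset and
# Linkage::Dataset; they only demonstrate the delegation pattern.
class Inner
  attr_reader :limit_value

  def initialize(limit_value = nil)
    @limit_value = limit_value
  end

  # Returns a new Inner, like a Sequel::Dataset query method would.
  def limit(n)
    Inner.new(n)
  end

  # Returns a plain value rather than another Inner.
  def describe
    "limit=#{limit_value.inspect}"
  end
end

class Wrapper
  attr_reader :obj

  def initialize(obj)
    @obj = obj
  end

  private

  # Forward unknown methods to the wrapped object. If the result is another
  # Inner, re-wrap it in a clone of this Wrapper; otherwise return it as-is.
  def method_missing(name, *args, &block)
    return super unless @obj.respond_to?(name)

    result = @obj.send(name, *args, &block)
    if result.kind_of?(Inner)
      copy = clone
      copy.instance_variable_set(:@obj, result)
      copy
    else
      result
    end
  end

  # Keeps respond_to? consistent with method_missing (an addition in this
  # sketch; the source shown above does not define it).
  def respond_to_missing?(name, include_private = false)
    @obj.respond_to?(name, include_private) || super
  end
end

wrapped = Wrapper.new(Inner.new)
limited = wrapped.limit(10)

puts limited.class     # prints Wrapper
puts limited.describe  # prints limit=10
puts wrapped.describe  # prints limit=nil
```

Because the wrapper is cloned before the new inner object is assigned, the original wrapper keeps pointing at its original inner object.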
Instance Attribute Details
#field_set ⇒ FieldSet (readonly)
Returns this dataset's FieldSet.
# File 'lib/linkage/dataset.rb', line 91
def field_set
  @field_set
end
#table_name ⇒ Symbol (readonly)
Returns this dataset's table name.
# File 'lib/linkage/dataset.rb', line 88
def table_name
  @table_name
end
Instance Method Details
#link_with(dataset, result_set) ⇒ Configuration
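Create a Configuration for record linkage. If dataset is eql? to the receiver, the Configuration is created with no second dataset. If a block is given, the Configuration is yielded to it before being returned.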
# File 'lib/linkage/dataset.rb', line 154
def link_with(dataset, result_set)
  other = dataset.eql?(self) ? nil : dataset
  conf = Configuration.new(self, other, result_set)
  if block_given?
    yield conf
  end
  conf
end
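The two behaviors visible in this source, self-linkage detection and the optional block, can be sketched with stand-ins. FakeConfiguration and LinkableDataset are hypothetical names used only for this illustration; the real Configuration class does considerably more than record comparisons.

```ruby
# FakeConfiguration and LinkableDataset are hypothetical stand-ins for
# Linkage::Configuration and Linkage::Dataset.
FakeConfiguration = Struct.new(:dataset_1, :dataset_2, :result_set) do
  def comparisons
    @comparisons ||= []
  end

  # Records the comparison instead of building a real comparator.
  def compare(left, right, operator)
    comparisons << [left, right, operator]
  end
end

class LinkableDataset
  def link_with(dataset, result_set)
    # A dataset linked with itself is recorded as having no second dataset.
    other = dataset.eql?(self) ? nil : dataset
    conf = FakeConfiguration.new(self, other, result_set)
    yield conf if block_given?
    conf
  end
end

a = LinkableDataset.new
b = LinkableDataset.new

cross = a.link_with(b, :results) { |c| c.compare([:foo], [:bar], :equal_to) }
self_link = a.link_with(a, :results)

puts cross.dataset_2.equal?(b)  # prints true
puts self_link.dataset_2.nil?   # prints true
puts cross.comparisons.length   # prints 1
```

The block receives the same object that link_with returns, so configuring inside the block and configuring the return value afterwards are equivalent.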
#obj ⇒ Sequel::Dataset
Returns the underlying Sequel::Dataset.
# File 'lib/linkage/dataset.rb', line 139
def obj
  @dataset
end
#obj=(value) ⇒ Object (private)
Set the underlying Sequel::Dataset.
# File 'lib/linkage/dataset.rb', line 144
def obj=(value)
  @dataset = value
end
#primary_key ⇒ Field
Returns FieldSet#primary_key.
# File 'lib/linkage/dataset.rb', line 175
def primary_key
  @field_set.primary_key
end
#schema ⇒ Array
Return the dataset's schema.
# File 'lib/linkage/dataset.rb', line 167
def schema
  @db.schema(@table_name)
end