Class: Sequel::Unionize::Unionizer
- Inherits: Object
- Defined in: lib/sequel/extensions/unionize.rb
Overview
Handles the chunking and union of multiple datasets.
This class splits a large collection of datasets into smaller chunks, creates a temporary table or view for each chunk, and then recursively combines the results until a single unified dataset is produced.
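To get a feel for how much work the recursion does, the number of chunks formed on each pass can be estimated with simple ceiling division. The helper below is a hypothetical illustration, not part of the extension; `chunks_per_pass` is an assumed name, and the sketch assumes a chunk size of at least 2 and that a pass ending in a single chunk returns its union directly, as the `#unionize` listing suggests.

```ruby
# Hypothetical helper (not part of the extension): counts the chunks formed
# on each pass when `dataset_count` datasets are combined `chunk_size` at a time.
def chunks_per_pass(dataset_count, chunk_size)
  raise ArgumentError, 'chunk_size must be at least 2' if chunk_size < 2
  raise ArgumentError, 'dataset_count must be positive' if dataset_count < 1

  passes = []
  remaining = dataset_count
  loop do
    chunks = (remaining + chunk_size - 1) / chunk_size # ceiling division
    passes << chunks
    break if chunks == 1 # a single chunk is unioned directly; no further pass

    remaining = chunks # each chunk becomes one input to the next pass
  end
  passes
end

p chunks_per_pass(250, 100)    # => [3, 1]
p chunks_per_pass(10_001, 100) # => [101, 2, 1]
```

Under these assumptions, every pass except the last would create one temporary table or view per chunk, so 250 datasets at the default chunk size would need three of them.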
Defined Under Namespace
Classes: Chunk
Constant Summary
- DEFAULT_CHUNK_SIZE = 100
  Default number of datasets to combine in each chunk.
Instance Attribute Summary
- #db ⇒ Object (readonly)
  Returns the value of attribute db.
Instance Method Summary
- #initialize(db, ds_set, opts = {}) ⇒ Unionizer (constructor)
  Creates a new Unionizer instance.
- #unionize(dses = @ds_set) ⇒ Sequel::Dataset
  Performs the unionization of datasets.
Constructor Details
#initialize(db, ds_set, opts = {}) ⇒ Unionizer
Creates a new Unionizer instance.
    # File 'lib/sequel/extensions/unionize.rb', line 109
    def initialize(db, ds_set, opts = {})
      @db = db
      @ds_set = ds_set
      @opts = opts
      opts[:chunk_size] ||= DEFAULT_CHUNK_SIZE
      opts[:temp_table_prefix] ||= 'temp_union'
      opts[:all] ||= false
      opts[:from_self] = opts.fetch(:from_self, true)
    end
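The constructor fills in defaults in two different ways: `||=` for most options and `fetch` for `:from_self`. The likely reason is that `||=` would overwrite an explicit `false`, which matters for an option whose default is `true`. The sketch below reproduces that logic in isolation; `apply_defaults` is a hypothetical name, and unlike the constructor it works on a copy of the hash rather than the caller's own.

```ruby
DEFAULT_CHUNK_SIZE = 100 # mirrors the constant documented above

# Hypothetical stand-in for the option handling in #initialize.
# Assumption: operates on a copy, whereas the constructor mutates `opts` itself.
def apply_defaults(opts = {})
  opts = opts.dup
  opts[:chunk_size] ||= DEFAULT_CHUNK_SIZE
  opts[:temp_table_prefix] ||= 'temp_union'
  opts[:all] ||= false
  # fetch keeps an explicit false; `||= true` would silently replace it
  opts[:from_self] = opts.fetch(:from_self, true)
  opts
end

p apply_defaults[:from_self]                   # => true
p apply_defaults(from_self: false)[:from_self] # => false
p apply_defaults(chunk_size: 25)[:chunk_size]  # => 25
```

One consequence of the `||=` form is that passing `chunk_size: nil` falls back to the default rather than raising.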
Instance Attribute Details
#db ⇒ Object (readonly)
Returns the value of attribute db.
    # File 'lib/sequel/extensions/unionize.rb', line 98
    def db
      @db
    end
Instance Method Details
#unionize(dses = @ds_set) ⇒ Sequel::Dataset
Performs the unionization of datasets.
This method recursively chunks the datasets, creates temporary tables/views for each chunk, and then combines them until a single dataset remains.
    # File 'lib/sequel/extensions/unionize.rb', line 126
    def unionize(dses = @ds_set)
      chunks = dses.each_slice(@opts[:chunk_size]).map do |chunk_of_dses|
        Chunk.new(db, chunk_of_dses, @opts)
      end

      return chunks.first.union if chunks.size == 1

      unionize(chunks.each(&:create).map { |chunk| db[chunk.name] })
    end
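The shape of the recursion is easier to see without a database. The following is a minimal sketch that uses plain SQL-like strings in place of datasets: `chunked_union` is a hypothetical name, and wrapping a slice in parentheses stands in for both `Chunk#union` and the temporary table that `Chunk#create` would produce. Reading the listing, a chunk size of 1 would never shrink the list and an empty list would leave `chunks.first` as `nil`, so the sketch rejects both; whether the extension guards against these cases elsewhere is not shown here.

```ruby
# Hypothetical, database-free illustration of the recursion in #unionize.
# Strings stand in for datasets; joining a slice stands in for Chunk#union.
def chunked_union(queries, chunk_size)
  raise ArgumentError, 'chunk_size must be at least 2' if chunk_size < 2
  raise ArgumentError, 'queries must not be empty' if queries.empty?

  combined = queries.each_slice(chunk_size).map do |slice|
    slice.size == 1 ? slice.first : "(#{slice.join(' UNION ')})"
  end
  # Base case: everything fits in one chunk, so return its union.
  return combined.first if combined.size == 1

  # Otherwise treat each combined chunk as a new input and go again.
  chunked_union(combined, chunk_size)
end

puts chunked_union(%w[a b c d e], 2)
# prints (((a UNION b) UNION (c UNION d)) UNION e)
```

Each level of nesting in the output corresponds to one pass of the recursion.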