Class: Sequel::Unionize::Unionizer

Inherits:
Object
Defined in:
lib/sequel/extensions/unionize.rb

Overview

Handles the chunking and union of multiple datasets.

This class splits a large collection of datasets into chunks of at most :chunk_size datasets, creates a temporary table/view holding the union of each chunk, and then repeats the process on those temporary tables/views until everything fits in a single chunk, whose union is returned as the final dataset.
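To make the chunking arithmetic concrete, here is a minimal sketch that counts how many intermediate chunks each round would create. The helper `chunk_rounds` is hypothetical and not part of the extension; it only mirrors the slicing rule described above and assumes a chunk size of at least 2.

```ruby
# Hypothetical helper (not part of the extension): for a given number of
# datasets, list how many chunks would be materialized in each round
# before the remaining items fit into one final union.
def chunk_rounds(dataset_count, chunk_size = 100)
  raise ArgumentError, 'chunk_size must be at least 2' if chunk_size < 2

  rounds = []
  remaining = dataset_count
  while remaining > chunk_size
    # Integer ceiling division: number of slices of at most chunk_size.
    chunks = (remaining + chunk_size - 1) / chunk_size
    rounds << chunks
    remaining = chunks
  end
  rounds
end

chunk_rounds(100)    # => []        a single chunk, unioned directly
chunk_rounds(250)    # => [3]       3 chunks, then one final union
chunk_rounds(25_000) # => [250, 3]  250 chunks, then 3, then one final union
```

Under this model, the number of rounds grows logarithmically with the number of datasets, which is why a modest chunk size is enough even for very large collections.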

Defined Under Namespace

Classes: Chunk

Constant Summary

DEFAULT_CHUNK_SIZE = 100

Default number of datasets to combine in each chunk

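Instance Attribute Summary

  • #db ⇒ Object (readonly)

    Returns the value of attribute db.

Instance Method Summary

  • #initialize(db, ds_set, opts = {}) ⇒ Unionizer (constructor)

    Creates a new Unionizer instance.

  • #unionize(dses = @ds_set) ⇒ Sequel::Dataset

    Performs the unionization of datasets.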

Constructor Details

#initialize(db, ds_set, opts = {}) ⇒ Unionizer

Creates a new Unionizer instance.

Parameters:

  • db (Sequel::Database)

    The database connection

  • ds_set (Array<Sequel::Dataset>)

    The datasets to combine

  • opts (Hash) (defaults to: {})

    Options for the union operation

Options Hash (opts):

  • :chunk_size (Integer) — default: 100

    Number of datasets per chunk

  • :temp_table_prefix (String) — default: 'temp_union'

    Prefix for temporary tables

  • :all (Boolean) — default: false

    Use UNION ALL instead of UNION

  • :from_self (Boolean) — default: true

    Wrap individual datasets in subqueries



# File 'lib/sequel/extensions/unionize.rb', line 109

def initialize(db, ds_set, opts = {})
  @db = db
  @ds_set = ds_set
  @opts = opts
  opts[:chunk_size] ||= DEFAULT_CHUNK_SIZE
  opts[:temp_table_prefix] ||= 'temp_union'
  opts[:all] ||= false
  opts[:from_self] = opts.fetch(:from_self, true)
end
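The constructor applies its boolean default for :from_self with `fetch` rather than `||=`. The sketch below illustrates why that distinction matters for an option whose default is true. The two helper methods are hypothetical stand-ins written for this comparison, not code from the extension.

```ruby
# Hypothetical stand-ins for comparing two ways of defaulting an option.

# `||=` assigns whenever the current value is nil OR false, so an
# explicit `from_self: false` from the caller is silently overwritten.
def default_with_or_assign(opts)
  opts = opts.dup
  opts[:from_self] ||= true
  opts
end

# `fetch` only falls back to the default when the key is absent,
# so an explicit false survives.
def default_with_fetch(opts)
  opts = opts.dup
  opts[:from_self] = opts.fetch(:from_self, true)
  opts
end

default_with_or_assign({ from_self: false })[:from_self] # => true
default_with_fetch({ from_self: false })[:from_self]     # => false
default_with_fetch({})[:from_self]                       # => true
```

Note that the helpers here work on a copy of the hash. The constructor as listed stores the caller's hash in @opts and fills the defaults into that same object, so the caller's hash gains the default keys as a side effect.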

Instance Attribute Details

#db ⇒ Object (readonly)

Returns the value of attribute db.



# File 'lib/sequel/extensions/unionize.rb', line 98

def db
  @db
end

Instance Method Details

#unionize(dses = @ds_set) ⇒ Sequel::Dataset

Performs the unionization of datasets.

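This method splits the datasets into chunks of :chunk_size. If everything fits in one chunk, it returns that chunk's union directly. Otherwise it creates a temporary table/view for each chunk and calls itself on the datasets that read from those tables/views, repeating until a single chunk remains.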

Parameters:

  • dses (Array<Sequel::Dataset>) (defaults to: @ds_set)

    The datasets to combine

Returns:

  • (Sequel::Dataset)

    The final combined dataset



# File 'lib/sequel/extensions/unionize.rb', line 126

def unionize(dses = @ds_set)
  chunks = dses.each_slice(@opts[:chunk_size]).map do |chunk_of_dses|
    Chunk.new(db, chunk_of_dses, @opts)
  end

  return chunks.first.union if chunks.size == 1

  unionize(chunks.each(&:create).map { |chunk| db[chunk.name] })
end
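The recursion above can be traced without a database. The following is a minimal simulation, assuming "datasets" are plain arrays of rows and a "temporary table" is an entry in a hash. `FakeChunk` and `fake_unionize` are hypothetical names invented for this sketch, and the real Chunk class may behave differently. The union here keeps duplicates, which corresponds to the :all option.

```ruby
# Hypothetical stand-in for Chunk: combines arrays of rows and stores the
# result in a hash that plays the role of the database.
class FakeChunk
  attr_reader :name

  def initialize(store, datasets, index)
    @store = store
    @datasets = datasets
    # store.size is constant within a round and grows between rounds,
    # so together with the index it yields a unique name.
    @name = :"temp_union_#{store.size}_#{index}"
  end

  # Concatenate the rows of every dataset in this chunk (UNION ALL semantics).
  def union
    @datasets.flatten(1)
  end

  # "Materialize" the combined rows under this chunk's name.
  def create
    @store[name] = union
  end
end

def fake_unionize(datasets, chunk_size, store = {})
  # Guard added for the sketch: with no datasets there are zero chunks,
  # so the single-chunk base case is never reached.
  return [] if datasets.empty?

  chunks = datasets.each_slice(chunk_size).each_with_index.map do |slice, i|
    FakeChunk.new(store, slice, i)
  end
  return chunks.first.union if chunks.size == 1

  chunks.each(&:create)
  fake_unionize(chunks.map { |chunk| store[chunk.name] }, chunk_size, store)
end

store = {}
rows = fake_unionize((1..5).map { |i| [i] }, 2, store)
# rows is [1, 2, 3, 4, 5]; store holds 5 intermediate "tables" (3, then 2).
```

Two properties of the listing are visible in this trace: an empty input produces zero chunks, so the method as listed would recurse without reaching its base case, and a chunk size of 1 never reduces the number of items. Callers may want to rule out both before invoking it.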