Module: Sequel::Unionize

Defined in:
lib/sequel/extensions/unionize.rb

Overview

Provides efficient handling of large UNION operations.

The unionize extension combines many datasets through UNION operations by chunking them into manageable temporary tables or views. This is particularly useful with databases that limit the number of UNION operations in a single query (e.g., Spark SQL, DuckDB).

Examples:

Load the extension

DB.extension :unionize

Basic usage

DB.unionize([dataset1, dataset2, dataset3, dataset4])

With options

DB.unionize(datasets, chunk_size: 50, all: true, temp_table_prefix: 'my_union')

Defined Under Namespace

Classes: Unionizer

Instance Method Summary

Instance Method Details

#unionize(ds_set, opts = {}) ⇒ Sequel::Dataset

Efficiently combines multiple datasets using UNION operations.

This method handles large numbers of datasets by chunking them into manageable groups, creating temporary tables/views for intermediate results, and recursively combining them until a single dataset is produced.
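The chunk-and-recurse idea described above can be sketched without a database. This is a minimal illustration under stated assumptions, not the extension's actual implementation: datasets are stood in for by SQL strings, "creating a temporary table/view" is simulated by recording a name and its query in an array, and the helper name `chunked_union`, the `created:` parameter, and the `prefix_N` naming scheme are all hypothetical.

```ruby
# Hypothetical stand-in: datasets are SQL strings, and "materializing" a chunk
# only records a named intermediate instead of creating a real temp table/view.
def chunked_union(queries, chunk_size: 100, all: false, prefix: 'temp_union', created: [])
  raise ArgumentError, 'chunk_size must be at least 2' if chunk_size < 2
  raise ArgumentError, 'no datasets given' if queries.empty?

  keyword = all ? ' UNION ALL ' : ' UNION '

  # Base case: few enough queries to combine in a single statement.
  return queries.join(keyword) if queries.size <= chunk_size

  # Otherwise, union each chunk into a named intermediate and recurse on
  # the shorter list of SELECTs over those intermediates.
  next_level = queries.each_slice(chunk_size).map do |chunk|
    name = "#{prefix}_#{created.size}" # assumed naming scheme
    created << [name, chunk.join(keyword)]
    "SELECT * FROM #{name}"
  end

  chunked_union(next_level, chunk_size: chunk_size, all: all, prefix: prefix, created: created)
end

queries = (1..7).map { |i| "SELECT * FROM t#{i}" }
created = []
final_sql = chunked_union(queries, chunk_size: 3, created: created)
puts final_sql
created.each { |name, sql| puts "#{name}: #{sql}" }
```

With seven inputs and a chunk size of three, the sketch records three intermediates and then combines them in one final statement. The guard on `chunk_size` is there because a chunk size of one would never shrink the list.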

Examples:

Basic union of datasets

db.unionize([ds1, ds2, ds3, ds4])

Union all with custom chunk size

db.unionize(datasets, all: true, chunk_size: 50)

Custom temporary table prefix

db.unionize(datasets, temp_table_prefix: 'my_union_batch')

Parameters:

  • ds_set (Array<Sequel::Dataset>)

    The datasets to combine via UNION

  • opts (Hash) (defaults to: {})

    Options for the union operation

Options Hash (opts):

  • :chunk_size (Integer) — default: 100

    Number of datasets to combine in each chunk

  • :temp_table_prefix (String) — default: 'temp_union'

    Prefix for temporary table names

  • :all (Boolean) — default: false

    Use UNION ALL instead of UNION (keeps duplicates)

  • :from_self (Boolean) — default: true

    Wrap individual datasets in subqueries
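To get a feel for how `:chunk_size` trades off the number of intermediate tables against the width of each UNION, the arithmetic can be worked through in a few lines. This sketch assumes the recursive chunking described for this method (chunk, materialize, repeat until the remaining count fits in one chunk); the function name `union_plan` and its return keys are hypothetical, and the real extension may count passes differently.

```ruby
# Counts passes and intermediates under the ASSUMED chunk-and-recurse scheme.
def union_plan(dataset_count, chunk_size = 100)
  raise ArgumentError, 'chunk_size must be at least 2' if chunk_size < 2

  passes = 0
  intermediates = 0
  remaining = dataset_count

  # Keep chunking until everything left fits in a single UNION statement.
  while remaining > chunk_size
    chunks = (remaining + chunk_size - 1) / chunk_size # ceiling division
    intermediates += chunks
    remaining = chunks
    passes += 1
  end

  { passes: passes, intermediates: intermediates, final_union_size: remaining }
end

plan_default = union_plan(1_000)     # documented default chunk_size of 100
plan_small   = union_plan(1_000, 10) # smaller chunks, more intermediates
p plan_default
p plan_small
```

Under these assumptions, 1,000 datasets at the default chunk size need one pass and 10 intermediates, while a chunk size of 10 needs two passes and 110 intermediates.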

Returns:

  • (Sequel::Dataset)

    The combined dataset



# File 'lib/sequel/extensions/unionize.rb', line 161

def unionize(ds_set, opts = {})
  Unionizer.new(self, ds_set, opts).unionize
end
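The method body above simply builds a `Unionizer` and delegates to it. The following sketch shows that delegation shape in isolation, along with the effect of the `:all` option. Everything here is a stand-in so the example runs without Sequel: `SketchUnionize`, `FakeDatabase`, and the stub `Unionizer` are hypothetical, datasets are plain arrays of rows, and the stub does no chunking at all. Only the option defaults are taken from the documentation above.

```ruby
module SketchUnionize
  # Hypothetical stand-in for the real Unionizer class; datasets are plain
  # arrays of rows here so the example runs without a database.
  class Unionizer
    DEFAULTS = {
      chunk_size: 100,
      temp_table_prefix: 'temp_union',
      all: false,
      from_self: true
    }.freeze

    attr_reader :db, :opts

    def initialize(db, ds_set, opts = {})
      @db = db
      @ds_set = ds_set
      @opts = DEFAULTS.merge(opts) # caller-supplied options override defaults
    end

    def unionize
      rows = @ds_set.flatten(1)
      # UNION ALL keeps duplicate rows; plain UNION drops them.
      @opts[:all] ? rows : rows.uniq
    end
  end

  # Same shape as the documented method: build a worker object and delegate.
  def unionize(ds_set, opts = {})
    Unionizer.new(self, ds_set, opts).unionize
  end
end

class FakeDatabase; end

db = FakeDatabase.new
db.extend(SketchUnionize) # stand-in for loading the extension on a database object

inputs = [[{ id: 1 }, { id: 2 }], [{ id: 2 }, { id: 3 }]]
distinct_rows = db.unionize(inputs)
all_rows      = db.unionize(inputs, all: true)
p distinct_rows
p all_rows
```

Keeping the state (database handle, dataset list, merged options) in a separate object leaves the database-level method as a one-line entry point, which matches the shape of the source shown above.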