Module: Sequel::Unionize

Defined in: lib/sequel/extensions/unionize.rb

Overview

Provides efficient handling of large UNION operations.

The unionize extension allows combining many datasets through UNION operations by chunking them into manageable temporary tables or views. This is particularly useful when dealing with databases that have limitations on the number of UNION operations in a single query (e.g., Spark SQL, DuckDB).

Examples:

Load the extension

DB.extension :unionize

Basic usage

DB.unionize([dataset1, dataset2, dataset3, dataset4])

With options

DB.unionize(datasets, chunk_size: 50, all: true, temp_table_prefix: 'my_union')

Defined Under Namespace

Classes: Unionizer

Instance Method Summary

#unionize(ds_set, opts = {}) ⇒ Sequel::Dataset

Efficiently combines multiple datasets using UNION operations.

Instance Method Details

#unionize(ds_set, opts = {}) ⇒ Sequel::Dataset

Efficiently combines multiple datasets using UNION operations.

This method handles large numbers of datasets by chunking them into manageable groups, creating temporary tables/views for intermediate results, and recursively combining them until a single dataset is produced.
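The chunk-and-recurse idea described above can be sketched in plain Ruby. This is a hypothetical illustration, not the extension's actual code: `chunked_union` is an invented name, plain arrays stand in for Sequel datasets, `Array#|` stands in for UNION, and `Array#+` stands in for UNION ALL. The extension itself would presumably create a temporary table or view at the point where each chunk is combined.

```ruby
# Hypothetical sketch: arrays stand in for datasets.
# `|` plays the role of UNION (drops duplicates), `+` the role of UNION ALL.
def chunked_union(sets, chunk_size: 100, all: false)
  raise ArgumentError, "no datasets given" if sets.empty?
  raise ArgumentError, "chunk_size must be at least 2" if chunk_size < 2

  # A single remaining set is the final result.
  return sets.first if sets.size == 1

  # Combine each chunk into one intermediate result. In a database-backed
  # version, this is where a temp table/view would hold the chunk's rows.
  combined = sets.each_slice(chunk_size).map do |chunk|
    chunk.reduce { |left, right| all ? left + right : left | right }
  end

  # Recurse on the (smaller) list of intermediate results.
  chunked_union(combined, chunk_size: chunk_size, all: all)
end

sets = [[1, 2], [2, 3], [3, 4], [4, 5]]
p chunked_union(sets, chunk_size: 2)            # prints [1, 2, 3, 4, 5]
p chunked_union(sets, chunk_size: 2, all: true) # prints [1, 2, 2, 3, 3, 4, 4, 5]
```

Requiring a chunk size of at least 2 guarantees that every pass produces fewer results than it received, so the recursion terminates.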

Examples:

Basic union of datasets

db.unionize([ds1, ds2, ds3, ds4])

Union all with custom chunk size

db.unionize(datasets, all: true, chunk_size: 50)

Custom temporary table prefix

db.unionize(datasets, temp_table_prefix: 'my_union_batch')

Parameters:

ds_set (Array<Sequel::Dataset>): The datasets to combine via UNION

opts (Hash, defaults to: {}): Options for the union operation

Options Hash (opts):

:chunk_size (Integer, default: 100): Number of datasets to combine in each chunk

:temp_table_prefix (String, default: 'temp_union'): Prefix for temporary table names

:all (Boolean, default: false): Use UNION ALL instead of UNION (keeps duplicates)

:from_self (Boolean, default: true): Wrap individual datasets in subqueries

Returns:

(Sequel::Dataset): The combined dataset
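To get a rough feel for how :chunk_size affects the amount of intermediate work, the following sketch counts how many combined results each pass would produce. It assumes that every chunk yields exactly one intermediate result per pass, which the documentation above implies but does not state; `chunk_levels` is a hypothetical helper, not part of the extension.

```ruby
# Hypothetical helper: number of intermediate results produced on each pass,
# assuming one result per chunk.
def chunk_levels(dataset_count, chunk_size = 100)
  raise ArgumentError, "chunk_size must be at least 2" if chunk_size < 2

  levels = []
  remaining = dataset_count
  while remaining > 1
    # Ceiling division: a partially filled chunk still counts as one chunk.
    remaining = (remaining + chunk_size - 1) / chunk_size
    levels << remaining
  end
  levels
end

p chunk_levels(1000)     # prints [10, 1]
p chunk_levels(250, 50)  # prints [5, 1]
p chunk_levels(4)        # prints [1]
```

Under that assumption, 1,000 datasets with the default chunk size of 100 would need ten intermediate results on the first pass and a single final combination on the second.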




# File 'lib/sequel/extensions/unionize.rb', line 161

def unionize(ds_set, opts = {})
  Unionizer.new(self, ds_set, opts).unionize
end
