Sequel::DuckDB
A Ruby database adapter that enables Sequel to work with DuckDB databases. This gem provides full integration between Sequel's powerful ORM and query building capabilities with DuckDB's high-performance analytical database engine.
Features
- Complete Sequel Integration: Full compatibility with Sequel's Database and Dataset APIs
- Connection Management: Support for both file-based and in-memory DuckDB databases
- Schema Introspection: Automatic discovery of tables, columns, indexes, and constraints
- SQL Generation: DuckDB-optimized SQL generation for all standard operations
- Data Type Mapping: Seamless conversion between Ruby and DuckDB data types
- Transaction Support: Full transaction handling with commit/rollback capabilities
- Error Handling: Comprehensive error mapping to appropriate Sequel exceptions
- Performance Optimized: Leverages DuckDB's columnar storage and parallel processing
Installation
Add this line to your application's Gemfile:
gem 'sequel-duckdb'
And then execute:
bundle install
Or install it yourself as:
gem install sequel-duckdb
Quick Start
Basic Connection
require 'sequel'
# Connect to an in-memory database
db = Sequel.connect('duckdb::memory:')
# Connect to a file database
db = Sequel.connect('duckdb:///path/to/database.duckdb')
# Alternative connection syntax
db = Sequel.connect(
adapter: 'duckdb',
database: '/path/to/database.duckdb'
)
Basic Usage
# Create a table
db.create_table :users do
primary_key :id
String :name, null: false
String :email
Integer :age
DateTime :created_at
end
# Insert data
users = db[:users]
users.insert(name: 'John Doe', email: '[email protected]', age: 30, created_at: Time.now)
users.insert(name: 'Jane Smith', email: '[email protected]', age: 25, created_at: Time.now)
# Query data
puts users.where(age: 30).first
# => {:id=>1, :name=>"John Doe", :email=>"[email protected]", :age=>30, :created_at=>...}
puts users.where { age > 25 }.all
# => [{:id=>1, :name=>"John Doe", ...}, ...]
# Update data
users.where(name: 'John Doe').update(age: 31)
# Delete data
users.where(age: 25).delete
Development
After checking out the repo, run bin/setup to install dependencies:
git clone https://github.com/aguynamedryan/sequel-duckdb.git
cd sequel-duckdb
bin/setup
Running Tests
The test suite uses Minitest and includes both unit tests (using Sequel's mock database) and integration tests (using real DuckDB databases):
# Run all tests
bundle exec rake test
# Run specific test file
ruby test/database_test.rb
# Run tests with verbose output
ruby test/all.rb -v
Development Console
You can run bin/console for an interactive prompt that will allow you to experiment:
bin/console
This will start an IRB session with the gem loaded and a test database available.
Code Quality
The project uses RuboCop for code style enforcement:
# Check code style
bundle exec rubocop
# Auto-fix style issues
bundle exec rubocop -a
Building and Installing
To install this gem onto your local machine:
bundle exec rake install
To build the gem:
bundle exec rake build
Release Process
To release a new version:
- Update the version number in
lib/sequel/duckdb/version.rb - Update
CHANGELOG.mdwith the new version details - Run the tests to ensure everything works
- Commit the changes
- Run
bundle exec rake release
This will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/aguynamedryan/sequel-duckdb.
Development Guidelines
- Follow TDD: Write tests before implementing features
- Code Style: Follow the existing RuboCop configuration
- Documentation: Update README and code documentation for new features
- Compatibility: Ensure compatibility with supported Ruby and Sequel versions
- Performance: Consider performance implications, especially for analytical workloads
Reporting Issues
When reporting issues, please include:
- Ruby version
- Sequel version
- DuckDB version
- Operating system
- Minimal code example that reproduces the issue
- Full error message and stack trace
License
The gem is available as open source under the terms of the MIT License.
Acknowledgments
- Jeremy Evans for creating and maintaining Sequel
- The DuckDB team for the excellent database engine and Ruby client
- Contributors to sequel-hexspace and other Sequel adapters for implementation patterns ## Connection Options
Connection Strings
# In-memory database (data lost when connection closes)
db = Sequel.connect('duckdb::memory:')
# File database (persistent storage)
db = Sequel.connect('duckdb:///absolute/path/to/database.duckdb')
db = Sequel.connect('duckdb://relative/path/to/database.duckdb')
# With connection options
db = Sequel.connect('duckdb:///path/to/database.duckdb?readonly=true')
Connection Hash
db = Sequel.connect(
adapter: 'duckdb',
database: '/path/to/database.duckdb',
# DuckDB-specific options
readonly: false,
config: {
threads: 4,
memory_limit: '1GB'
}
)
Schema Operations
Table Management
# Create table with various column types
db.create_table :products do
primary_key :id
String :name, size: 255, null: false
Text :description
Decimal :price, size: [10, 2]
Integer :stock_quantity
Boolean :active, default: true
Date :release_date
DateTime :created_at
Time :daily_update_time
column :metadata, 'JSON' # DuckDB-specific type
end
# Add columns
db.alter_table :products do
add_column :category_id, Integer
add_index :category_id
end
# Drop table
db.drop_table :products
Schema Introspection
# List all tables
db.tables
# => [:users, :products, :orders]
# Get table schema
db.schema(:users)
# => [[:id, {:type=>:integer, :db_type=>"INTEGER", :primary_key=>true, ...}],
# [:name, {:type=>:string, :db_type=>"VARCHAR", :allow_null=>false, ...}], ...]
# Check if table exists
db.table_exists?(:users)
# => true
# Get indexes
db.indexes(:users)
# => {:users_name_index => {:columns=>[:name], :unique=>false, :primary=>false}}
Data Types
Supported Type Mappings
| Ruby Type | DuckDB Type | Notes |
|---|---|---|
| String | VARCHAR/TEXT | Configurable size |
| Integer | INTEGER/BIGINT | Auto-sized based on value |
| Float | REAL/DOUBLE | Precision preserved |
| BigDecimal | DECIMAL/NUMERIC | Precision and scale supported |
| TrueClass/FalseClass | BOOLEAN | Native boolean support |
| Date | DATE | Date-only values |
| Time/DateTime | TIMESTAMP | Full datetime with timezone |
| Time (time-only) | TIME | Time-only values |
| String (binary) | BLOB | Binary data storage |
Type Conversion Examples
# Automatic type conversion
users.insert(
name: 'Alice', # String -> VARCHAR
age: 28, # Integer -> INTEGER
salary: BigDecimal('75000.50'), # BigDecimal -> DECIMAL
active: true, # Boolean -> BOOLEAN
birth_date: Date.new(1995, 5, 15), # Date -> DATE
created_at: Time.now, # Time -> TIMESTAMP
profile_data: '{"key": "value"}' # String -> JSON (if column defined as JSON)
)
# Retrieved data is automatically converted back to Ruby types
user = users.first
user[:birth_date].class # => Date
user[:created_at].class # => Time
user[:active].class # => TrueClass
Query Building
Basic Queries
users = db[:users]
# SELECT with conditions
users.where(active: true)
users.where { age > 25 }
users.where(Sequel.like(:name, 'John%'))
# Ordering and limiting
users.order(:name).limit(10)
users.order(Sequel.desc(:created_at)).first
# Aggregation
users.count
users.avg(:age)
users.group(:department).select(:department, Sequel.count(:id).as(:user_count))
Advanced Queries
# JOINs
db[:users]
.join(:orders, user_id: :id)
.select(:users__name, :orders__total)
.where { orders__total > 100 }
# Subqueries
high_value_users = db[:orders]
.group(:user_id)
.having { sum(:total) > 1000 }
.select(:user_id)
db[:users].where(id: high_value_users)
# Window functions (DuckDB-specific optimization)
db[:sales]
.select(
:product_id,
:amount,
Sequel.function(:row_number).over(partition: :product_id, order: :amount).as(:rank)
)
# Common Table Expressions (CTEs)
db.with(:high_spenders,
db[:orders].group(:user_id).having { sum(:total) > 1000 }.select(:user_id)
).from(:high_spenders)
.join(:users, id: :user_id)
.select(:users__name)
Transactions
Basic Transactions
db.transaction do
users.insert(name: 'Alice', email: '[email protected]')
orders.insert(user_id: users.max(:id), total: 100.00)
# Automatically commits if no exceptions
end
# Manual rollback
db.transaction do
users.insert(name: 'Bob', email: '[email protected]')
raise Sequel::Rollback if some_condition
# Transaction will be rolled back
end
Error Handling
begin
db.transaction do
# Some database operations
users.insert(name: nil) # This will fail due to NOT NULL constraint
end
rescue Sequel::NotNullConstraintViolation => e
puts "Cannot insert user with null name: #{e.message}"
rescue Sequel::DatabaseError => e
puts "Database error: #{e.message}"
end
DuckDB-Specific Features
Analytical Queries
# DuckDB excels at analytical workloads
sales_summary = db[:sales]
.select(
:product_category,
Sequel.function(:sum, :amount).as(:total_sales),
Sequel.function(:avg, :amount).as(:avg_sale),
Sequel.function(:count, :id).as(:transaction_count)
)
.group(:product_category)
.order(Sequel.desc(:total_sales))
# Window functions for analytics
monthly_trends = db[:sales]
.select(
:month,
:amount,
Sequel.function(:lag, :amount, 1).over(order: :month).as(:prev_month),
Sequel.function(:sum, :amount).over(order: :month).as(:running_total)
)
Performance Optimizations
# Bulk inserts (more efficient than individual inserts)
users_data = [
{name: 'User 1', email: '[email protected]'},
{name: 'User 2', email: '[email protected]'},
# ... many more records
]
db[:users].multi_insert(users_data)
# Use DuckDB's columnar storage advantages
# Query only needed columns for better performance
db[:large_table].select(:id, :name, :created_at).where(active: true)
Model Integration
Sequel::Model Usage
class User < Sequel::Model
# Sequel automatically introspects the users table schema
# Associations work normally
one_to_many :orders
# Validations
def validate
super
errors.add(:email, 'must be present') if !email || email.empty?
errors.add(:email, 'must be valid') unless email =~ /@/
end
# Custom methods
def full_name
"#{first_name} #{last_name}"
end
end
class Order < Sequel::Model
many_to_one :user
def total_with_tax(tax_rate = 0.08)
total * (1 + tax_rate)
end
end
# Usage
user = User.create(name: 'John Doe', email: '[email protected]')
order = user.add_order(total: 99.99, status: 'pending')
# Associations work seamlessly
user.orders.where(status: 'completed').sum(:total)
Error Handling
The adapter maps DuckDB errors to appropriate Sequel exception types:
begin
# Various operations that might fail
db[:users].insert(name: nil) # NOT NULL violation
rescue Sequel::NotNullConstraintViolation => e
# Handle null constraint violation
rescue Sequel::UniqueConstraintViolation => e
# Handle unique constraint violation
rescue Sequel::ForeignKeyConstraintViolation => e
# Handle foreign key violation
rescue Sequel::CheckConstraintViolation => e
# Handle check constraint violation
rescue Sequel::DatabaseConnectionError => e
# Handle connection issues
rescue Sequel::DatabaseError => e
# Handle other database errors
end
Troubleshooting
Common Issues
Connection Problems
# Issue: Cannot connect to database file
# Solution: Check file path and permissions
begin
db = Sequel.connect('duckdb:///path/to/database.duckdb')
rescue Sequel::DatabaseConnectionError => e
puts "Connection failed: #{e.message}"
# Check if directory exists and is writable
# Ensure DuckDB gem is properly installed
end
Memory Issues
# Issue: Out of memory with large datasets
# Solution: Use streaming or limit result sets
db[:large_table].limit(1000).each do |row|
# Process row by row instead of loading all at once
end
# Or use DuckDB's memory configuration
db = Sequel.connect(
adapter: 'duckdb',
database: ':memory:',
config: { memory_limit: '2GB' }
)
Performance Issues
# Issue: Slow queries
# Solution: Add appropriate indexes
db.add_index :users, :email
db.add_index :orders, [:user_id, :created_at]
# Use EXPLAIN to analyze query plans
puts db[:users].where(email: '[email protected]').explain
Debugging
# Enable SQL logging to see generated queries
require 'logger'
db.loggers << Logger.new($stdout)
# This will now log all SQL queries
db[:users].where(active: true).all
# Logs: SELECT * FROM users WHERE (active = true)
Version Compatibility
- Ruby: 3.1.0 or higher
- Sequel: 5.0 or higher
- DuckDB: 0.8.0 or higher
- ruby-duckdb: 1.0.0 or higher
Getting Help
- Documentation: Sequel Documentation
- DuckDB Docs: DuckDB Documentation
- Issues: Report bugs on GitHub Issues
- Discussions: Join discussions on GitHub Discussions
Performance Tips
Query Optimization
- Select only needed columns: DuckDB's columnar storage makes this very efficient ```ruby # Good db[:users].select(:id, :name).where(active: true)
# Less efficient db[:users].where(active: true) # Selects all columns
2. **Use appropriate indexes**: Especially for frequently queried columns
```ruby
db.add_index :users, :email
db.add_index :orders, [:user_id, :status]
- Leverage DuckDB's analytical capabilities: Use window functions and aggregations
ruby # Efficient analytical query db[:sales] .select( :product_id, Sequel.function(:sum, :amount).as(:total), Sequel.function(:rank).over(order: Sequel.desc(:amount)).as(:rank) ) .group(:product_id)
Memory Management
Use streaming for large result sets:
db[:large_table].paged_each(rows_per_fetch: 1000) do |row| # Process row by row endConfigure DuckDB memory limits:
db = Sequel.connect( adapter: 'duckdb', database: '/path/to/db.duckdb', config: { memory_limit: '4GB', threads: 8 } )
Bulk Operations
Use multi_insert for bulk data loading:
# Efficient bulk insert data = 1000.times.map { |i| {name: "User #{i}", email: "user#{i}@example.com"} } db[:users].multi_insert(data)Use transactions for multiple operations:
db.transaction do # Multiple related operations user_id = db[:users].insert(name: 'John', email: '[email protected]') db[:profiles].insert(user_id: user_id, bio: 'Software developer') db[:preferences].insert(user_id: user_id, theme: 'dark') end
Documentation
Complete API Reference
For comprehensive API documentation including all methods, configuration options, and advanced features, see:
- API_DOCUMENTATION.md - Complete API reference with examples
- docs/DUCKDB_SQL_PATTERNS.md - Detailed SQL generation patterns and syntax
SQL Generation Patterns
The sequel-duckdb adapter generates SQL optimized for DuckDB while maintaining Sequel compatibility. Key patterns include:
- LIKE clauses: Clean syntax without ESCAPE clauses
- ILIKE support: Converted to UPPER() LIKE UPPER() for case-insensitive matching
- Regular expressions: Using DuckDB's regexp_matches() function
- Qualified columns: Standard dot notation (table.column)
- Recursive CTEs: Automatic WITH RECURSIVE detection
- Proper parentheses: Consistent expression grouping
Example SQL patterns:
# LIKE patterns
dataset.where(Sequel.like(:name, "%John%"))
# SQL: SELECT * FROM users WHERE (name LIKE '%John%')
# ILIKE patterns (case-insensitive)
dataset.where(Sequel.ilike(:name, "%john%"))
# SQL: SELECT * FROM users WHERE (UPPER(name) LIKE UPPER('%john%'))
# Regular expressions
dataset.where(name: /^John/)
# SQL: SELECT * FROM users WHERE (regexp_matches(name, '^John'))
For complete SQL pattern documentation, see docs/DUCKDB_SQL_PATTERNS.md.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/aguynamedryan/sequel-duckdb.
License
The gem is available as open source under the terms of the MIT License.