Grainery

Database seed storage system for Rails applications. Extract database records and generate seed files organized by database with automatic dependency resolution. Like a grainery stores grain, this gem stores and organizes your database seeds.

Note: This gem was developed with assistance from Claude, Anthropic's AI assistant. Claude helped with code generation, documentation, and testing strategies throughout the development process.

⚠️ Development Status: This gem is in active development and does not yet have a comprehensive test suite. While the core functionality has been tested manually, automated tests are planned for future releases. Use with caution in production environments.

Features

✅ Automatic database detection
✅ Dependency-aware loading (topological sort)
✅ Multi-database support
✅ Database schema dumping for all related databases
✅ Configurable per project
✅ Preserves custom seeds
✅ One seed file per table
✅ Clean separation of concerns
✅ Supports SQL Server, MySQL, PostgreSQL
✅ Test database management tasks
✅ Rails 6.1 - 8.x support

Installation

Add this line to your application's Gemfile:

gem 'grainery', path: 'grainery'

And then execute:

bundle install

Usage

1. Initialize Configuration

rake grainery:init_config

This task:

Detects all databases and their model base classes
Detects anonymizable fields in your database schema (email, phone, SSN, Greek documents, etc.)
Creates config/grainery.yml with the detected configuration

2. Harvest Data

# Harvest with limit (100 records per table) + schema dump + anonymization
rake grainery:generate

# Harvest ALL records + schema dump + anonymization (use with caution)
rake grainery:generate_all

# Harvest data only (no schema dump) + anonymization
rake grainery:generate_data_only

# Harvest without anonymization (raw production data - use with extreme caution!)
rake grainery:generate_raw

Note: By default, sensitive fields are anonymized using Faker. Configure anonymization in config/grainery.yml.

3. Load Seeds

# Load seeds only (blocked in production)
rake grainery:load

# Load schemas + seeds (blocked in production)
rake grainery:load_with_schema

# Override production protection (use with extreme caution!)
GRAINERY_ALLOW_PRODUCTION=true rake grainery:load

Note: Loading tasks are blocked in production by default to prevent accidental data loss.

This loads:

Database schemas (if using load_with_schema)
Harvested seeds (in dependency order)
Custom seeds from db/seeds.rb (last)

Directory Structure

db/

Configuration

config/grainery.yml:

# Path for harvested seed files
grainery_path: db/grainery

# Database connection mappings
database_connections:
  primary:
    connection: test
    adapter: sqlserver
    model_base_class: ApplicationRecord
  other:
    connection: other
    adapter: sqlserver
    model_base_class: OtherDB
  # ... other databases

# Lookup tables (harvest all records)
lookup_tables: []

# Field anonymization (column_name => faker_method)
# Set to empty hash {} to disable anonymization
anonymize_fields:
  email: email
  first_name: first_name
  last_name: last_name
  name: name
  phone: phone_number
  phone_number: phone_number
  address: address
  street_address: street_address
  city: city
  state: state
  zip: zip_code
  zip_code: zip_code
  postal_code: zip_code
  ssn: ssn
  credit_card: credit_card_number
  password: password
  token: token
  api_key: api_key
  secret: secret
  iban: iban
  vat_number: greek_vat
  afm: greek_vat
  amka: greek_amka
  social_security_number: greek_amka
  ssn_greek: greek_amka
  personal_number: greek_personal_number
  personal_id: greek_personal_number
  afm_extended: greek_personal_number
  ada: greek_ada
  diavgeia_id: greek_ada
  decision_number: greek_ada
  adam: greek_adam
  adam_number: greek_adam
  procurement_id: greek_adam
  date_of_birth: date_of_birth
  birth_date: date_of_birth
  dob: date_of_birth
  birthdate: date_of_birth
  identity_number: identity_number
  id_number: identity_number
  national_id: identity_number
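
To check what such a file contains, it can be parsed with Ruby's standard YAML library. The snippet below is a minimal sketch that parses an inline excerpt of the configuration above. How Grainery itself loads the file is not described here, and the `CONFIG_YAML` constant is illustrative only.

```ruby
require "yaml"

# Inline excerpt of config/grainery.yml (a real app would read the file instead).
CONFIG_YAML = <<~YAML
  grainery_path: db/grainery
  database_connections:
    primary:
      connection: test
      adapter: sqlserver
      model_base_class: ApplicationRecord
  lookup_tables: []
  anonymize_fields:
    email: email
    phone: phone_number
YAML

# safe_load returns plain hashes and arrays with string keys.
config = YAML.safe_load(CONFIG_YAML)

puts config["grainery_path"]
puts config.dig("database_connections", "primary", "adapter")
puts config["anonymize_fields"].keys.inspect
```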

Available Rake Tasks

Grainery Tasks

# Initialize configuration
rake grainery:init_config

# Harvest data (with limit) + schema dump + anonymization
rake grainery:generate

# Harvest ALL records + schema dump + anonymization
rake grainery:generate_all

# Harvest data only (no schema dump) + anonymization
rake grainery:generate_data_only

# Harvest without anonymization (raw production data)
rake grainery:generate_raw

# Load harvested + custom seeds
rake grainery:load

# Load schemas + seeds + custom seeds
rake grainery:load_with_schema

# Clean grainery directory
rake grainery:clean

Test Database Tasks

# Setup clean test database (schema only)
rake test:db:setup_for_grainery
# or: rake db:test:setup_for_grainery

# Seed test database with grainery data
rake test:db:seed_with_grainery

# Reset and seed (one command)
rake test:db:reset_with_grainery
# or: rake db:test:reset_with_grainery

# Clean test database (truncate all tables)
rake test:db:clean
# or: rake db:test:clean

# Show test database statistics
rake test:db:stats
# or: rake db:test:stats

Dependency Resolution

Grainery automatically:

Analyzes belongs_to associations
Builds dependency graph
Performs topological sort
Generates load_order.txt
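
The steps above can be illustrated with a small, dependency-free topological sort. This is a sketch of the general technique, not Grainery's actual implementation; the `DEPENDENCIES` map and the `load_order` method are hypothetical, and in a real app the map would be derived from belongs_to associations.

```ruby
# Hypothetical dependency map: each table lists the tables it belongs_to.
DEPENDENCIES = {
  "comments"   => ["posts", "users"],
  "posts"      => ["users", "categories"],
  "users"      => [],
  "categories" => []
}.freeze

# Returns the tables ordered so that every table comes after its dependencies.
# Raises when no progress can be made, which happens for cycles
# (including self-references) and for dependencies missing from the map.
def load_order(dependencies)
  remaining = dependencies.transform_values(&:dup)
  ordered = []

  until remaining.empty?
    # Tables whose dependencies have all been emitted already.
    ready = remaining.select { |_table, deps| (deps - ordered).empty? }.keys
    if ready.empty?
      raise "Circular dependency detected: #{remaining.keys.join(', ')}"
    end

    ready.each do |table|
      ordered << table
      remaining.delete(table)
    end
  end

  ordered
end

puts load_order(DEPENDENCIES)
```

Parents (users, categories) are emitted before the tables that reference them, which is the property a seed loader needs to satisfy foreign keys.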

Example Load Order

# PRIMARY Database
primary/users.rb
primary/categories.rb
primary/posts.rb
primary/comments.rb

# OTHER Database
other/departments.rb
other/projects.rb
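
A file in this shape is straightforward to consume. The sketch below assumes that lines starting with `#` are comments and that paths are relative to `grainery_path`; both are inferences from the example above, and `seed_files` is a hypothetical helper rather than part of the gem.

```ruby
# Inline copy of the example load_order.txt shown above.
LOAD_ORDER = <<~TXT
  # PRIMARY Database
  primary/users.rb
  primary/categories.rb
  primary/posts.rb
  primary/comments.rb

  # OTHER Database
  other/departments.rb
  other/projects.rb
TXT

# Returns seed file paths in load order, skipping comments and blank lines.
def seed_files(load_order_text, root: "db/grainery")
  load_order_text.each_line
                 .map(&:strip)
                 .reject { |line| line.empty? || line.start_with?("#") }
                 .map { |relative| File.join(root, relative) }
end

# A loader would call `load path` here; this sketch only prints the paths.
seed_files(LOAD_ORDER).each { |path| puts path }
```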

Lookup Tables

For small reference tables (statuses, types, categories), Grainery can harvest all records instead of a limited sample.

Add to config/grainery.yml:

lookup_tables:
  - invoice_statuses
  - user_roles
  - categories
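
Conceptually, the setting switches the per-table record limit off for the listed tables. A minimal sketch of that decision, assuming the 100-record default mentioned for grainery:generate; `record_limit` is a hypothetical helper, not the gem's API.

```ruby
DEFAULT_LIMIT = 100 # per-table sample size described for grainery:generate

# Returns nil (meaning "no limit") for lookup tables, the default limit otherwise.
def record_limit(table, lookup_tables, default_limit: DEFAULT_LIMIT)
  lookup_tables.include?(table) ? nil : default_limit
end

lookup_tables = %w[invoice_statuses user_roles categories]

%w[users categories].each do |table|
  limit = record_limit(table, lookup_tables)
  puts "#{table}: #{limit ? "first #{limit} records" : 'all records'}"
end
```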

File Formats

Schema File Format

Each database gets a schema dump:

# Schema dump for primary database
# Generated: 2025-10-01 10:30:00
# Adapter: postgresql

ActiveRecord::Schema.define do

  create_table "users", force: :cascade do |t|
    t.string "email", null: false
    t.string "name"
    t.boolean "active", default: true
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  end

  add_index "users", ["email"], unique: true

end

Seed File Format

Each table gets its own seed file:

# Harvested from primary database: users
# Records: 100
# Generated: 2025-10-01 10:30:00

User.create!([
  {
    email: "[email protected]",
    name: "John Doe",
    active: true
  },
  {
    email: "[email protected]",
    name: "Jane Smith",
    active: true
  }
])

Custom Seeds

Your custom seed logic in db/seeds.rb is preserved and loaded last.

Example db/seeds.rb:

# Custom seed logic
puts "Creating admin user..."
User.find_or_create_by!(email: '[email protected]') do |user|
  user.name = 'Admin'
  user.role = 'admin'
end

puts "Setting up application defaults..."
Setting.create!(key: 'app_name', value: 'My App')

Use Cases

Development

# Harvest production-like data for development with schemas
rake grainery:generate
rake grainery:load_with_schema

Testing

# Create test fixtures with schemas
rake grainery:generate
# In test setup, load schemas and seeds
rake grainery:load_with_schema

Staging

# Harvest production data (anonymized) with schemas
rake grainery:generate_all
# Deploy to staging
# Load on staging server with full schema
rake grainery:load_with_schema

Cross-Database Migration

# Export from one database system
rake grainery:generate_all  # Captures schema + data

# Import to another database system
rake grainery:load_with_schema  # Recreates schema + loads data

Safety Features

Production Environment Protection: Destructive tasks (load, load_with_schema, test:db:*) are blocked in production. Overriding requires an explicit GRAINERY_ALLOW_PRODUCTION=true environment variable and triggers a 5-second countdown.
Separate Directories: Harvested seeds never touch db/seeds.rb
Dependency Order: Foreign keys respected automatically
Custom Preservation: Your db/seeds.rb always loads last
Clean Command: rake grainery:clean removes only harvested files
Optional Schema Loading: Schemas only load when explicitly requested
Per-Database Schemas: Each database gets isolated schema file
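
The production guard described in the first item can be pictured as follows. This is a hedged sketch of the behavior, not the gem's source: `ensure_load_allowed!` and its parameters are hypothetical, and only the GRAINERY_ALLOW_PRODUCTION variable and the 5-second countdown come from this README.

```ruby
# Raises unless loading is permitted in the given environment.
# `env`, `countdown`, `sleeper` and `out` are injectable so the guard can be
# exercised without touching the real environment or actually sleeping.
def ensure_load_allowed!(rails_env:, env: ENV, countdown: 5,
                         sleeper: ->(seconds) { sleep(seconds) }, out: $stdout)
  return :allowed unless rails_env == "production"

  unless env["GRAINERY_ALLOW_PRODUCTION"] == "true"
    raise "Load tasks are blocked in production. " \
          "Set GRAINERY_ALLOW_PRODUCTION=true to override."
  end

  # Give the operator a last chance to abort with Ctrl-C.
  countdown.downto(1) do |remaining|
    out.puts "Loading into PRODUCTION in #{remaining}..."
    sleeper.call(1)
  end
  :overridden
end

puts ensure_load_allowed!(rails_env: "development")
```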

Production Safety Matrix

Safe Operations (no database writes):

✅ rake grainery:generate - Harvests data, no modifications
✅ rake grainery:generate_all - Harvests all data, no modifications
✅ rake grainery:generate_data_only - Harvests data only, no modifications
✅ rake grainery:init_config - Creates config file only
✅ rake grainery:clean - Deletes harvested files only (not database data)

Destructive Operations (Blocked by Default):

❌ rake grainery:load - Inserts data into database
❌ rake grainery:load_with_schema - Modifies schema AND inserts data
❌ rake test:db:* - All test database operations

Recommendation:

Harvesting in production is safe and useful for creating staging/development fixtures
Because the gem lacks automated test coverage, test any production load thoroughly in staging first
Always review generated files before loading into any environment

Data Anonymization

✅ Built-in Anonymization: Grainery automatically anonymizes sensitive fields using the Faker gem during harvest.

Automatic Detection

When you run rake grainery:init_config, Grainery automatically:

Scans all database tables and columns
Detects fields that should be anonymized based on naming patterns
Adds them to config/grainery.yml with appropriate anonymization methods

Detected patterns include: email, phone, address, ssn, password, token, Greek documents (afm, amka, ada, adam), dates of birth, and more.

How It Works

When harvesting, Grainery automatically replaces sensitive field values with fake data:

# Original production data:
{ email: "[email protected]", name: "John Doe", phone: "555-1234" }

# Anonymized in seed files:
{ email: "[email protected]", name: "Sarah Johnson", phone: "555-987-6543" }
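
The replacement step can be sketched in a few lines. Grainery uses the Faker gem for fake values; to keep this example free of dependencies, the `GENERATORS` table below uses simple stand-ins, and `anonymize_record` is a hypothetical helper. Leaving nil values untouched is an assumption of this sketch.

```ruby
require "securerandom"

# Stand-in generators keyed by method name (Grainery relies on Faker instead).
GENERATORS = {
  "email"        => -> { "user-#{SecureRandom.hex(4)}@example.com" },
  "name"         => -> { "Person #{SecureRandom.hex(3)}" },
  "phone_number" => -> { format("555-%03d-%04d", rand(1000), rand(10_000)) }
}.freeze

# Returns a copy of the record with configured columns replaced by fake data.
# Columns without a known generator, and nil values, are kept unchanged.
def anonymize_record(record, anonymize_fields)
  record.to_h do |column, value|
    generator = GENERATORS[anonymize_fields[column.to_s]]
    [column, (generator && !value.nil?) ? generator.call : value]
  end
end

mapping = { "email" => "email", "name" => "name", "phone" => "phone_number" }
original = { id: 1, email: "real.person@corp.test", name: "Real Person", phone: "555-1234" }

puts anonymize_record(original, mapping).inspect
```

The original hash is not mutated, which mirrors the point that production data is never modified during harvest.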

Configuration

The config/grainery.yml file is automatically populated with detected fields during initialization. You can customize it as needed:

anonymize_fields:
  # Global field configuration (applies to all tables)
  email: email                    # Uses Faker::Internet.email
  first_name: first_name          # Uses Faker::Name.first_name
  last_name: last_name            # Uses Faker::Name.last_name
  name: name                      # Uses Faker::Name.name
  phone: phone_number             # Uses Faker::PhoneNumber.phone_number
  ssn: ssn                        # Uses Faker::IDNumber.valid

  # Table-specific configuration (when same field appears in multiple tables)
  users.address: address          # Only anonymize address in users table
  companies.address: skip         # Don't anonymize address in companies table

  # Database.table-specific configuration (most specific)
  primary.users.email: email      # Only for users table in primary database
  other.contacts.email: email     # Only for contacts table in other database

Scoping Priority:

database.table.field (highest priority - most specific)
table.field (medium priority - table-specific)
field (lowest priority - global)

When a field name appears in multiple tables, Grainery automatically uses scoped names during detection.
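
The priority rules amount to trying the most specific key first. A minimal sketch under that reading; `anonymizer_for` and the `ANONYMIZE_FIELDS` sample are illustrative, not taken from the gem.

```ruby
# Sample configuration with one entry per scope level (illustrative values).
ANONYMIZE_FIELDS = {
  "address"               => "address",        # global
  "companies.address"     => "skip",           # table-specific
  "primary.users.address" => "street_address"  # database.table-specific
}.freeze

# Returns the anonymization method name for a column, or nil when the column
# is unconfigured or explicitly set to "skip".
def anonymizer_for(config, database:, table:, field:)
  candidates = ["#{database}.#{table}.#{field}", "#{table}.#{field}", field]
  key = candidates.find { |candidate| config.key?(candidate) }
  return nil if key.nil?

  method_name = config[key]
  method_name == "skip" ? nil : method_name
end

[%w[primary users], %w[primary companies], %w[other contacts]].each do |database, table|
  method_name = anonymizer_for(ANONYMIZE_FIELDS, database: database, table: table, field: "address")
  puts "#{database}.#{table}.address -> #{method_name.inspect}"
end
```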

Disabling Anonymization

Option 1: Disable completely

# Set to empty hash
anonymize_fields: {}

Option 2: Use raw generation task

rake grainery:generate_raw  # Harvests without anonymization

Option 3: Skip specific fields

To keep real values for specific fields while anonymizing others, set them to skip:

anonymize_fields:
  email: email           # Will be anonymized
  name: name             # Will be anonymized
  company_name: skip     # Will keep real value (not anonymized)
  department: skip       # Will keep real value (not anonymized)
  phone: phone_number    # Will be anonymized

This is useful when you need to preserve certain non-sensitive reference data while still protecting personal information.

Supported Faker Methods

Personal Information:

email - Fake email addresses
first_name, last_name, name - Fake names
phone_number - Fake phone numbers
address, street_address - Fake addresses
city, state, zip_code, postal_code - Fake location data
date_of_birth - Fake date of birth preserving approximate age (±2 years, minimum age 18 to preserve adulthood)
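
One way to read the date_of_birth rule is "shift by up to two years, but never turn an adult into a minor". The sketch below implements that interpretation. It is an assumption about the behavior, not the gem's code, and `anonymize_date_of_birth` is a hypothetical name.

```ruby
require "date"

ADULT_AGE = 18
MAX_SHIFT_DAYS = 2 * 365

# Shifts a birth date by a random offset of at most about two years.
# If the original person is already an adult, the result is clamped so that
# the fake date still corresponds to an age of at least 18.
def anonymize_date_of_birth(original, today: Date.today, rng: Random.new)
  shifted = original + rng.rand(-MAX_SHIFT_DAYS..MAX_SHIFT_DAYS)
  adult_cutoff = today << (ADULT_AGE * 12) # latest birth date that is still 18+

  (original <= adult_cutoff && shifted > adult_cutoff) ? adult_cutoff : shifted
end

puts anonymize_date_of_birth(Date.new(1985, 3, 14))
```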

Financial & Identity:

ssn - Fake social security numbers
credit_card_number - Fake credit card numbers
iban - Fake Greek IBAN (27 characters: GR + check digits + bank code + account number, auto-truncates to column size)
greek_vat - Fake Greek VAT number (AFM - 9 digits, adjusts to column size)
greek_amka - Fake Greek AMKA/Social Security Number (11 digits: DDMMYY + 5 digits, adjusts to column size)
greek_personal_number - Fake Greek Personal Number (12 characters: 2 digits + letter + 9-digit AFM, e.g., "12A123456789", adjusts to column size)
greek_ada - Fake Greek ADA/Diavgeia Decision Number (14 characters: 4 Greek letters + 2 digits + 4 Greek letters + dash + 1 digit + 2 Greek letters, e.g., "ΨΜΦΡ69ΟΤΝΡ-9ΤΟ", adjusts to column size)
greek_adam - Fake Greek ADAM/Public Procurement Publicity identifier (14-15 characters: 2 digits + PROC or REQ + 9 digits, e.g., "24REQ187755230" or "23PROC456789012", adjusts to column size)
identity_number - Fake identity number (alphanumeric format, adjusts to column size)
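
To make two of the formats above concrete, here is a sketch that produces values in the documented ADAM and Personal Number shapes and truncates them to an optional column limit. The method names are hypothetical, the use of Latin A-Z for the letter is an assumption based on the "12A123456789" example, and the 9-digit part is random rather than a checksum-valid AFM.

```ruby
# 2 digits + PROC or REQ + 9 digits, e.g. "24REQ187755230".
def fake_greek_adam(rng: Random.new, limit: nil)
  year   = format("%02d", rng.rand(0..99))
  kind   = %w[PROC REQ].sample(random: rng)
  serial = format("%09d", rng.rand(0...1_000_000_000))
  value  = "#{year}#{kind}#{serial}"
  limit ? value[0, limit] : value
end

# 2 digits + letter + 9 digits, e.g. "12A123456789".
def fake_greek_personal_number(rng: Random.new, limit: nil)
  prefix = format("%02d", rng.rand(0..99))
  letter = ("A".."Z").to_a.sample(random: rng)
  digits = format("%09d", rng.rand(0...1_000_000_000))
  value  = "#{prefix}#{letter}#{digits}"
  limit ? value[0, limit] : value
end

puts fake_greek_adam
puts fake_greek_personal_number
puts fake_greek_adam(limit: 10) # truncated to fit a 10-character column
```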

Security:

password - Fake passwords (auto-truncates to column size)
token - Random alphanumeric strings (defaults to 32 characters, adjusts to column size)
api_key - Random alphanumeric strings (defaults to 40 characters, adjusts to column size)
secret - Random alphanumeric strings (defaults to 64 characters, adjusts to column size)
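
The default lengths and the column-size adjustment can be sketched with Ruby's SecureRandom. `random_credential` and `DEFAULT_LENGTHS` are illustrative names; whether Grainery uses SecureRandom internally is not stated in this README.

```ruby
require "securerandom"

# Default lengths as documented above.
DEFAULT_LENGTHS = { "token" => 32, "api_key" => 40, "secret" => 64 }.freeze

# Returns a random alphanumeric string of the default length for `kind`,
# shortened when the target column cannot hold that many characters.
def random_credential(kind, column_limit: nil)
  length = DEFAULT_LENGTHS.fetch(kind)
  length = [length, column_limit].min if column_limit
  SecureRandom.alphanumeric(length)
end

puts random_credential("token")
puts random_credential("secret", column_limit: 20)
```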

Custom Field Mapping

Add your own field mappings to anonymize custom columns:

anonymize_fields:
  # Global mappings
  employee_id: ssn
  mobile: phone_number
  home_address: address
  work_email: email
  tax_id: ssn
  bank_account: iban
  tin: greek_vat
  social_insurance: greek_amka
  citizen_id: greek_personal_number
  passport_number: identity_number
  diavgeia_decision: greek_ada
  procurement_number: greek_adam
  birth_date: date_of_birth

  # Scoped examples for duplicate fields
  users.status: skip              # Don't anonymize status in users
  orders.status: skip             # Don't anonymize status in orders
  primary.employees.department: skip  # Department in primary.employees
  other.staff.department: skip        # Department in other.staff

  # Skip anonymization for non-sensitive fields
  company_name: skip
  department: skip
  job_title: skip

Important Notes

Anonymization happens during harvest, not during load
Generated seed files contain anonymized data
Original production data is never modified
Safe to commit anonymized seed files to version control
Lookup tables are not anonymized (reference data)
Anonymization can be disabled per-harvest using generate_raw task
Respects database constraints: Fake values are automatically truncated to match column size limits
Type-aware: String fields respect their maximum length, numeric fields maintain their data type
Selective anonymization: Use skip to preserve real values for specific fields while anonymizing others
Scoped configuration: When the same field name appears in multiple tables, use table.field or database.table.field notation for table-specific or database-specific anonymization

Best Practices

Use Limits: Start with rake grainery:generate (100 records per table)
Review Load Order: Check db/grainery/load_order.txt
Test Loading: Run rake grainery:load on clean database first
Commit Selectively: Consider .gitignore for large grainery files
Custom Seeds Last: Keep application-specific logic in db/seeds.rb

Troubleshooting

Circular Dependencies

If you see "Circular dependency detected", check for:

Self-referential associations
Circular foreign keys

Solution: Temporarily remove optional: true or foreign_key: false

Missing Records

If records fail to load:

Check load_order.txt for correct ordering
Verify foreign key constraints
Review error messages in console output

Large Files

If seed files are too large:

# Use the limited harvest instead of generate_all
rake grainery:generate  # 100 records per table (default)

Example Workflow

# 1. Initialize on first use
rake grainery:init_config

# 2. Harvest from production (with VPN/SSH tunnel)
RAILS_ENV=production rake grainery:generate

# 3. Review generated files
ls -la db/grainery/

# 4. Commit grainery files (optional)
git add db/grainery/
git commit -m "Add production seed data"

# 5. On another machine, pull and load
git pull
rake db:reset
rake grainery:load

# 6. Your custom seeds run automatically last
# db/seeds.rb is executed after all harvested seeds

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/mpantel/grainery.

License

The gem is available as open source under the terms of the MIT License.