DbSchema

DbSchema is an opinionated database schema management tool that lets you maintain your DB schema with a single Ruby file.

It works like this:

  • you create a schema.rb file where you describe the schema you want in a special DSL
  • you make your application load this file as early as possible during the application bootup in development and test environments
  • you create a rake task that loads your schema.rb and tell your favorite deployment tool to run it on each deploy
  • each time you need to change the schema you just change the schema.rb file and commit it to your VCS

As a result you always have an up-to-date database schema. No need to run and rollback migrations, no need to even think about the extra step - DbSchema compares the schema you want with the schema your database has and applies all necessary changes to the latter. This operation is idempotent - if DbSchema sees that the database already has the requested schema it does nothing.

Currently DbSchema only supports PostgreSQL.

Reasons to use

With DbSchema you almost never need to write migrations by hand and manage a collection of migration files. This gives you a list of important benefits:

  • no more YouHaveABunchOfPendingMigrations errors - all needed operations are computed from the differences between the schema definition and the actual database schema
  • no need to write separate :up and :down migrations - this is all handled automatically
  • there is no structure.sql with a database dump that constantly changes without reason

But the main reason for DbSchema's existence is the pain of switching between long-running VCS branches with different migrations without resetting the database. Have you ever switched to a different branch only to be greeted by a wall of NO FILE pending migrations?

Yeah, you must remember the oldest NO FILE migration, switch back to the previous branch, roll back every migration down to that one, discard all changes in schema.rb/structure.sql (and model annotations if you have any), then switch the branch again and run its migrations. If you have already written some code to be committed to the new branch you need to make sure it won't get discarded, so a simple git reset --hard won't do. Every migration or rollback loads the whole app, wasting 10+ seconds each time. And at the end of it all you are trying to recall why you ever wanted to switch to that branch.

DbSchema does not rely on migration files and/or schema_migrations table in the database so it seamlessly changes the schema to the one defined in the branch you switched to. There is no step 2.

Of course if you are switching from a branch that defines table A to a branch that doesn't define table A then you lose that table with all the data in it. But you would lose it even with manual migrations.

Installation

Add this line to your application's Gemfile:

gem 'db_schema', '~> 0.1.2'

And then execute:

$ bundle

Or install it yourself as:

$ gem install db_schema

Usage

You define your schema with a special DSL; you can put it in a db/schema.rb file or anywhere you want. Be sure to keep this file under version control.

DbSchema DSL looks like this:

DbSchema.describe do |db|
  db.table :users do |t|
    t.primary_key :id
    t.varchar     :email,           null: false
    t.varchar     :password_digest, length: 40
    t.timestamptz :created_at
    t.timestamptz :updated_at

    t.index :email, unique: true
  end
end

Before DbSchema connects to the database you need to configure it:

DbSchema.configure(adapter: 'postgresql', database: 'my_database', user: 'bob', password: 'secret')
# or in Rails
DbSchema.configure_from_yaml(Rails.root.join('config', 'database.yml'), Rails.env)

Then you can load your schema definition (it is executable - it instantly applies itself to your database):

load 'path/to/schema.rb'

In order to get an always-up-to-date database schema in development and test environments you need to load the schema definition when your application is starting up. For instance, in Rails an initializer would be a good place to do that.

On the other hand, in the production environment this can cause race conditions, as your schema may be applied concurrently by different worker processes (this also applies to staging and any other environment where the application is run by multi-worker servers); therefore it is wiser to disable schema auto-loading in such environments and run it from a rake task on each deploy.

Here's an initializer example for a Rails app:

# config/initializers/db_schema.rb
DbSchema.configure_from_yaml(Rails.root.join('config', 'database.yml'), Rails.env)

if Rails.env.development? || Rails.env.test?
  load Rails.root.join('db', 'schema.rb')
end

And the rake task:

# lib/tasks/db_schema.rake
namespace :db do
  namespace :schema do
    desc 'Apply database schema'
    task apply: :environment do
      load Rails.root.join('db', 'schema.rb')
    end
  end
end

Then you just call rake db:schema:apply from your deploy script before restarting the app.

DSL

The database schema is defined with a block passed to the DbSchema.describe method. This block receives a db object on which you can call #table to define a table, #enum to define a custom enum type, and #extension to enable a PostgreSQL extension. Everything that belongs to a specific table is described in a block passed to #table.

DbSchema.describe do |db|
  db.extension :hstore

  db.table :users do |t|
    t.primary_key :id
    t.varchar     :email,    null: false
    t.varchar     :password, null: false
    t.varchar     :name,     null: false
    t.integer     :age
    t.user_status :status,   null: false, default: 'registered'
    t.hstore      :tracking, null: false, default: ''

    t.index :email, unique: true
  end

  db.enum :user_status, [:registered, :confirmed_email, :subscriber]

  db.table :posts do |t|
    t.primary_key :id
    t.integer     :user_id, null: false
    t.varchar     :title,   null: false, length: 50
    t.text        :content
    t.array       :tags,    of: :varchar

    t.index       :user_id
    t.foreign_key :user_id, references: :users
  end
end

Tables

Tables are described with the #table method; you pass it the name of the table and describe the table structure in the block:

db.table :users do |t|
  t.varchar :email
  t.varchar :password
end

Fields

You can define a field of any type by calling the corresponding method inside the table block, passing it the field name and its attributes. Most of the attributes are optional.

Here's an example table with various kinds of data:

db.table :people do |t|
  t.varchar     :first_name, length: 50, null: false
  t.varchar     :last_name,  length: 60, null: false
  t.integer     :age
  t.numeric     :salary, precision: 10, scale: 2
  t.text        :about
  t.date        :birthday
  t.boolean     :developer
  t.inet        :ip_address
  t.jsonb       :preferences, default: '{}'
  t.array       :interests, of: :varchar
  t.numrange    :salary_expectations

  t.timestamptz :created_at
  t.timestamptz :updated_at
end

Passing null: false to the field definition makes it NOT NULL; passing some value under the :default key makes it the default value. A symbol passed as a default is interpreted as a function call so t.timestamp :created_at, default: :now defines a field with a default value of NOW(); strings, numbers, timestamps etc are evaluated "as is".
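
For example, here is a small sketch (the orders table and its columns are made up for illustration) combining both kinds of defaults:

db.table :orders do |t|
  t.varchar     :status,     null: false, default: 'new' # string default, stored as is
  t.integer     :item_count, null: false, default: 0     # number default, stored as is
  t.timestamptz :created_at, default: :now               # symbol default becomes NOW()
end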

Other attributes are type specific, like :length for varchars; the following table lists them all (values in parentheses are default attribute values).

Type         Attributes
smallint
integer
bigint
numeric      precision, scale
real
float
money
char         length (1)
varchar      length
text
bytea
timestamp
timestamptz
date
time
timetz
interval     fields
boolean
point
line
lseg
box
path
polygon
circle
cidr
inet
macaddr
bit          length (1)
varbit       length
tsvector
tsquery
uuid
json
jsonb
array        of
int4range
int8range
numrange
tsrange
tstzrange
daterange
chkpass
citext
cube
hstore
ean13
isbn13
ismn13
issn13
isbn
ismn
issn
upc
ltree
seg

The of attribute of the array type is the only required attribute (you need to specify the array element type here); other attributes either have default values or can be omitted altogether.

You can also use your custom types in the same way: t.user_status :status creates a field called status with user_status type. Custom types are explained in a later section of this document.

The primary key is a special case; currently, when you create a primary key with DbSchema you get a NOT NULL integer field autoincremented by a sequence, with a primary key constraint. There is no way to change the primary key field type or define a composite primary key at the moment; this is planned for future versions of DbSchema.

Primary keys are created with the #primary_key method:

db.table :posts do |t|
  t.primary_key :id
  t.varchar :title
end

Important: you can't rename a table or a column just by changing its name in the schema definition - this will result in the table or column with the old name being dropped and one with the new name being created; all data in the old table or column will be lost.

Indexes

Indexes are created using the #index method: you pass it the field name you want to index:

db.table :users do |t|
  t.varchar :email
  t.index :email
end

Unique indexes are created with unique: true:

t.index :email, unique: true

Passing several field names creates a multicolumn index:

db.table :users do |t|
  t.varchar :first_name
  t.varchar :last_name

  t.index :first_name, :last_name
end

If you want to specify a custom name for your index, you can pass it in the :name option:

t.index :first_name, :last_name, name: :username_index

Otherwise the index name will be generated as "#{table_name}_#{field_names.join('_')}_index" so the index above will be called users_first_name_last_name_index.

You can specify the order of each field in your index - it's either ASC (:asc, the default), DESC (:desc), ASC NULLS FIRST (:asc_nulls_first), or DESC NULLS LAST (:desc_nulls_last). It looks like this:

db.table :some_table do |t|
  t.integer :col1
  t.integer :col2
  t.integer :col3
  t.integer :col4

  t.index col1: :asc, col2: :desc, col3: :asc_nulls_first, col4: :desc_nulls_last
end

By default B-tree indexes are created; if you need an index of a different type you can pass it in the :using option:

db.table :users do |t|
  t.array :interests, of: :varchar
  t.index :interests, using: :gin
end

You can also create a partial index if you pass some condition as SQL string in the :where option:

db.table :users do |t|
  t.varchar :email
  t.index :email, unique: true, where: 'email IS NOT NULL'
end

Be warned though that you have to specify the condition exactly as PostgreSQL outputs it in psql with the \d table_name command; otherwise your index will be recreated on each DbSchema run. This will be fixed in a later DbSchema version.

If you need an index on an expression, you can use the same syntax, replacing column names with SQL strings containing the expressions:

db.table :users do |t|
  t.timestamp :created_at
  t.index 'date(created_at)'
end

The expression index syntax allows specifying an order exactly like a regular index on table fields - just use the hash form, as in t.index 'date(created_at)' => :desc. You can also use an expression in a multicolumn index.

As with partial index conditions (and all other SQL snippets in db_schema), you must write the expression the way psql outputs it, so instead of lower(email) you should use lower(email::text) (assuming that email is a varchar field).
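
For instance, a sketch of such an expression index (assuming a users table with a varchar email column), spelled the way psql prints it back:

db.table :users do |t|
  t.varchar :email

  # lower(email) would be rewritten by PostgreSQL, so use the psql form
  t.index 'lower(email::text)', unique: true
end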

Foreign keys

The #foreign_key method defines a foreign key. In its minimal form it takes the referencing field name and the referenced table name:

db.table :users do |t|
  t.primary_key :id
  t.varchar :name
end

db.table :posts do |t|
  t.integer :user_id
  t.varchar :title

  t.foreign_key :user_id, references: :users
end

The syntax above assumes that this foreign key references the primary key. If you need to reference another field you can pass a 2-element array in the :references option, the first element being the table name and the second the field name:

db.table :users do |t|
  t.varchar :name
  t.index :name, unique: true # you can only reference either primary keys or unique columns
end

db.table :posts do |t|
  t.varchar :username
  t.foreign_key :username, references: [:users, :name]
end

As with indexes, you can pass your custom name in the :name option; the default foreign key name looks like "#{table_name}_#{fkey_fields.first}_fkey".
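
For example, a sketch overriding the generated name (posts_author_fkey is a made-up name):

db.table :posts do |t|
  t.integer :user_id

  # without :name this key would be called posts_user_id_fkey
  t.foreign_key :user_id, references: :users, name: :posts_author_fkey
end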

You can also define a composite foreign key consisting of (and referencing) multiple columns; just list them all:

db.table :table_a do |t|
  t.integer :col1
  t.integer :col2
  t.index :col1, :col2, unique: true
end

db.table :table_b do |t|
  t.integer :a_col1
  t.integer :a_col2
  t.foreign_key :a_col1, :a_col2, references: [:table_a, :col1, :col2]
end

There are three more options to the #foreign_key method: :on_update, :on_delete and :deferrable. The first two define the action taken when a referenced column is changed or the whole referenced row is deleted, respectively; you can set these to one of :no_action (the default), :restrict, :cascade, :set_null or :set_default. See the PostgreSQL documentation for more information.

Passing deferrable: true defines a foreign key that is checked at the end of the transaction.
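
A brief sketch combining these options (cascading deletes plus a check deferred to the end of the transaction):

db.table :posts do |t|
  t.integer :user_id

  # delete a user's posts together with the user; verify the key at commit time
  t.foreign_key :user_id, references: :users, on_delete: :cascade, deferrable: true
end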

Check constraints

A check constraint is like a validation on the database side: it checks if the inserted/updated row has valid values.

To define a check constraint you can use the #check method, passing it the constraint name (no auto-generated names here, sorry) and the condition that must be satisfied, in the form of an SQL string.

db.table :users do |t|
  t.primary_key :id
  t.varchar :name
  t.integer :age, null: false

  t.check :valid_age, 'age >= 18'
end

As with partial index conditions, for now you have to specify the SQL exactly as psql outputs it (otherwise the constraint will be recreated on each run).

Enum types

PostgreSQL allows developers to create custom enum types; a value of an enum type is one of a fixed set of values stored in the type definition.

Enum types are declared with the #enum method (note that you must call it from the top level of your schema, not from within a table definition):

db.enum :user_status, [:registered, :confirmed_email]

Then you can create fields of that type exactly as you would create a field of any built-in type - just call the method with the same name as the type you defined:

db.table :users do |t|
  t.user_status :status, default: 'registered'
end

After the enum type is created, you can add more values to it. They don't have to appear at the end of the values list - you can add new values to the middle or even the beginning of the list:

db.enum :user_status, [:guest, :registered, :sent_confirmation_email, :confirmed_email, :subscriber]

Reordering and deleting values from enum types is not supported.

Extensions

PostgreSQL has a wide variety of extensions providing additional data types, functions and operators. You can use DbSchema to add and remove extensions in your database:

db.extension :hstore

Note that adding and removing extensions in Postgres requires superuser privileges.

Configuration

DbSchema must be configured prior to applying the schema. There are two methods you can use for that: configure and configure_from_yaml.

DbSchema.configure

configure is a generic method that receives a hash with all configuration options:

DbSchema.configure(
  adapter:  'postgresql',
  host:     ENV['db_host'],
  port:     ENV['db_port'],
  database: ENV['db_name'],
  user:     ENV['db_user'],
  password: ENV['db_password']
)

DbSchema.configure_from_yaml

configure_from_yaml is designed to be used with Rails so you don't have to duplicate the database connection settings from your database.yml in the DbSchema configuration. Pass it the full path to your database.yml file and your current application environment (development, production etc.), and it will read the connection settings from that file.

DbSchema.configure_from_yaml(Rails.root.join('config', 'database.yml'), Rails.env)

If you need to specify other options you can simply pass them as keyword arguments after the environment:

DbSchema.configure_from_yaml(
  Rails.root.join('config', 'database.yml'),
  Rails.env,
  dry_run: true
)

Configuration options

All configuration options are described in the following table:

Option       Default value  Description
adapter      'postgres'     Database adapter
host         'localhost'    Database host
port         5432           Database port
database     (no default)   Database name
user         nil            Database user
password     ''             Database password
log_changes  true           When true, schema changes are logged
post_check   true           When true, database schema is checked afterwards
dry_run      false          When true, no operations are actually made

By default DbSchema logs the changes it applies to your database; you can disable that by setting log_changes to false.
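
For example, reusing the configure_from_yaml call shown earlier:

DbSchema.configure_from_yaml(
  Rails.root.join('config', 'database.yml'),
  Rails.env,
  log_changes: false # apply the schema silently
)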

DbSchema provides an opt-out post-run schema check; it ensures that there are no remaining differences between your schema.rb and the actual database schema. If DbSchema still sees differences it will keep applying them on each run. Usually this is harmless (because it does not really change your schema), but in the case of a partial index with a complex condition or an index on an expression it may rebuild the index, which is an expensive operation on a large table. You can set post_check to false if you are 100% sure these persistent changes are not a problem for you, but I strongly recommend turning it back on from time to time to make sure nothing dangerous appears among them.

The post_check option is likely to become off by default once DbSchema is more stable and battle-tested, and once the partial index problem is solved.

There is also a dry run mode which does not apply the changes to your database - it just logs the necessary changes (provided you leave log_changes set to true). The post-run check is also skipped in that case.

Dry run may be useful while you are building your schema definition for an existing app: adjust your schema.rb and apply it in dry run mode until it fits your database and the next dry run doesn't report any changes. Don't forget to turn dry_run off afterwards!
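
A quick sketch of that workflow, reusing the Rails configuration from earlier:

DbSchema.configure_from_yaml(
  Rails.root.join('config', 'database.yml'),
  Rails.env,
  dry_run: true # log the required changes without applying them
)
load Rails.root.join('db', 'schema.rb')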

Known problems and limitations

  • primary keys are hardcoded to a single NOT NULL integer field with a postgres sequence attached
  • "partial index problem": some conditions of partial indexes and check constraints can cause a false positive result of checking for differences between schema.rb and actual database schema, resulting in unwanted operations on each run (the worst of them being the recreation of an index on a large table)
  • array element type attributes are not supported
  • precision in time & datetime types isn't supported
  • no support for databases other than PostgreSQL
  • no support for renaming tables & columns

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment. To install this gem onto your local machine, run bundle exec rake install.

Contributing

Bug reports and pull requests are welcome on GitHub at 7even/db_schema.

License

The gem is available as open source under the terms of the MIT License.