Backfill machine

Backfiller Build Status

The backfill machine for null database columns. This gem maybe handly for no-downtime deployment especially when you need to fill columns for table with huge amount for records without locking the table.

Typical no-downtime and non-locking cycle

  • add migration that adds new column (null: true)
  • deploy and run migration task
  • deploy code that starts filling new column in corresponding flows
  • add backfill task
  • deploy and run backfill task
  • [optional] add migration that invokes backfill task asn so keep all environments consistent (except production environment because we already backfilled data)
  • add migration that disallow null values (null: false)
  • deploy code that starts using new column

Concept

The idea is to prepare all data in selection method on database server and fetch it data using CURSOR feature and then build simple UPDATE queries. With this way we minimize db server resources usage and we lock only one record (atomic update). We use two connections to database:

  • master - to creates cursor in transaction and fetch data in batches.
  • worker - to execute small atomic update queries (no wrapper transaction)

Even if backfill process crashes you may resolve issue and run it again to process remaining amount of data.

Connection adapters

Curently it support next ActiveRecord connection adapters:

  • PostgreSQL

Installation

Add this line to your application's Gemfile:

gem 'backfiller'

And then execute:

$ bundle

Or install it yourself as:

$ gem install backfiller

Usage

Assume we want to backfill profiles.name column from users.first_name, users.last_name columns.

Create backfill task into db/backfill/profile_name.rb and defined required methods:

Single worker execution query

class Backfill::ProfileName

  def select_sql
    <<~SQL
      SELECT
        profile.id AS profile_id,
        CONCAT(users.first_name, ' ', users.last_name) AS profile_name
      FROM profiles
      INNER JOIN users ON
        users.id = profiles.user_id
      WHERE
        profiles.name IS NULL
    SQL
  end

  def execute_sql(connection, row)
    <<~SQL
      UPDATE profiles SET
        name = #{connection.quote(row['profile_name'])}
      WHERE
       id = #{connection.quote(row['profile_id'])}
    SQL
  end

end

Multiple worker execution queries

class Backfill::ProfileName

  def select_sql
    <<~SQL
      SELECT
        profile.id AS profile_id,
        CONCAT(users.first_name, ' ', users.last_name) AS profile_name
      FROM profiles
      INNER JOIN users ON
        users.id = profiles.user_id
      WHERE
        profiles.name IS NULL
    SQL
  end

  def execute_sql(connection, row)
    [
      'BEGIN',
      <<~SQL,
        UPDATE profiles SET
          name = #{connection.quote(row['profile_name'])}
        WHERE
         id = #{connection.quote(row['profile_id'])} AND
        (SELECT pg_try_advisory_xact_lock(12345678)') = TRUE
      SQL
      'COMMIT'
    ]
  end

end

Custom row processing

class Backfill::ProfileName

  def select_sql
    <<~SQL
      SELECT
        profile.id AS profile_id,
        CONCAT(users.first_name, ' ', users.last_name) AS profile_name
      FROM profiles
      INNER JOIN users ON
        users.id = profiles.user_id
      WHERE
        profiles.name IS NULL
    SQL
  end

  def process_row(connection, row)
    connection.execute 'BEGIN'
    if connection.select_value 'SELECT pg_try_advisory_xact_lock(12345678)'
      connection.execute <<~SQL
        INSERT INTO contacts(
          full_name
        )
        VALUES(
          #{connection.quote(row['profile_name'])},
        )
      SQL
    end
    connection.execute 'COMMIT'
  end

end

And then just run rake task:

$ rails db:backfill[profile_name]

Configuration

For Rails application backfiller is initialized with next options

  • task_directory: RAILS_ROOT/db/backfill
  • task_namespace: Backfill
  • batch_size: 1_000
  • cursor_threshold: nil
  • connection_pool: ApplicationRecord.connection_pool
  • logger: ApplicationRecord.logger

You may change it globally via config/initializers/backfiller.rb:

Backfiller.configure do |config|
  config.foo = bar
end

Or specify some options in certain backfill task

class Backfill::Foo
  def batch_size
    100
  end

  def cursor_threshold
    100_000
  end
end

Authors