db_obfuscation

db_obfuscation is a gem that helps prepare a production-sized, obfuscated database. The obfuscated database can be used for internal testing such as user acceptance testing and QA/regression testing.

db_obfuscation takes a production database and replaces the data in every row of each table with fake data, while ensuring that associations between tables are maintained.

The gem currently supports only PostgreSQL databases.

Installation

gem install db_obfuscation

Usage

db_obfuscation obfuscate -c <path of obfuscation_configuration>
                         -s <Number of rows to be obfuscated in each db transaction> #default 100
                         -l <name_of_log_file>

step_size (the -s option) is the number of rows updated in each database transaction. The best value depends on the use case, for example the processing power of the computer and the size of the tables.

In our experience, 100 row updates per database transaction has worked best. However, you may need to change this number to optimize performance for your database.
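To illustrate how the step size divides the work, here is a minimal sketch in plain Ruby. The `batches_for` helper is hypothetical and not part of the gem; it only assumes, as described above, that each group of `step_size` rows is handled in one database transaction.

```ruby
# Hypothetical helper (not part of db_obfuscation): groups row ids into
# batches of `step_size`. Each batch would map to one database transaction.
def batches_for(row_ids, step_size = 100)
  row_ids.each_slice(step_size).to_a
end

batches = batches_for((1..250).to_a, 100)
puts batches.map(&:size).inspect # prints [100, 100, 50]
```

A larger step size means fewer, longer transactions; a smaller one means more, shorter transactions.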

Configuration

A sample configuration folder for the gem is included with the gem. The sample folder is at spec/config.

A generic configuration folder consists of the following files and folders:

  1. Database Configuration file

<path_to_config_folder>/database.yml

This file contains the credentials used to connect to the database. It needs the adapter name, host, encoding, username, password, and database name.

Sample database.yml file:

  adapter: postgres
  host: localhost
  encoding: unicode
  username: database_user
  database: obfuscation_test
  password: database_password
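
Before running the obfuscation, it can help to check that a database.yml contains every key listed above. The following is a minimal sketch using Ruby's standard YAML library. The `missing_database_keys` helper and the `REQUIRED_KEYS` constant are hypothetical names, not part of the gem.

```ruby
require 'yaml'

# Keys the text above says database.yml needs (assumed to all be required).
REQUIRED_KEYS = %w[adapter host encoding username password database].freeze

# Hypothetical helper (not part of db_obfuscation): parses the YAML text
# and returns the required keys that are absent.
def missing_database_keys(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  REQUIRED_KEYS - config.keys
end

sample = <<~CONFIG
  adapter: postgres
  host: localhost
  encoding: unicode
  username: database_user
  database: obfuscation_test
CONFIG

puts missing_database_keys(sample).inspect # prints ["password"]
```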

  2. Table Strategies

<path_to_config_folder>/table_strategies

This folder contains a YAML file for every table for which a user wants to override the default obfuscation configuration.

Each table file maps columns to the obfuscation strategy for that column. The filename is the same as the name of the table it configures.

A sample table strategy file looks like this:

<spec/config/table_strategies/table_2.yml>

  table_2:
    field_1: :default_strategy
    field_2: :whitelisted
    date_field: :date_strategy
    field_3: :first_name_strategy

db_obfuscation, by default, obfuscates every string column in a table.

It uses a random word to obfuscate every string column. This default behaviour can be overridden for individual columns and tables by specifying different strategies.
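
The default-plus-override behaviour described above can be sketched roughly as follows. This is an illustration only: the `strategy_for` helper, the `OVERRIDES` hash, and the column-type symbols are hypothetical and do not reflect the gem's internals. The sketch also assumes that `:default_strategy` names the random-word strategy.

```ruby
# Hypothetical override map mirroring the sample table_2.yml file.
OVERRIDES = {
  'table_2' => {
    'field_1'    => :default_strategy,
    'field_2'    => :whitelisted,
    'date_field' => :date_strategy,
    'field_3'    => :first_name_strategy
  }
}.freeze

# Hypothetical helper: an explicit override wins; otherwise string columns
# get the default strategy and all other columns are left alone (nil).
def strategy_for(table, column, column_type)
  override = OVERRIDES.dig(table, column)
  return override if override

  column_type == :string ? :default_strategy : nil
end

puts strategy_for('table_2', 'field_2', :string).inspect    # prints :whitelisted
puts strategy_for('table_9', 'name', :string).inspect       # prints :default_strategy
puts strategy_for('table_9', 'created_on', :date).inspect   # prints nil
```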

The supported strategies include:

- `:whitelisted` skips obfuscation of a particular string column in a table.
- `:date_strategy` obfuscates a date column.

  Date columns in a table are not obfuscated by default. Specifying `:date_strategy` adds a random number of days between 31 and 240 to the current value of the date.
- The complete list of strategies is [here](https://github.com/CaseCommonsDevOps/db_obfuscation/blob/master/lib/db_obfuscation/obfuscator.rb).
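
The date shift described for `:date_strategy` can be illustrated with a short sketch. The `shift_date` helper is hypothetical and is not the gem's implementation; it assumes that the range of 31 to 240 days is inclusive at both ends.

```ruby
require 'date'

# Hypothetical helper (not the gem's implementation): moves a date forward
# by a random number of days in 31..240 (inclusive bounds are an assumption).
def shift_date(date, rng = Random.new)
  date + rng.rand(31..240)
end

original = Date.new(2015, 1, 1)
shifted = shift_date(original)
offset = (shifted - original).to_i
puts offset.between?(31, 240) # prints true
```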

  3. Truncation Tables

<path_to_config_folder>/truncation_patterns.yml

This file contains string patterns for the names of tables that need to be truncated instead of being obfuscated.

Any table whose name is the same as a pattern, or begins with that pattern followed by an underscore, will be truncated during the obfuscation process.

A sample truncation_patterns.yml file looks like this:

  - truncation_table_1
  - audit

For example, the audit pattern selects the table named audit and any table whose name begins with audit_ for truncation.
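
The matching rule above can be expressed as a small sketch. The `truncate_table?` helper is a hypothetical name used for illustration and is not the gem's API.

```ruby
# Hypothetical helper (not part of db_obfuscation): true when the table name
# equals a pattern or starts with the pattern followed by an underscore.
def truncate_table?(table_name, patterns)
  patterns.any? do |pattern|
    table_name == pattern || table_name.start_with?("#{pattern}_")
  end
end

patterns = %w[truncation_table_1 audit]
puts truncate_table?('audit_logs', patterns) # prints true
puts truncate_table?('audit', patterns)      # prints true
puts truncate_table?('auditors', patterns)   # prints false
```

Requiring the underscore after the pattern keeps unrelated tables such as auditors from being truncated by the audit pattern.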

  4. Whitelisted Tables

<path_to_config_folder>/whitelisted_tables.yml

This file contains names of tables that don't need to be obfuscated and should not be touched.

A sample whitelisted_tables.yml looks like this,

  - whitelisted_table_1
  - whitelisted_table_2

Requirements

  • Ruby 2.x

License

Copyright © 2015 Case Commons & Rajat Agrawal.

Licensed under the MIT license, available in the “LICENSE” file.