db_obfuscation is a gem that helps prepare a production-size obfuscated database. The obfuscated database can be used for internal testing, such as user acceptance testing and QA/regression testing.
db_obfuscation takes a production database and replaces the data in every row of each table with fake data.
db_obfuscation ensures that associations between tables are maintained.
The gem currently supports only PostgreSQL databases.
```
gem install db_obfuscation
```
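```
db_obfuscation obfuscate -c <path_to_configuration_folder> -s <step_size> -l <log_file_name>
```

`-c` is the path to the obfuscation configuration folder, `-s` is the step size (the number of rows obfuscated in each database transaction, default 100), and `-l` is the name of the log file.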
The best step size depends on the use case, for example on the processing power of the machine and the size of the tables.
In our experience, 100 row updates per database transaction has been the optimal setting. However, you may need to tune this number to get the best performance for your database.
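To illustrate what the step size controls, here is a minimal Ruby sketch (not the gem's actual code) that splits a table's row ids into batches of `step_size` rows, each of which would be updated inside one database transaction. The method name `obfuscate_in_batches` is hypothetical, and the transaction itself is only indicated by a comment.

```ruby
# Hypothetical sketch: group row ids into batches of `step_size`;
# each batch stands for one database transaction of row updates.
def obfuscate_in_batches(row_ids, step_size: 100)
  raise ArgumentError, "step_size must be positive" unless step_size.positive?

  batches = []
  row_ids.each_slice(step_size) do |batch|
    # In a real run this would be wrapped in BEGIN ... COMMIT and would
    # issue an UPDATE for every row in `batch`.
    yield batch if block_given?
    batches << batch
  end
  batches
end

ids = (1..1_050).to_a
batches = obfuscate_in_batches(ids, step_size: 100)
puts batches.size      # => 11 (ten full batches plus one of 50 rows)
puts batches.last.size # => 50
```

Roughly speaking, a smaller step size means more, shorter transactions, while a larger one means fewer transactions that each do more work, which is why the best value depends on the machine and the tables involved.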
The gem ships with a sample configuration folder, located at
A configuration folder consists of the following files and folders:
- Database Configuration file
This file contains the credentials used to connect to the database: the adapter name, host, encoding, username, password, and database name.
```yaml
adapter: postgres
host: localhost
encoding: unicode
username: database_user
database: obfuscation_test
password: database_password
```
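Before running a long obfuscation, it can help to check that the database configuration contains every required key. The Ruby sketch below is an illustration, not part of the gem: `load_db_config` and `REQUIRED_DB_KEYS` are hypothetical names, and the YAML is passed in as a string because the source does not name the configuration file.

```ruby
require "yaml"

# Keys the database configuration file is described as needing.
REQUIRED_DB_KEYS = %w[adapter host encoding username password database].freeze

# Hypothetical helper: parse the YAML text and fail on any missing key.
def load_db_config(yaml_text)
  config = YAML.safe_load(yaml_text)
  raise ArgumentError, "expected a mapping of settings" unless config.is_a?(Hash)

  missing = REQUIRED_DB_KEYS - config.keys
  raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?

  config
end

sample = <<~YAML
  adapter: postgres
  host: localhost
  encoding: unicode
  username: database_user
  database: obfuscation_test
  password: database_password
YAML

config = load_db_config(sample)
puts config["database"] # => obfuscation_test
```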
- Table Strategies
This folder contains a YAML file for each table whose default obfuscation configuration you want to override.
Each table file maps columns to the obfuscation strategy for that column. The file name matches the name of the table it configures.
A sample table strategy file looks like this:
```yaml
table_2:
  field_1: :default_strategy
  field_2: :whitelisted
  date_field: :date_strategy
  field_3: :first_name_strategy
```
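Strategy values such as `:whitelisted` are YAML symbols, so a safe YAML loader has to be told to allow `Symbol`. This Ruby sketch reads a table strategy file into a column-to-strategy hash; `column_strategies` is a hypothetical helper rather than the gem's API, and it assumes a Psych version that accepts the `permitted_classes:` keyword (Ruby 2.6 and later).

```ruby
require "yaml"

# Hypothetical helper: return the column => strategy mapping for one table.
# Symbol must be permitted because values like `:whitelisted` load as symbols.
def column_strategies(yaml_text, table_name)
  parsed = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  parsed.fetch(table_name) { {} }
end

sample = <<~YAML
  table_2:
    field_1: :default_strategy
    field_2: :whitelisted
    date_field: :date_strategy
    field_3: :first_name_strategy
YAML

strategies = column_strategies(sample, "table_2")
p strategies["field_2"] # => :whitelisted
```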
By default, db_obfuscation obfuscates every string column in a table.
It replaces each string value with a random word. This default behaviour can be overridden column by column, for each table, by specifying a different strategy in that table's strategy file.
The supported strategies include:

- `:whitelisted` skips obfuscating a particular string column in a table.
- `:date_strategy` marks a date column for obfuscation. Date columns are not obfuscated by default; with `:date_strategy`, a random number of days between 31 and 240 is added to the column's current value.
- The complete list of strategies is [here](https://github.com/CaseCommonsDevOps/db_obfuscation/blob/master/lib/db_obfuscation/obfuscator.rb).
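The column rules above can be summarised in a small Ruby sketch. It is not the gem's implementation: `obfuscate_row`, the `WORDS` list and the row-as-hash representation are hypothetical, the 31..240 day range is assumed to be inclusive, and named strategies other than the three shown are simply treated like the default.

```ruby
require "date"

# Hypothetical word list standing in for the gem's fake-data source.
WORDS = %w[alpha bravo charlie delta echo foxtrot].freeze

# Sketch of the described rules:
# - :whitelisted columns are left unchanged.
# - String columns otherwise get a random word.
# - Date columns change only with :date_strategy, shifted by 31..240 days
#   (inclusive bounds are an assumption).
# - Everything else is left unchanged. Other named strategies
#   (e.g. :first_name_strategy) are not modelled here.
def obfuscate_row(row, strategies, rng: Random.new)
  row.each_with_object({}) do |(column, value), out|
    strategy = strategies.fetch(column, :default_strategy)
    out[column] =
      if strategy == :whitelisted
        value
      elsif value.is_a?(String)
        WORDS.sample(random: rng)
      elsif value.is_a?(Date) && strategy == :date_strategy
        value + rng.rand(31..240)
      else
        value
      end
  end
end

row = { "name" => "Jane", "notes" => "secret",
        "born_on" => Date.new(1990, 1, 1), "visits" => 3 }
rules = { "notes" => :whitelisted, "born_on" => :date_strategy }
result = obfuscate_row(row, rules, rng: Random.new(42))
p result
```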
- Truncation Tables
This file contains string patterns for names of tables that need to be truncated instead of obfuscated.
Any table whose name is the same as a pattern, or begins with that pattern followed by an underscore, will be truncated during the obfuscation process.
A `truncation_patterns.yml` file looks like this:
```yaml
- truncation_table_1
- audit
```
With this file, any table whose name begins with `audit_` will be selected for truncation, as will tables named exactly `audit` or `truncation_table_1`.
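The matching rule can be expressed in a few lines of Ruby. `truncate_table?` is a hypothetical helper written for illustration, not a method the gem is known to expose.

```ruby
# Hypothetical helper for the rule described above: a table is truncated if
# its name equals a pattern, or starts with the pattern followed by "_".
def truncate_table?(table_name, patterns)
  patterns.any? do |pattern|
    table_name == pattern || table_name.start_with?("#{pattern}_")
  end
end

patterns = %w[truncation_table_1 audit]
%w[audit audit_logs auditors truncation_table_1 users].each do |name|
  puts "#{name}: #{truncate_table?(name, patterns)}"
end
# audit: true
# audit_logs: true
# auditors: false
# truncation_table_1: true
# users: false
```

Note that `auditors` is not truncated: it starts with `audit`, but not with `audit_`.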
- Whitelisted Tables
This file lists tables that should not be obfuscated; they are left untouched.
A `whitelisted_tables.yml` file looks like this:
```yaml
- whitelisted_table_1
- whitelisted_table_2
```
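Taken together, the whitelist and truncation files decide what happens to each table. The Ruby sketch below classifies a table as skipped, truncated or obfuscated. `table_action` is hypothetical, and checking the whitelist before the truncation patterns is an assumption, since the precedence is not specified here.

```ruby
require "yaml"

# Hypothetical classifier combining the two table-level files.
# Assumption: whitelisted tables win over truncation patterns.
def table_action(table_name, whitelisted:, truncation_patterns:)
  return :skip if whitelisted.include?(table_name)

  truncated = truncation_patterns.any? do |pattern|
    table_name == pattern || table_name.start_with?("#{pattern}_")
  end
  truncated ? :truncate : :obfuscate
end

whitelisted = YAML.safe_load("- whitelisted_table_1\n- whitelisted_table_2\n")
patterns    = YAML.safe_load("- truncation_table_1\n- audit\n")

%w[whitelisted_table_1 audit_events users].each do |name|
  action = table_action(name, whitelisted: whitelisted, truncation_patterns: patterns)
  puts "#{name}: #{action}"
end
# whitelisted_table_1: skip
# audit_events: truncate
# users: obfuscate
```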
Requirements:

- Ruby 2.x
Copyright © 2015 Case Commons & Rajat Agrawal.
Licensed under the MIT license, available in the “LICENSE” file.