MasKING🤴

CircleCI Acceptance Test MySQL Status Acceptance Test MariaDB Status

Coverage Status Maintainability Gem Version

The command line tool for anonymizing database records by parsing a SQL dump file and build new SQL dump file with masking sensitive/credential data.

Installation

gem install masking

Requirement

  • Ruby 2.5/2.6/2.7(preview)

Supported RDBMS

  • MySQL: 5.51, 5.6, 5.7, 8.0
  • MariaDB: 5.5, 10.02, 10.1, 10.2, 10.3, 10.4

Usage

  1. Setup configuration for anonymizing target tables/columns to masking.yml

      # table_name:
      #   column_name: masked_value
    
      users:
        string: anonymized string
        email: anonymized+%{n}@example.com # %{n} will be replaced with sequential number
        integer: 12345
        float: 123.45
        boolean: true
        null_column: null
        date: 2018-08-24
        time: 2018-08-24 15:54:06
        binary_or_blob: !binary | # Binary Data Language-Independent Type for YAML™ Version 1.1: http://yaml.org/type/binary.html
          R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5
          OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+
          +f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC
          AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs=
    

    A value will be implicitly converted to compatible type. If you prefer to explicitly convert, you could use a tag as defined in YAML Version 1.1

      not-date: !!str 2002-04-28
    

    String should be matched with MySQL String Type. Integer/Float should be matched with MySQL Numeric Type. Date/Time should be matched with MySQL Date and Time Type.

    NOTE: MasKING doesn't check actual schema's type from dump. If you put uncomaptible value it will cause error during restoring to database.

  2. Dump database with anonymizing

    MasKING works with mysqldump --complete-insert

      mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking > anonymized_dump.sql
    
  3. Restore from anonymized dump file

      mysql -u USERNAME ANONYMIZED_DATABASE_NAME < anonymized_dump.sql
    

    Tip: If you don't need to have anonymized dump file, you can directly insert from stream. It can be faster because it has less IO interaction.

        mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking | mysql -u USERNAME ANONYMIZED_DATABASE_NAME
    

options

$ masking -h
Usage: masking [options]
    -c, --config=FILE_PATH           specify config file. default: masking.yml
    -v, --version                    version

Use case of annonymized (production) database

  • Simulate for database migration and find a problem before release

Some schema changing statement will lock table and it will cause trouble during the migration. But, without having a large number of record such as production, a migration will finish at the moment and easy to overlook.

  • Performance optimization of database queries

Some database query can be slow, but some query isn't reproducible until you have similar amount of records/cardinality.

  • Finding bug before release on production

Some bugs are related to unexpected data in production (for instance so long text, invalid/not-well formatted data) and it might be noticed after releasing in production.

  • Better development/demo of a feature

Using similar data with real one will be good to make a good view of how feature looks like. It makes easy to find out the things to be changed/fixed before release/check the feature in production.

  • Analyze metrics on our production data with respecting GDPR

We can use this database for BI and some trouble shooting.

  • And… your idea here!

Development

git clone [email protected]:kibitan/masking.git
bin/setup

You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install.

boot

  bundle exec exe/masking

Run test & rubocop & notes

  bundle exec rake

acceptance test

./acceptance/run_test.sh

available option via environment variable:

  • MYSQL_HOST: database host(default: localhost)
  • MYSQL_USER: mysql user name(default: mysqluser}
  • MYSQL_PASSWORD: password for user(default: password)
  • MYSQL_DBNAME: database name(default: mydb)
with docker
docker-compose -f docker-compose.yml -f docker-compose/mysql80.yml run -e MYSQL_HOST=mysql80 app acceptance/run_test.sh

or

docker-compose/acceptance_test.sh mysql80

The docker-compose file names for other database versions, specify that file.

Markdown lint

bundle exec mdl *.md

Development with Docker

docker build . -t masking
echo "sample stdout" | docker run -i masking
docker run masking -v

Profiling

use bin/masking_profile

 $ cat your_sample.sql | bin/masking_profile
flat result is saved at /your/repo/profile/flat.txt
graph result is saved at /your/repo/profile/graph.txt
graph html is saved at /your/repo/profile/graph.html

 $ open profile/flat.txt

see also: ruby-prof/ruby-prof: ruby-prof: a code profiler for MRI rubies

Benchmark

use bin/benchmark.rb

$ bin/benchmark.rb
       user     system      total        real
   1.152776   0.207064   1.359840 (  1.375090)

Design Concept

KISS ~ keep it simple, stupid ~

No connection to database, No handling file, Only dealing with stdin/stdout. ~ Do One Thing and Do It Well ~

No External Dependency

Depend on only pure language standard libraries, no external libraries. (except development/test environment)

Future Todo

  • Pluguable/customizable for a mask way e.g. integrate with Faker
  • Compatible with other RDBMS e.g. PostgreSQL, Oracle, SQL Server
  • Parse the schema type information and validate target columns value
  • Performance optimization
    • Write in streaming process
    • rewrite by another language?
  • Well-documentation

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/kibitan/masking. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the Masking project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

1: MySQL 5.5 is already not supported by official

2: MariaDB 10.0 is already not supported by official