MasKING🤴
The command line tool for anonymizing database records by parsing a SQL dump file and build new SQL dump file with masking sensitive/credential data.
Installation
git clone [email protected]:kibitan/masking.git
bin/setup
or install it yourself as:
gem install masking
Requirement
- Ruby 2.5/2.6
Supported RDBMS
- MySQL 5.7...(TBC)
Usage
- setup configuration of target columns to
masking.yml
# table_name:
# column_name: masked_value
users:
string: anonymized string
email: anonymized+%{n}@example.com # %{n} will be replaced with sequential number
integer: 12345
float: 123.45
boolean: true
null: null
date: 2018-08-24
time: 2018-08-24 15:54:06
binary_or_blob: !binary | # Binary Data Language-Independent Type for YAML™ Version 1.1: http://yaml.org/type/binary.html
R0lGODlhDAAMAIQAAP//9/X17unp5WZmZgAAAOfn515eXvPz7Y6OjuDg4J+fn5
OTk6enp56enmlpaWNjY6Ojo4SEhP/++f/++f/++f/++f/++f/++f/++f/++f/+
+f/++f/++f/++f/++f/++SH+Dk1hZGUgd2l0aCBHSU1QACwAAAAADAAMAAAFLC
AgjoEwnuNAFOhpEMTRiggcz4BNJHrv/zCFcLiwMWYNG84BwwEeECcgggoBADs=
A value will be implicitly converted to compatible type. If you prefer to explicitly convert, you could use a tag as defined in YAML Version 1.1
not-date: !!str 2002-04-28
String should be matched with MySQL String Type. Integer/Float should be matched with MySQL Numeric Type. Date/Time should be matched with MySQL Date and Time Type.
NOTE: MasKING doesn't check actual schema's type from dump. If you put uncomaptible value it will cause error during restoring to database.
- dump with mask
MasKING works with mysqldump --complete-insert
mysqldump --complete-insert -u USERNAME DATABASE_NAME | masking > masked_dump.sql
- restore
mysql -u USERNAME MASKED_DATABASE_NAME < masked_dump.sql
options
$ masking -h
Usage: masking [options]
-c, --config=FILE_PATH specify config file. default: masking.yml
Run test & rubocop & notes
bundle exec rake
Protip
It's useful that set rake
on Git hooks.
touch .git/hooks/pre-commit && chmod +x .git/hooks/pre-commit && cat << EOF > .git/hooks/pre-commit
#!/usr/bin/env bash
bundle exec rake
EOF
Markdown lint
bundle exec mdl *.md
Development
After checking out the repo, run bin/setup
to install dependencies. Then, run rake spec
to run the tests. You can also run bin/console
for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install
. To release a new version, update the version number in version.rb
, and then run bundle exec rake release
, which will create a git tag for the version, push git commits and tags, and push the .gem
file to rubygems.org.
Profiling
use bin/masking_profile
$ cat your_sample.sql | bin/masking_profile
flat result is saved at /your/repo/profile/flat.txt
graph result is saved at /your/repo/profile/graph.txt
graph html is saved at /your/repo/profile/graph.html
$ open profile/flat.txt
see also: ruby-prof/ruby-prof: ruby-prof: a code profiler for MRI rubies
Design Concept
KISS ~ keep it simple, stupid ~
No connection to database, No handling file, Only dealing with stdin/stdout. ~ Do One Thing and Do It Well ~
No External Dependency
Depend on only pure language standard libraries, no external libraries. (except development/test environment)
High Code Quality
100% of code coverage and low complexity
Future Todo
- Pluguable/customizable for a mask way e.g. integrate with Faker
- Compatible with other RDBMS e.g. PostgreSQL, Oracle, SQL Server
- Parse the schema type information and validate target columns value
- Integration test with real database
- Performance optimization
- Write in streaming process
- rewrite by another language?
- Well-documentation
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/kibitan/masking. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the Masking project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.