Encrypt filter plugin for Embulk

Encrypts the values of selected string columns using an encryption algorithm such as AES.

Encrypted data is encoded using base64. For example, given the following input records:

id,password,comment
1,super,a
2,secret,b

You can apply encryption to the password column and get the following output:

id,password,comment
1,ayxU9lMA1iASdHGy/eAlWw==,a
2,v8ffsUOfspaqZ1KI7tPz+A==,b
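
As an illustration of the transformation above, here is a minimal Ruby sketch that encrypts one value with AES-256-CBC and base64-encodes the result. It is not the plugin's implementation: the helper name encrypt_value is hypothetical, PKCS padding (OpenSSL's default) is an assumption, and the key and iv are the sample values from the Example section, so the output will not necessarily match the records shown above.

```ruby
require "openssl"

# Sample key and iv taken from the "Example" section of this document.
KEY_HEX = "098F6BCD4621D373CADE4E832627B4F60A9172716AE6428409885B8B829CCB05"
IV_HEX  = "C9DD4BB33B827EB1FBA1B16A0074D460"

# Hypothetical helper: encrypts one string and returns it base64-encoded.
def encrypt_value(plain, key_hex, iv_hex = nil, algorithm: "AES-256-CBC")
  cipher = OpenSSL::Cipher.new(algorithm)
  cipher.encrypt
  cipher.key = [key_hex].pack("H*")                      # hex -> raw bytes
  cipher.iv  = [iv_hex].pack("H*") if cipher.iv_len > 0  # ECB takes no iv
  encrypted = cipher.update(plain) + cipher.final        # assumes default PKCS padding
  [encrypted].pack("m0")                                 # base64 without line breaks
end

puts encrypt_value("super", KEY_HEX, IV_HEX)
```

Short values such as "super" fit in a single 16-byte AES block after padding, which is why each encrypted value in the output above is 24 base64 characters long.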

Overview

  • Plugin type: filter

Configuration

  • algorithm: encryption algorithm (see below) (enum, required)
  • column_names: names of string columns to encrypt (array of string, required)
  • key_type: source of the encryption key, either "inline" or "s3" (enum, optional, default: inline)
  • key_hex: encryption key (string, required if key_type is inline)
  • iv_hex: encryption initialization vector (string, required if the algorithm uses CBC mode and key_type is inline)
  • output_encoding: encoding of the encrypted value, either "base64" or "hex" (base16) (enum, optional, default: base64)
  • aws_params: AWS/S3 parameters (hash, required if key_type is s3)
    • region: a valid AWS region
    • access_key: a valid AWS access key
    • secret_key: a valid AWS secret key
    • bucket: a valid S3 bucket
    • path: a valid S3 key (S3 file path)

Algorithms

Available algorithms are:

  • AES-256-CBC (recommended)
  • AES-192-CBC
  • AES-128-CBC
  • AES-256-ECB
  • AES-192-ECB
  • AES-128-ECB

AES-256-CBC is the recommended algorithm. The other algorithms are provided for compatibility with other components (see the "Decrypting data" section below).
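
Each algorithm name implies a key size and whether an iv is needed, which in turn determines how long key_hex and iv_hex must be. The following Ruby sketch asks OpenSSL for those sizes; the helper name hex_lengths is hypothetical and only serves as an illustration.

```ruby
require "openssl"

ALGORITHMS = %w[
  AES-256-CBC AES-192-CBC AES-128-CBC
  AES-256-ECB AES-192-ECB AES-128-ECB
].freeze

# Hypothetical helper: number of hex characters expected for key_hex and iv_hex.
def hex_lengths(algorithm)
  cipher = OpenSSL::Cipher.new(algorithm)
  # One byte is written as two hex characters.
  { key_hex: cipher.key_len * 2, iv_hex: cipher.iv_len * 2 }
end

ALGORITHMS.each do |name|
  lengths = hex_lengths(name)
  puts "#{name}: key_hex=#{lengths[:key_hex]} chars, iv_hex=#{lengths[:iv_hex]} chars"
end
```

For example, AES-256-CBC takes a 64-character key_hex and a 32-character iv_hex, while the ECB variants take no iv at all.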

Generating key and iv

Using the standard PBKDF2 password-based key derivation algorithm

PBKDF2 is a standard (PKCS #5) algorithm that derives a key and iv from a password.

To generate them, you can use the genkey.rb script.

Save it as "genkey.rb" and run it as follows:

$ ruby genkey.rb AES-256-CBC "my-pass-wo-rd"

It prints the key and iv as follows:

key=D0867C9310D061F17ACD11EB30DE68265DCB79849BE5FB2BE157919D19BF2F42
iv =2A1D6BD59D2DB50A59364BAD3B9B6544
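
The contents of genkey.rb are not shown in this section. As a rough idea of what a PBKDF2-based derivation can look like in Ruby, here is a minimal sketch. It is not the original script: the helper name derive_key_iv, the random 16-byte salt, the iteration count, and the SHA-256 digest are all assumptions, so it will not reproduce the key and iv printed above.

```ruby
require "openssl"

# Hypothetical helper, not the original genkey.rb.
# Salt, iteration count, and digest are assumptions chosen for illustration.
def derive_key_iv(algorithm, password, salt:, iterations: 10_000)
  cipher = OpenSSL::Cipher.new(algorithm)
  length = cipher.key_len + cipher.iv_len
  digest = OpenSSL::Digest.new("SHA256")
  bytes  = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, length, digest)
  key = bytes.byteslice(0, cipher.key_len)
  iv  = bytes.byteslice(cipher.key_len, cipher.iv_len) # empty for ECB
  [key.unpack1("H*").upcase, iv.unpack1("H*").upcase]
end

salt = OpenSSL::Random.random_bytes(16)
key_hex, iv_hex = derive_key_iv("AES-256-CBC", "my-pass-wo-rd", salt: salt)
puts "key=#{key_hex}"
puts "iv =#{iv_hex}"
```

Because the salt in this sketch is random, every run prints a different key and iv; store the printed values, since they are what the filter configuration needs.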

Using openssl EVP_BytesToKey algorithm

You can also use the openssl EVP_BytesToKey algorithm to generate a key and iv from a password. For the AES-256-CBC cipher, run the following command:

$ echo secret | openssl enc -aes-256-cbc -a -nosalt -p

You will be prompted for a password. The command then prints the key and iv, followed by the base64-encoded encryption of the input text ("secret"):

key=DAFFED346E29C5654F54133D1FC65CCB5930071ACEAF5B64A22A11406F467DC9
iv =C92D28D70B4440DA3F0F05577ECFEE54
6aEGvMrGx7tODkPF7x5Yog==

You can copy the key and iv into the key_hex and iv_hex parameters.
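
To show what EVP_BytesToKey computes, the following Ruby sketch reimplements it for the unsalted, single-iteration case used by the command above. The helper name evp_bytes_to_key is hypothetical, and the digest is an assumption: older OpenSSL releases use MD5 by default for openssl enc, while OpenSSL 1.1.0 and later use SHA-256, so the digest must match your openssl version (or its -md option) to reproduce its output.

```ruby
require "digest"

# Hypothetical helper: EVP_BytesToKey with no salt and an iteration count of 1.
# Each block is digest(previous_block + password); blocks are concatenated
# until there are enough bytes for the key followed by the iv.
def evp_bytes_to_key(password, key_len: 32, iv_len: 16, digest: Digest::MD5)
  material = "".b
  block = "".b
  while material.bytesize < key_len + iv_len
    block = digest.digest(block + password.b)
    material << block
  end
  key = material.byteslice(0, key_len)
  iv  = material.byteslice(key_len, iv_len)
  [key.unpack1("H*").upcase, iv.unpack1("H*").upcase]
end

key_hex, iv_hex = evp_bytes_to_key("my-pass-wo-rd")
puts "key=#{key_hex}"
puts "iv =#{iv_hex}"
```

Pass digest: Digest::SHA256 to follow the default of newer OpenSSL releases. Note that an unsalted derivation always maps the same password to the same key and iv.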

Decrypting data

openssl command

You can decrypt a value with the openssl command as follows:

$ echo <encrypted value> | openssl enc -d -base64 | openssl enc -aes-256-cbc -d -K <key> -iv <iv>

For example:

$ echo 6aEGvMrGx7tODkPF7x5Yog== | openssl enc -d -base64 | openssl enc -aes-256-cbc -d -K DAFFED346E29C5654F54133D1FC65CCB5930071ACEAF5B64A22A11406F467DC9 -iv C92D28D70B4440DA3F0F05577ECFEE54
secret
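
The same decryption can be sketched in Ruby, which may be convenient when checking many values. The helper name decrypt_value and its encoding option are hypothetical; the sketch assumes PKCS padding (OpenSSL's default) and that the value was encoded as base64 or hex, matching the output_encoding setting.

```ruby
require "openssl"

# Hypothetical helper: decodes an encrypted value and decrypts it.
# encoding is :base64 or :hex, mirroring the output_encoding option.
def decrypt_value(encoded, key_hex, iv_hex = nil, algorithm: "AES-256-CBC", encoding: :base64)
  raw = encoding == :hex ? [encoded].pack("H*") : encoded.unpack1("m")
  cipher = OpenSSL::Cipher.new(algorithm)
  cipher.decrypt
  cipher.key = [key_hex].pack("H*")                      # hex -> raw bytes
  cipher.iv  = [iv_hex].pack("H*") if cipher.iv_len > 0  # ECB takes no iv
  cipher.update(raw) + cipher.final                      # raises if key, iv, or padding is wrong
end
```

Calling decrypt_value with the encrypted value, key, and iv from the openssl example above corresponds to the command shown there. For an ECB algorithm, omit iv_hex and pass the algorithm name, for example algorithm: "AES-256-ECB".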

PostgreSQL

You can use the decrypt_iv or decrypt function, provided by PostgreSQL's pgcrypto extension, to decrypt values. If you use CBC:

decrypt_iv(decode(encrypted_column, 'base64'), decode('here_is_key_hex', 'hex'), decode('here_is_iv_hex', 'hex'), 'aes')

If you use ECB:

decrypt(decode(encrypted_column, 'base64'), decode('here_is_key_hex', 'hex'), 'aes-ecb')

Example

  • Inline key type
filters:
  - type: encrypt
    algorithm: AES-256-CBC
    column_names: [password, ip]
    key_hex: 098F6BCD4621D373CADE4E832627B4F60A9172716AE6428409885B8B829CCB05
    iv_hex: C9DD4BB33B827EB1FBA1B16A0074D460
    output_encoding: hex
  • S3 key type
filters:
  - type: encrypt
    algorithm: AES-256-CBC
    column_names: [password, ip]
    output_encoding: hex
    key_type: s3
    aws_params:
      region: us-east-2
      access_key: XXXXXXXXXXXXXXXXXXXX
      secret_key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      bucket: com.sample.keys
      path: key.aes

Build

$ ./gradlew gem  # -t to watch for file changes and rebuild continuously