Class: NdrPseudonymise::SimplePseudonymisation
- Inherits: Object
  - Object
  - NdrPseudonymise::SimplePseudonymisation
- Defined in: lib/ndr_pseudonymise/simple_pseudonymisation.rb
Overview
Simple pseudonymisation library, for efficient pseudonymisation of identifiable data, suitable for fuzzy matching
Sample usage:

    # Set up clinical data and demographics
    clinical_data = … load pdf file …
    all_demographics = {'nhsnumber' => '1234567881', 'postcode' => 'CB22 3AD', 'birthdate' => '1975-10-22',
                        'surname' => 'SMITH', 'forenames' => 'JOHN ROBERT'}

    # Generate pseudonymised identifiers and encryption keys
    (pseudo_id1, pseudo_id2, key_bundle, rowid, demog_key, clinical_key) =
      NdrPseudonymise::SimplePseudonymisation.generate_keys(salt_id, salt_demog, salt_clinical,
        all_demographics['nhsnumber'], all_demographics['postcode'], all_demographics['birthdate'])

    # Emit first 4 values as index demographics
    emit_index_demographics(pseudo_id1, pseudo_id2, key_bundle, rowid)

    # Encrypt all demographics with demog_key
    emit_encrypted_demographics(rowid, NdrPseudonymise::SimplePseudonymisation.encrypt_data64(demog_key, all_demographics.to_json))

    # Encrypt all clinical data with clinical_key
    emit_encrypted_clinical_data(rowid, NdrPseudonymise::SimplePseudonymisation.encrypt_data(clinical_key, clinical_data))
Class Method Summary
- .data_hash(value, salt) ⇒ Object
- .decrypt_data(key, data) ⇒ Object
- .decrypt_data64(key, data) ⇒ Object
- .encrypt_data(key, data) ⇒ Object
  Returns a binary string.
- .encrypt_data64(key, data) ⇒ Object
  Returns a base-64 encoded string.
- .generate_keys(salt_id, salt_demog, salt_clinical, nhsnumber, current_postcode, birthdate) ⇒ Object
  Generate pseudonymised identifiers and pseudonymisation keys. Returns an array of 6 strings: [pseudo_id1, pseudo_id2, key_bundle, rowid, demog_key, clinical_key].
- .generate_keys_nhsnumber_demog_only(salt_id, salt_demog, nhsnumber) ⇒ Object
  Generate pseudonymised identifiers and pseudonymisation keys for data with only an NHS number (missing patient postcode or DOB), where only the demographics need to be pseudonymised (e.g. prescription data).
- .random_key ⇒ Object
Class Method Details
.data_hash(value, salt) ⇒ Object
    # File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 87
    def self.data_hash(value, salt)
      Digest::SHA2.hexdigest(value.to_s + salt.to_s)
    end
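As a minimal sketch of what this produces, the snippet below repeats the same salted SHA-256 construction using only the standard library. The method name `salted_hash` and the salt strings are made up for illustration and are not part of the gem.

```ruby
require 'digest/sha2'

# Hypothetical stand-in mirroring the data_hash source shown above:
# hex-encoded SHA-256 of the value followed directly by the salt.
def salted_hash(value, salt)
  Digest::SHA2.hexdigest(value.to_s + salt.to_s)
end

id = 'nhsnumber_1234567881'
a = salted_hash(id, 'example-salt-1') # made-up salt
b = salted_hash(id, 'example-salt-1')
c = salted_hash(id, 'example-salt-2')

puts a.length # 64 hex characters (256 bits)
puts a == b   # true: same value and salt give the same pseudonym
puts a == c   # false: a different salt gives an unrelated pseudonym
```

Because the value and salt are concatenated with no separator, the inputs `('ab', 'c')` and `('a', 'bc')` hash to the same result. The fixed prefixes used by `generate_keys` (`nhsnumber_`, `birthdate_postcode_`) keep the two identifier types apart.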
.decrypt_data(key, data) ⇒ Object
    # File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 115
    def self.decrypt_data(key, data)
      unless key =~ /[0-9a-f]{32}/
        raise(ArgumentError, 'Expected key to contain at least 256 bits of hex characters (0-9, a-f)')
      end
      aes = OpenSSL::Cipher.new('AES-256-CBC')
      aes.decrypt
      aes.key = Digest::SHA256.digest(key.chomp)
      (aes.update(data) + aes.final)
    end
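The following round-trip sketch copies the `encrypt_data` and `decrypt_data` bodies into two hypothetical helpers (`sketch_encrypt`, `sketch_decrypt`) so it runs without the gem. The key and payload are made up. It also illustrates one detail visible in the source: `decrypt_data` calls `chomp` on the key, so a key read from a file with a trailing newline still decrypts data that was encrypted with the bare key.

```ruby
require 'openssl'
require 'digest/sha2'

KEY_FORMAT = /[0-9a-f]{32}/ # same check as in the source above

# Hypothetical stand-in copying the encrypt_data body.
def sketch_encrypt(key, data)
  raise ArgumentError, 'bad key' unless key =~ KEY_FORMAT
  aes = OpenSSL::Cipher.new('AES-256-CBC')
  aes.encrypt
  aes.key = Digest::SHA256.digest(key)
  aes.update(data) + aes.final
end

# Hypothetical stand-in copying the decrypt_data body (note the chomp).
def sketch_decrypt(key, data)
  raise ArgumentError, 'bad key' unless key =~ KEY_FORMAT
  aes = OpenSSL::Cipher.new('AES-256-CBC')
  aes.decrypt
  aes.key = Digest::SHA256.digest(key.chomp)
  aes.update(data) + aes.final
end

key = 'ab' * 32 # made-up 64-character hex key
ciphertext = sketch_encrypt(key, 'example payload')

puts sketch_decrypt(key, ciphertext)        # example payload
puts sketch_decrypt(key + "\n", ciphertext) # example payload (newline chomped)
puts ciphertext.bytesize                    # 16: padded to the AES block size
```

The decrypted value comes back as a binary-encoded string, so a caller expecting UTF-8 text (for example JSON) may need to call `force_encoding('UTF-8')` on it.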
.decrypt_data64(key, data) ⇒ Object
    # File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 111
    def self.decrypt_data64(key, data)
      decrypt_data(key, Base64.strict_decode64(data))
    end
.encrypt_data(key, data) ⇒ Object
Returns a binary string.

    # File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 101
    def self.encrypt_data(key, data)
      unless key =~ /[0-9a-f]{32}/
        raise(ArgumentError, 'Expected key to contain at least 256 bits of hex characters (0-9, a-f)')
      end
      aes = OpenSSL::Cipher.new('AES-256-CBC')
      aes.encrypt
      aes.key = Digest::SHA256.digest(key)
      aes.update(data) + aes.final
    end
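To make the key check concrete, the sketch below applies the same unanchored pattern to a few made-up keys. The helper name `passes_key_check?` is hypothetical; only the regular expression is taken from the source.

```ruby
KEY_CHECK = /[0-9a-f]{32}/ # copied from the source above

# Hypothetical helper: true when the key would get past the guard clause.
def passes_key_check?(key)
  !(key =~ KEY_CHECK).nil?
end

puts passes_key_check?('0123456789abcdef' * 4)            # true: 64 lowercase hex characters
puts passes_key_check?('0123456789abcdef' * 2)            # true: a run of 32 is enough
puts passes_key_check?('nhsnumber_1234567881' + 'f' * 32) # true: the run may sit anywhere in the string
puts passes_key_check?('0123456789ABCDEF' * 4)            # false: uppercase hex is not matched
puts passes_key_check?('short')                           # false
```

A run of 32 hex characters carries 128 bits, so the pattern is looser than the "at least 256 bits" wording of the error message. Keys produced by `random_key` (64 hex characters) satisfy both. Whatever string passes the check is then hashed with SHA-256 to form the 256-bit AES key.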
.encrypt_data64(key, data) ⇒ Object
Returns a base-64 encoded string.

    # File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 96
    def self.encrypt_data64(key, data)
      Base64.strict_encode64(encrypt_data(key, data))
    end
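The choice of strict base-64 matters for how the output can be stored. The sketch below uses the `'m0'` pack directive, which is what `Base64.strict_encode64` and `Base64.strict_decode64` wrap, so that it has no dependencies. The helper names and the sample bytes are illustrative only.

```ruby
# Hypothetical helpers equivalent to Base64.strict_encode64 / strict_decode64.
def strict_encode(bin)
  [bin].pack('m0')
end

def strict_decode(str)
  str.unpack1('m0')
end

binary = (0..63).map(&:chr).join # 64 arbitrary bytes standing in for ciphertext
encoded = strict_encode(binary)

puts encoded.include?("\n")            # false: one unbroken line
puts [binary].pack('m').include?("\n") # true: non-strict encoding wraps its output
puts strict_decode(encoded) == binary  # true

begin
  strict_decode(encoded + "\n") # strict decoding rejects stray whitespace
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Since strict output contains neither spaces nor newlines, several encoded values can be joined with a single space and split apart again unambiguously, which is how `generate_keys` builds `key_bundle`.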
.generate_keys(salt_id, salt_demog, salt_clinical, nhsnumber, current_postcode, birthdate) ⇒ Object
Generate pseudonymised identifiers and pseudonymisation keys. Returns an array of 6 strings: [pseudo_id1, pseudo_id2, key_bundle, rowid, demog_key, clinical_key].

    # File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 33
    def self.generate_keys(salt_id, salt_demog, salt_clinical, nhsnumber, current_postcode, birthdate)
      unless nhsnumber.is_a?(String) && nhsnumber =~ /\A([0-9]{10})?\Z/
        raise 'Invalid NHS number'
      end
      unless current_postcode.is_a?(String) && current_postcode =~ /\A[A-Z0-9 ]*\Z/
        raise 'Invalid postcode'
      end
      unless birthdate.is_a?(String) && birthdate =~ /\A([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]|)\Z/
        raise 'Invalid birthdate'
      end
      real_id1 = 'nhsnumber_' + nhsnumber
      # Delete spaces from postcode
      real_id2 = 'birthdate_postcode_' + birthdate + '_' + current_postcode.split(' ').join('')

      pseudo_id1 = data_hash(real_id1, salt_id)
      pseudo_id2 = data_hash(real_id2, salt_id)
      demog_key = random_key
      clinical_key = random_key
      keys = []
      if nhsnumber.length > 0
        keys += [encrypt_data64(real_id1 + salt_demog, demog_key),
                 encrypt_data64(real_id1 + salt_clinical, clinical_key)]
      end
      if current_postcode.length > 0 && birthdate.length > 0
        keys += [encrypt_data64(real_id2 + salt_demog, demog_key),
                 encrypt_data64(real_id2 + salt_clinical, clinical_key)]
      end
      # TODO: Consider whether it's worth storing something, if keys would otherwise be empty.
      key_bundle = keys.join(' ')
      rowid = random_key
      [pseudo_id1, pseudo_id2, key_bundle, rowid, demog_key, clinical_key]
    end
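The source above encrypts each random key under a real identifier concatenated with a salt, which suggests that `key_bundle` is meant to be unlocked later by someone who holds both. The sketch below is one reading of that flow, restricted to the NHS-number half and written with standard-library calls only. All `sketch_*` names and the three salt constants are hypothetical, and the input validation and key format check are omitted for brevity.

```ruby
require 'openssl'
require 'digest/sha2'
require 'securerandom'

# Hypothetical stand-ins copying the method bodies shown on this page.
def sketch_hash(value, salt)
  Digest::SHA2.hexdigest(value.to_s + salt.to_s)
end

def sketch_cipher(mode, key, data)
  aes = OpenSSL::Cipher.new('AES-256-CBC')
  mode == :encrypt ? aes.encrypt : aes.decrypt
  aes.key = Digest::SHA256.digest(key)
  aes.update(data) + aes.final
end

def sketch_encrypt64(key, data)
  [sketch_cipher(:encrypt, key, data)].pack('m0') # same as Base64.strict_encode64
end

def sketch_decrypt64(key, data)
  sketch_cipher(:decrypt, key, data.unpack1('m0'))
end

# Made-up salts; hex strings, so that real_id + salt would pass the key check.
SALT_ID = 'a1' * 32
SALT_DEMOG = 'b2' * 32
SALT_CLINICAL = 'c3' * 32

# Pseudonymisation side: the NHS-number half of generate_keys.
def sketch_generate(nhsnumber)
  real_id1 = 'nhsnumber_' + nhsnumber
  demog_key = SecureRandom.hex(32)
  clinical_key = SecureRandom.hex(32)
  key_bundle = [sketch_encrypt64(real_id1 + SALT_DEMOG, demog_key),
                sketch_encrypt64(real_id1 + SALT_CLINICAL, clinical_key)].join(' ')
  [sketch_hash(real_id1, SALT_ID), key_bundle, demog_key, clinical_key]
end

# Matching side: with the real NHS number and salt_demog, the first
# entry of the bundle decrypts back to demog_key.
def sketch_unlock_demog_key(nhsnumber, key_bundle)
  wrapped_demog_key = key_bundle.split(' ').first
  sketch_decrypt64('nhsnumber_' + nhsnumber + SALT_DEMOG, wrapped_demog_key)
end

pseudo_id1, key_bundle, demog_key, _clinical_key = sketch_generate('1234567881')
puts pseudo_id1 == sketch_hash('nhsnumber_1234567881', SALT_ID)     # true
puts sketch_unlock_demog_key('1234567881', key_bundle) == demog_key # true
```

In the full method the bundle holds up to four entries: the demographics and clinical keys wrapped under the NHS number, followed by the same two keys wrapped under the birthdate and postcode identifier. Holding only `salt_demog` therefore allows recovery of `demog_key` but not `clinical_key`.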
.generate_keys_nhsnumber_demog_only(salt_id, salt_demog, nhsnumber) ⇒ Object
Generate pseudonymised identifiers and pseudonymisation keys for data with only an NHS number (missing patient postcode or DOB), where only the demographics need to be pseudonymised (e.g. prescription data). Returns an array of 3 strings: [pseudo_id1, key_bundle, demog_key].

    # File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 71
    def self.generate_keys_nhsnumber_demog_only(salt_id, salt_demog, nhsnumber)
      unless nhsnumber.is_a?(String) && nhsnumber =~ /\A([0-9]{10})?\Z/
        raise 'Invalid NHS number'
      end
      real_id1 = 'nhsnumber_' + nhsnumber

      pseudo_id1 = data_hash(real_id1, salt_id)
      demog_key = random_key
      key_bundle = if nhsnumber.length > 0
                     encrypt_data64(real_id1 + salt_demog, demog_key)
                   else
                     ''
                   end
      [pseudo_id1, key_bundle, demog_key]
    end
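The guard clause at the top decides which NHS number values are accepted. The sketch below applies the same pattern to a few made-up inputs; the helper name `valid_nhsnumber?` is hypothetical.

```ruby
NHS_NUMBER_FORMAT = /\A([0-9]{10})?\Z/ # copied from the source above

# Hypothetical helper: true when the value would get past the guard clause.
def valid_nhsnumber?(value)
  value.is_a?(String) && !(value =~ NHS_NUMBER_FORMAT).nil?
end

puts valid_nhsnumber?('1234567881')   # true
puts valid_nhsnumber?('')             # true: blank is accepted and yields an empty key_bundle
puts valid_nhsnumber?('123456788')    # false: 9 digits
puts valid_nhsnumber?('123 456 7881') # false: spaces are not stripped
puts valid_nhsnumber?(1234567881)     # false: must be a String
puts valid_nhsnumber?("1234567881\n") # true: \Z also matches before a trailing newline
```

The pattern checks only the shape of the value, not the NHS number check digit. In the last case the trailing newline would be hashed as part of `real_id1` and so give a different `pseudo_id1`, which suggests that callers reading values from files may want to `chomp` them first.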
.random_key ⇒ Object
    # File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 91
    def self.random_key
      SecureRandom.hex(32) # 32 bytes = 256 bits
    end
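A short sketch of the shape of these keys, calling `SecureRandom.hex` directly rather than the gem:

```ruby
require 'securerandom'

key = SecureRandom.hex(32) # same call as random_key

puts key.length                     # 64: two hex characters per random byte
puts key.match?(/\A[0-9a-f]{64}\z/) # true: lowercase hex only
```

Keys of this shape pass the format check in `encrypt_data` and `decrypt_data`, and the same generator is used for `rowid` in `generate_keys`.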