Class: NdrPseudonymise::SimplePseudonymisation

Inherits:
Object
Defined in:
lib/ndr_pseudonymise/simple_pseudonymisation.rb

Overview

Simple pseudonymisation library, for efficient pseudonymisation of identifiable data, suitable for fuzzy matching

Sample usage:

  # Set up clinical data and demographics
  clinical_data = … load pdf file …
  all_demographics = { 'nhsnumber' => '1234567881', 'postcode' => 'CB22 3AD', 'birthdate' => '1975-10-22', 'surname' => 'SMITH', 'forenames' => 'JOHN ROBERT' }

  # Generate pseudonymised identifiers and encryption keys
  (pseudo_id1, pseudo_id2, key_bundle, rowid, demog_key, clinical_key) =
    NdrPseudonymise::SimplePseudonymisation.generate_keys(salt_id, salt_demog, salt_clinical,
      all_demographics['nhsnumber'], all_demographics['postcode'], all_demographics['birthdate'])

  # Emit first 4 values as index demographics
  emit_index_demographics(pseudo_id1, pseudo_id2, key_bundle, rowid)

  # Encrypt all demographics with demog_key
  emit_encrypted_demographics(rowid, NdrPseudonymise::SimplePseudonymisation.encrypt_data64(demog_key, all_demographics.to_json))

  # Encrypt all clinical data with clinical_key
  emit_encrypted_clinical_data(rowid, NdrPseudonymise::SimplePseudonymisation.encrypt_data(clinical_key, clinical_data))

Class Method Details

.data_hash(value, salt) ⇒ Object



# File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 87

def self.data_hash(value, salt)
  Digest::SHA2.hexdigest(value.to_s + salt.to_s)
end
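
To make the behaviour of data_hash concrete, here is a minimal sketch that repeats the same construction outside the library. The method name data_hash_sketch and the salt strings are hypothetical, chosen only for illustration.

```ruby
require 'digest/sha2'

# Same construction as the data_hash listing: SHA-256 (the Digest::SHA2
# default) over the value followed directly by the salt.
def data_hash_sketch(value, salt)
  Digest::SHA2.hexdigest(value.to_s + salt.to_s)
end

salt_id = 'example-salt-not-for-production' # hypothetical salt value
id_a = data_hash_sketch('nhsnumber_1234567881', salt_id)
id_b = data_hash_sketch('nhsnumber_1234567881', salt_id)

puts id_a.length # 64 hex characters, i.e. 256 bits
puts id_a == id_b # true: same value and salt give the same pseudonym
puts id_a == data_hash_sketch('nhsnumber_1234567881', 'another-salt') # false
```

Because the value and salt are concatenated without a separator, only the combined string matters: ('ab', 'c') and ('a', 'bc') hash to the same result.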

.decrypt_data(key, data) ⇒ Object



# File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 115

def self.decrypt_data(key, data)
  unless key =~ /[0-9a-f]{32}/
    raise(ArgumentError, 'Expected key to contain at least 256 bits of hex characters (0-9, a-f)')
  end
  aes = OpenSSL::Cipher.new('AES-256-CBC')
  aes.decrypt
  aes.key = Digest::SHA256.digest(key.chomp)
  (aes.update(data) + aes.final)
end
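
The key check shared by decrypt_data and encrypt_data can be explored in isolation. The sketch below copies the pattern from the listing; the constant KEY_PATTERN, the helper acceptable_key? and the sample strings are hypothetical names used only here. Note that 32 hex characters correspond to 128 bits, whereas the error message mentions 256 bits; keys produced by random_key are 64 hex characters, so they satisfy either reading.

```ruby
# The pattern from the listing. It is unanchored, so it accepts any string
# that contains 32 consecutive lowercase hex characters somewhere inside it.
KEY_PATTERN = /[0-9a-f]{32}/

def acceptable_key?(key)
  !(key =~ KEY_PATTERN).nil?
end

hex_salt = 'a' * 32 # hypothetical 32-character hex salt

puts acceptable_key?('0123456789abcdef' * 4)            # true: 64 hex characters
puts acceptable_key?('nhsnumber_1234567881' + hex_salt) # true: hex run inside a longer string
puts acceptable_key?('0123456789ABCDEF' * 4)            # false: uppercase hex is not matched
puts acceptable_key?('too short')                       # false
```

This is why generate_keys can pass strings such as real_id1 + salt_demog as keys: the check is satisfied as long as the salt supplies a long enough run of lowercase hex characters.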

.decrypt_data64(key, data) ⇒ Object



# File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 111

def self.decrypt_data64(key, data)
  decrypt_data(key, Base64.strict_decode64(data))
end

.encrypt_data(key, data) ⇒ Object

Returns a binary string.



# File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 101

def self.encrypt_data(key, data)
  unless key =~ /[0-9a-f]{32}/
    raise(ArgumentError, 'Expected key to contain at least 256 bits of hex characters (0-9, a-f)')
  end
  aes = OpenSSL::Cipher.new('AES-256-CBC')
  aes.encrypt
  aes.key = Digest::SHA256.digest(key)
  aes.update(data) + aes.final
end

.encrypt_data64(key, data) ⇒ Object

Returns a Base64-encoded string.



# File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 96

def self.encrypt_data64(key, data)
  Base64.strict_encode64(encrypt_data(key, data))
end
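
The encrypt and decrypt methods above are meant to be used in pairs. The following round-trip sketch re-creates them with the standard library only, so it runs without the gem; encrypt64_sketch and decrypt64_sketch are hypothetical stand-ins, the hex-key check is left out for brevity, and pack('m0') / unpack1('m0') are used as equivalents of Base64.strict_encode64 / Base64.strict_decode64. In real use, the class methods of NdrPseudonymise::SimplePseudonymisation would be called instead.

```ruby
require 'digest/sha2'
require 'openssl'
require 'securerandom'

# Stand-in for encrypt_data64: AES-256-CBC keyed with SHA-256 of the key string.
def encrypt64_sketch(key, data)
  aes = OpenSSL::Cipher.new('AES-256-CBC')
  aes.encrypt
  aes.key = Digest::SHA256.digest(key)
  [aes.update(data) + aes.final].pack('m0') # strict Base64, no line breaks
end

# Stand-in for decrypt_data64: note the chomp on the key, as in the listing.
def decrypt64_sketch(key, data64)
  aes = OpenSSL::Cipher.new('AES-256-CBC')
  aes.decrypt
  aes.key = Digest::SHA256.digest(key.chomp)
  aes.update(data64.unpack1('m0')) + aes.final
end

key = SecureRandom.hex(32) # same shape as a key from random_key
token = encrypt64_sketch(key, 'surname=SMITH')
puts decrypt64_sketch(key, token) # prints the original plaintext
```

One detail visible in the listings: decrypt_data calls chomp on the key while encrypt_data does not, so a key with a trailing newline would only round-trip if the newline is stripped before encrypting.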

.generate_keys(salt_id, salt_demog, salt_clinical, nhsnumber, current_postcode, birthdate) ⇒ Object

Generate pseudonymised identifiers and pseudonymisation keys. Returns an array of 6 strings:

pseudo_id1, pseudo_id2, key_bundle, rowid, demog_key, clinical_key


# File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 33

def self.generate_keys(salt_id, salt_demog, salt_clinical, nhsnumber, current_postcode, birthdate)
  unless nhsnumber.is_a?(String) && nhsnumber =~ /\A([0-9]{10})?\Z/
    raise 'Invalid NHS number'
  end
  unless current_postcode.is_a?(String) && current_postcode =~ /\A[A-Z0-9 ]*\Z/
    raise 'Invalid postcode'
  end
  unless birthdate.is_a?(String) && birthdate =~ /\A([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]|)\Z/
    raise 'Invalid birthdate'
  end
  real_id1 = 'nhsnumber_' + nhsnumber
  # Delete spaces from postcode
  real_id2 = 'birthdate_postcode_' + birthdate + '_' + current_postcode.split(' ').join('')

  pseudo_id1 = data_hash(real_id1, salt_id)
  pseudo_id2 = data_hash(real_id2, salt_id)
  demog_key = random_key
  clinical_key = random_key
  keys = []
  if nhsnumber.length > 0
    keys += [encrypt_data64(real_id1 + salt_demog, demog_key),
             encrypt_data64(real_id1 + salt_clinical, clinical_key)]
  end
  if current_postcode.length > 0 && birthdate.length > 0
    keys += [encrypt_data64(real_id2 + salt_demog, demog_key),
             encrypt_data64(real_id2 + salt_clinical, clinical_key)]
  end
  # TODO: Consider whether it's worth storing something, if keys would otherwise be empty.
  key_bundle = keys.join(' ')
  rowid = random_key
  [pseudo_id1, pseudo_id2, key_bundle, rowid, demog_key, clinical_key]
end
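
The key_bundle returned by generate_keys holds the random demog_key and clinical_key, each encrypted under a key derived from a real identifier plus a salt. The sketch below shows how an entry could be unwrapped by someone who knows the NHS number and salt_demog. It is an illustration under assumptions, not the library's own unlocking code: wrap64, unwrap64 and bundle_for_nhsnumber are hypothetical helpers that mirror the listings, and the salts are generated on the spot.

```ruby
require 'digest/sha2'
require 'openssl'
require 'securerandom'

# Stand-ins mirroring encrypt_data64 and decrypt_data64 from this page
# (the hex-key check is omitted for brevity).
def wrap64(key, data)
  aes = OpenSSL::Cipher.new('AES-256-CBC')
  aes.encrypt
  aes.key = Digest::SHA256.digest(key)
  [aes.update(data) + aes.final].pack('m0') # strict Base64, no newlines
end

def unwrap64(key, data64)
  aes = OpenSSL::Cipher.new('AES-256-CBC')
  aes.decrypt
  aes.key = Digest::SHA256.digest(key.chomp)
  aes.update(data64.unpack1('m0')) + aes.final
end

# Builds the NHS-number half of the bundle the way generate_keys does.
def bundle_for_nhsnumber(real_id1, salt_demog, salt_clinical, demog_key, clinical_key)
  [wrap64(real_id1 + salt_demog, demog_key),
   wrap64(real_id1 + salt_clinical, clinical_key)].join(' ')
end

# Hypothetical salts and keys, generated here only for the demonstration.
salt_demog = SecureRandom.hex(32)
salt_clinical = SecureRandom.hex(32)
demog_key = SecureRandom.hex(32)
clinical_key = SecureRandom.hex(32)
real_id1 = 'nhsnumber_1234567881'

key_bundle = bundle_for_nhsnumber(real_id1, salt_demog, salt_clinical, demog_key, clinical_key)

# Someone who knows the NHS number and salt_demog can recover demog_key.
wrapped_demog, _wrapped_clinical = key_bundle.split(' ')
puts unwrap64(real_id1 + salt_demog, wrapped_demog) == demog_key # true
```

The position of each entry depends on which identifiers were present. When the NHS number is empty, the bundle contains only the two entries wrapped under the birthdate and postcode identifier, so a consumer cannot assume that the first entry always belongs to the NHS number.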

.generate_keys_nhsnumber_demog_only(salt_id, salt_demog, nhsnumber) ⇒ Object

Generate pseudonymised identifiers and pseudonymisation keys for data with only an NHS number (missing patient postcode or DOB), where only the demographics need to be pseudonymised (e.g. prescription data). Returns an array of 3 strings:

pseudo_id1, key_bundle, demog_key


# File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 71

def self.generate_keys_nhsnumber_demog_only(salt_id, salt_demog, nhsnumber)
  unless nhsnumber.is_a?(String) && nhsnumber =~ /\A([0-9]{10})?\Z/
    raise 'Invalid NHS number'
  end
  real_id1 = 'nhsnumber_' + nhsnumber

  pseudo_id1 = data_hash(real_id1, salt_id)
  demog_key = random_key
  key_bundle = if nhsnumber.length > 0
                 encrypt_data64(real_id1 + salt_demog, demog_key)
               else
                 ''
               end
  [pseudo_id1, key_bundle, demog_key]
end
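
Both generate_keys and generate_keys_nhsnumber_demog_only derive pseudo_id1 in the same way, so records pseudonymised by either method can be matched on it when the same salt_id is used. A minimal sketch of that linkage follows; pseudo_id1_sketch, the salt and the two sample rows are hypothetical and exist only for this illustration.

```ruby
require 'digest/sha2'

# pseudo_id1 as computed in both listings: data_hash of 'nhsnumber_' + number,
# which is SHA-256 over the identifier followed by the salt.
def pseudo_id1_sketch(nhsnumber, salt_id)
  Digest::SHA2.hexdigest('nhsnumber_' + nhsnumber + salt_id)
end

salt_id = 'shared-example-salt' # hypothetical; both datasets must use the same salt

# Two hypothetical extracts that mention the same patient.
prescription_row = { 'nhsnumber' => '1234567881' }
registry_row = { 'nhsnumber' => '1234567881', 'postcode' => 'CB22 3AD' }

id_from_prescription = pseudo_id1_sketch(prescription_row['nhsnumber'], salt_id)
id_from_registry = pseudo_id1_sketch(registry_row['nhsnumber'], salt_id)

puts id_from_prescription == id_from_registry # true: rows link without exposing the NHS number
```

The demog_key, by contrast, comes from random_key and differs on every call, so it cannot be used for matching.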

.random_key ⇒ Object



# File 'lib/ndr_pseudonymise/simple_pseudonymisation.rb', line 91

def self.random_key
  SecureRandom.hex(32) # 32 bytes = 256 bits
end