Class: NdrPseudonymise::PrescriptionPseudonymiser

Inherits:
PseudonymisationSpecification
Defined in:
lib/ndr_pseudonymise/prescription_pseudonymiser.rb

Overview

Pseudonymise prescription data

Constant Summary

PREAMBLE_V2_DEMOG_ONLY =
'Pseudonymised matching data v2.0-demog-only'.freeze

Constants inherited from PseudonymisationSpecification

NdrPseudonymise::PseudonymisationSpecification::HEADER_ROW_PREFIX, NdrPseudonymise::PseudonymisationSpecification::KEY_BYTES, NdrPseudonymise::PseudonymisationSpecification::PREAMBLE_V1_STRIPED

Instance Method Summary

Methods inherited from PseudonymisationSpecification

#all_demographics, #clinical_data, #core_demographics, #data_hash, #decrypt_data, #decrypt_to_csv, #encrypt_data, factory, get_key_bundle, #header_row?, #pseudo_id, #pseudonymise_csv, #random_key, #real_ids, #row_errors, #safe_json

Constructor Details

#initialize(format_spec, key_bundle) ⇒ PrescriptionPseudonymiser

Returns a new instance of PrescriptionPseudonymiser.



# File 'lib/ndr_pseudonymise/prescription_pseudonymiser.rb', line 16

def initialize(format_spec, key_bundle)
  super
  return if @format_spec[:demographics] == [0, 1]
  raise 'Invalid specification: expected nhsnumber and birthdate in first 2 columns'
end
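
The guard in the constructor can be illustrated in isolation. The following is a minimal sketch, assuming only what the source above shows: `check_demographics!` is a hypothetical helper written for this example (it is not part of NdrPseudonymise), and the hash literals stand in for a real format specification, whose other keys are not documented here.

```ruby
# Hypothetical stand-in for the check performed in #initialize.
# It accepts only a specification whose demographics are columns 0 and 1.
def check_demographics!(format_spec)
  return if format_spec[:demographics] == [0, 1]

  raise 'Invalid specification: expected nhsnumber and birthdate in first 2 columns'
end

# Accepted: nhsnumber in column 0, birthdate in column 1
check_demographics!(demographics: [0, 1])

# Rejected: any other column layout raises a RuntimeError
begin
  check_demographics!(demographics: [1, 0])
rescue RuntimeError => e
  puts e.message
end
```

The comparison is an exact array match, so the column order matters as well as the column count.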

Instance Method Details

#csv_header_row ⇒ Object

Header row for CSV data



# File 'lib/ndr_pseudonymise/prescription_pseudonymiser.rb', line 62

def csv_header_row
  [PREAMBLE_V2_DEMOG_ONLY]
end

#emit_csv_rows(out_csv, pseudonymised_row) ⇒ Object

Append the output of pseudonymise_row to a CSV file



# File 'lib/ndr_pseudonymise/prescription_pseudonymiser.rb', line 67

def emit_csv_rows(out_csv, pseudonymised_row)
  out_csv << pseudonymised_row[0]
end
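
Taken together, #csv_header_row and #emit_csv_rows imply an output file with a one-column preamble row followed by one CSV row per input row. The sketch below shows that layout using Ruby's standard `csv` library. `write_output` and the sample values are hypothetical; in real use the rows would come from #pseudonymise_row, and the surrounding read/write loop is presumably handled by the inherited #pseudonymise_csv.

```ruby
require 'csv'

# Same value as the PREAMBLE_V2_DEMOG_ONLY constant documented above
PREAMBLE_V2_DEMOG_ONLY = 'Pseudonymised matching data v2.0-demog-only'.freeze

# Hypothetical helper: writes the header row, then the first (and only)
# element of each pseudonymised_row, as #emit_csv_rows does.
def write_output(pseudonymised_rows)
  CSV.generate do |out_csv|
    out_csv << [PREAMBLE_V2_DEMOG_ONLY]
    pseudonymised_rows.each { |pseudonymised_row| out_csv << pseudonymised_row[0] }
  end
end

# Placeholder rows in the shape returned by #pseudonymise_row
sample_rows = [
  [['id1 (bundle1) enc1', 'drug A']],
  [['id2 (bundle2) enc2', 'drug B']]
]
puts write_output(sample_rows)
```

Each element of `sample_rows` is an array containing a single row, which is why only index 0 is written.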

#pseudonymise_row(row) ⇒ Object

Pseudonymise a row of prescription data, returning an array of a single row:

[packed_pseudoid_and_demographics, clinical_data1, …]

Where packed_pseudoid_and_demographics consists of “pseudo_id1 (key_bundle) encrypted_demographics”



# File 'lib/ndr_pseudonymise/prescription_pseudonymiser.rb', line 40

def pseudonymise_row(row)
  @key_cache ||= {} # Cache pseudonymisation keys for more compact import
  all_demographics = { 'nhsnumber' => row[0], 'birthdate' => row[1] }
  key = all_demographics.to_json
  if @key_cache.key?(key)
    pseudo_id1, key_bundle, demog_key = @key_cache[key]
  else
    pseudo_id1, key_bundle, demog_key = NdrPseudonymise::SimplePseudonymisation.
                                        generate_keys_nhsnumber_demog_only(@salt1, @salt2, row[0])
    if !row[0].to_s.empty? && !row[1].to_s.empty? # && false to stop caching
      @key_cache = {} if @key_cache.size > 10000 # Limit cache size
      @key_cache[key] = [pseudo_id1, key_bundle, demog_key]
    end
  end
  encrypted_demographics = NdrPseudonymise::SimplePseudonymisation.
                           encrypt_data64(demog_key, all_demographics.to_json)
  packed_pseudoid_and_demographics = format('%s (%s) %s', pseudo_id1, key_bundle,
                                            encrypted_demographics)
  [[packed_pseudoid_and_demographics] + row[2..-1]]
end
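
The caching pattern in #pseudonymise_row can be sketched on its own. In the example below, `DemographicsKeyCache` and `build_demo_cache` are hypothetical names, and the counter-based block is only a placeholder for `SimplePseudonymisation.generate_keys_nhsnumber_demog_only`, whose real behaviour is not shown in this document. The sketch keeps the three rules visible in the source: keys are looked up by the JSON of the demographics, rows with a blank NHS number or birthdate are never cached, and the cache is emptied once it grows past 10,000 entries.

```ruby
require 'json'

# Illustrative cache mirroring the pattern in #pseudonymise_row.
class DemographicsKeyCache
  MAX_ENTRIES = 10_000

  attr_reader :generator_calls

  # The block is a hypothetical stand-in for the library's key generator.
  def initialize(&generator)
    @generator = generator
    @cache = {}
    @generator_calls = 0
  end

  def keys_for(nhsnumber, birthdate)
    cache_key = { 'nhsnumber' => nhsnumber, 'birthdate' => birthdate }.to_json
    return @cache[cache_key] if @cache.key?(cache_key)

    @generator_calls += 1
    keys = @generator.call(nhsnumber)
    # Only cache rows where both demographics are present
    unless nhsnumber.to_s.empty? || birthdate.to_s.empty?
      @cache = {} if @cache.size > MAX_ENTRIES # Limit cache size
      @cache[cache_key] = keys
    end
    keys
  end
end

# Builds a cache whose placeholder generator returns numbered dummy values.
def build_demo_cache
  counter = 0
  DemographicsKeyCache.new do |_nhsnumber|
    counter += 1
    ["id#{counter}", "bundle#{counter}", "key#{counter}"]
  end
end

cache = build_demo_cache
pseudo_id1, key_bundle, _demog_key = cache.keys_for('1234567890', '2000-01-31')
cache.keys_for('1234567890', '2000-01-31') # served from the cache

# Same packing as the source; 'ENCRYPTED' is a placeholder payload
puts format('%s (%s) %s', pseudo_id1, key_bundle, 'ENCRYPTED')
puts cache.generator_calls
```

Reusing cached keys means repeated patients produce the same pseudo_id1 and key_bundle prefix, which is consistent with the source comment about a more compact import.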

#row_errors2(row) ⇒ Object

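Validate a row of prescription data. Returns a falsy value (nil in the code shown) if this row is a valid data row; otherwise raises a RuntimeError naming the first problem found, rather than returning a list of errors.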



# File 'lib/ndr_pseudonymise/prescription_pseudonymiser.rb', line 24

def row_errors2(row)
  # Not significantly faster than optimised general #row_errors method
  (nhsnumber, birthdate) = row[0..1]
  unless nhsnumber.is_a?(String) && nhsnumber =~ /\A([0-9]{10})?\Z/
    raise 'Invalid NHS number'
  end
  raise 'Missing NHS number' if nhsnumber.size < 10
  unless birthdate.is_a?(String) && birthdate =~ /\A([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]|)\Z/
    raise 'Invalid birthdate'
  end
end
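
To make the accepted and rejected inputs concrete, the sketch below reuses the two regular expressions from #row_errors2. `check_row` is a hypothetical helper that returns the message instead of raising, purely so the outcomes are easy to compare; the constant names are also introduced for this example.

```ruby
# Patterns copied from #row_errors2
NHSNUMBER_PATTERN = /\A([0-9]{10})?\Z/
BIRTHDATE_PATTERN = /\A([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]|)\Z/

# Hypothetical helper: same checks, in the same order, as #row_errors2,
# but returning the error message (or nil) instead of raising.
def check_row(row)
  nhsnumber, birthdate = row[0..1]
  return 'Invalid NHS number' unless nhsnumber.is_a?(String) && nhsnumber =~ NHSNUMBER_PATTERN
  return 'Missing NHS number' if nhsnumber.size < 10
  return 'Invalid birthdate' unless birthdate.is_a?(String) && birthdate =~ BIRTHDATE_PATTERN

  nil
end

p check_row(['1234567890', '2000-01-31']) # => nil
p check_row(['', '2000-01-31'])           # => "Missing NHS number"
p check_row(['12345', '2000-01-31'])      # => "Invalid NHS number"
p check_row(['1234567890', ''])           # => nil
p check_row(['1234567890', '31/01/2000']) # => "Invalid birthdate"
```

Two points follow from the patterns themselves: a blank birthdate is accepted while a blank NHS number is not, and both checks are purely syntactic, so a value such as '2000-99-99' passes the birthdate pattern and the NHS number check digit is not verified.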