Why ncs_mdes tells you the things it tells you

ncs_mdes derives its view of the NCS Master Data Element Specification primarily from the the XML Schema defining the Vanguard Data Repository submission format. However, that file does not contain the full semantics that the gem exposes. This document discusses how the remaining attributes are derived.

Gem overview

ncs_mdes exposes data in three major categories:

  • Tables
  • Types
  • Disposition codes

Types are fairly simple, and are mostly interesting insofar as they are the mechanism whereby you can look up a code list. Disposition codes are extracted from the Master Data Element Specification spreadsheet instead of the VDR schema — unlike the tables and types, they are pre-processed rather than coming from the source document at runtime — but are otherwise simple. This document is mainly concerned with tables and their children, variables.

Tables

The table name attribute is taken directly from the VDR schema.

Instrument or operational?

ncs_mdes can also tell you if a table is an operational or instrument table (this is an XOR relationship) and, if it is an instrument table, whether it is a "primary" instrument table.

Definitions:

  • An operational table is a table that collects study execution information.

  • An instrument table is a table that contains data collected about a study participant.

  • A "primary" instrument table is a table for which there is exactly one record for each time the instrument is collected for a participant. (The MDES is a relational model; non-primary tables contain the results of repeating instrument sections or multivalued questions and are always associated with a primary table, though sometimes the association is indirect.)

These distinctions are derived using the following heuristic:

  • If the table contains a variable named instrument_version and is not the table named instrument, it is a primary instrument table (and therefore an instrument table). (The table instrument is itself an operational table since it records the execution of an instrument rather than any of the data collected in the instrument.)

  • If the table contains a foreign key to a table which is an instrument table, then it is an instrument table.

  • Otherwise, the table is an operational table.

This heuristic works in all cases for MDES 2.0.

Variables

The following attributes of a variable are taken directly from the XML schema:

  • name
  • pii?
  • required?
  • omittable?
  • nillable?
  • status (active, etc.)
  • type

Table references

ncs_mdes can also tell you if a variable is a foreign key reference and if so, to which table it refers. While the XML schema indicates that a variable is of one of a couple of foreign key types, it does not indicate the associated table. That information is derived using the following heuristic:

  • If the variable is not of foreign key type, it's not a foreign key.

  • Otherwise, find all the tables in the MDES whose primary key is named the same as the candidate foreign key variable.

  • If there is exactly one such table, the variable refers to that table.

  • Otherwise fail.

This heuristic does not fail for 399 of the foreign keys in MDES 2.0. Another 155 are mapped manually for a total of 554.

There are also three variables which are typed as foreign keys in the XML schema but which for a couple of different reasons are not treated as foreign keys by ncs_mdes. These are described in comments in documents/2.0/heuristic_overrides.yml in the ncs_mdes source.

Heuristics not used

Type coercion

The MDES VDR schema considers nearly all variables to strings; usually strings of a set length or conforming to a particular pattern. ncs_mdes does not attempt to infer a stronger type for these.