determinations_comparison

Background

This tool compares the output of two processed determinations files, provided they were produced from the same inputs (batch and assay).

One of these files is considered the 'baseline' and the other is the 'undertest' file.

The 2 files can be:

  • a .po file processed by Rcompute and its corresponding .json (compounds) file processed by CPPcompute, or
  • a .json (compounds) file processed by one version of CPPcompute and the same .json file processed by a subsequent version of CPPcompute.

Prerequisites

Note: The CompareDeterminations.py library is needed to generate plots, and CppLogReader.py is needed to view the peak-picking decisions made by CPPcompute. The tool can be used without either, but the report is much more useful when they are included. Both can be installed from here: https://github.com/indigo-biosystems/implementation-tools/tree/master/python_tools/indigo

Installation

  1. Add the following to your Gemfile:

```ruby
source "https://rubygems.org"

gem 'determinations_comparison'
```

  2. Run `bundle install`.

Usage

```ruby
require 'determinations_comparison'

baseline_file = 'path/to/baseline/file.po'
undertest_file = 'path/to/undertest/file.json'

cmp = DeterminationsComparison::Comparison.new baseline_file, undertest_file

h = cmp.as_hash # => hash of comparisons

cmp.to_html 'path/to/output/folder'  # => html report file (and supporting files)

# When an html report is generated, the input files are copied to the specified output folder.
# The resulting file paths are accessible from these properties.
cmp.html_filepath # => location of html report
cmp.png_filepath # => location of PNG plot file
cmp.baseline_filepath # => location of baseline file (copied to output folder)
cmp.undertest_filepath # => location of undertest file (copied to output folder)

# a summary categorizing the worst discrepancy within the determination (if any)
cmp.hashCategory # => e.g. {:desc=>'difference with apex_intensity', :per_diff=>5}
```

Special Options

The hashCategory property uses pre-set thresholds of 1% to decide whether a variation counts as a discrepancy. These thresholds can be overridden by passing in an opts parameter like so:

baseline_file = 'path/to/baseline/file.po'
undertest_file = 'path/to/undertest/file.json'
opts = Hash.new
opts[:hashPropertyThresholds] = {:area=>3,:apex_time=>1,:apex_intensity=>5}
cmp = DeterminationsComparison::Comparison.new baseline_file, undertest_file, opts

cmp.hashCategory # this will use thresholds described by hashPropertyThresholds parameter above
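
To make the effect of the thresholds concrete, here is a minimal sketch of how per-property thresholds could filter differences. It is not the gem's implementation: the helper `worst_discrepancy`, the shape of the `per_diffs` hash, and the rule "report the largest difference that exceeds its threshold" are all assumptions made for illustration. Only the 1% default and the shape of the result come from the text above.

```ruby
# Hypothetical helper, NOT part of the gem: illustrates threshold filtering.
DEFAULT_THRESHOLD = 1 # percent, matching the pre-set default described above

def worst_discrepancy(per_diffs, thresholds = {})
  # Keep only properties whose percent difference exceeds their threshold.
  flagged = per_diffs.select do |property, per_diff|
    per_diff.abs > thresholds.fetch(property, DEFAULT_THRESHOLD)
  end
  return nil if flagged.empty?

  # Report the largest remaining difference in the same shape as the
  # hashCategory example (assumed ranking rule).
  property, per_diff = flagged.max_by { |_, diff| diff.abs }
  { desc: "difference with #{property}", per_diff: per_diff }
end

# Made-up percent differences for one determination.
per_diffs  = { area: 2.5, apex_time: 0.4, apex_intensity: 4.0 }
thresholds = { area: 3, apex_time: 1, apex_intensity: 5 }

p worst_discrepancy(per_diffs, thresholds) # nothing exceeds the custom thresholds
p worst_discrepancy(per_diffs)             # apex_intensity exceeds the 1% default
```

With the looser custom thresholds, none of the made-up differences is reported. With the default, both area and apex_intensity are flagged and the larger one is returned.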

If you have the CompareDeterminations.py and/or CppLogReader.py libraries installed (see Prerequisites), then plots and pertinent peak-picking information will be included in the HTML report.

However, both scripts must be on your system path. If they are not, you can specify their locations using the optional opts parameter like so:

baseline_file = 'path/to/baseline/file.po'
undertest_file = 'path/to/undertest/file.json'
opts = Hash.new
opts[:filepath_comparedeterminations] = '/path/to/CompareDeterminations.py'
opts[:filepath_cpplogreader] = '/path/to/CppLogReader.py'

cmp = DeterminationsComparison::Comparison.new baseline_file, undertest_file, opts

CppLogReader.py reads the log.txt file to retrieve pertinent peak-picking information.

The log.txt file is assumed to be in the same folder as the undertest file. If it is not, the folder containing log.txt can be specified with another opts parameter:

baseline_file = 'path/to/baseline/file.po'
undertest_file = 'path/to/undertest/file.json'
opts = Hash.new
opts[:folderpath_with_log] = '/path/to/folder/containing/logfile'

cmp = DeterminationsComparison::Comparison.new baseline_file, undertest_file, opts
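
The fallback described above (look next to the undertest file unless a folder is given) can be sketched as follows. The helper `log_filepath` is hypothetical and only illustrates the lookup rule. The gem's own resolution logic may differ.

```ruby
# Hypothetical helper, NOT part of the gem: shows where log.txt is expected.
def log_filepath(undertest_file, opts = {})
  # Use the explicit folder if given, otherwise the undertest file's folder.
  folder = opts.fetch(:folderpath_with_log) { File.dirname(undertest_file) }
  File.join(folder, 'log.txt')
end

puts log_filepath('path/to/undertest/file.json')
puts log_filepath('path/to/undertest/file.json',
                  folderpath_with_log: '/path/to/folder/containing/logfile')
```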

Key Mapping File (key_mapping.yaml)

The tool assumes that all keys within .po files follow the lower camel-case naming convention (e.g. myKey) and that keys within .json (compound) files follow the snake-case naming convention (e.g. my_key), so that a simple conversion from one key name to the other is possible. However, there are cases where the naming conventions are not followed.

These cases are accounted for in the config/key_mapping.yaml file (which you can modify).

The keys in this file are grouped by the parent item that they belong to.

For example, the signal-to-noise-ratio key for chromatograms is documented like so:

Chromatogram:
- json_key: smooth_signal_to_noise_ratio
  po_key: SNR
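
As an illustration of how the convention and the mapping file fit together, here is a minimal sketch. The helpers `camel_to_snake` and `json_key_for` are hypothetical names and do not come from the gem. The YAML excerpt repeats the Chromatogram entry shown above.

```ruby
require 'yaml'

# Excerpt in the same shape as config/key_mapping.yaml.
KEY_MAPPING = YAML.safe_load(<<~MAPPING)
  Chromatogram:
  - json_key: smooth_signal_to_noise_ratio
    po_key: SNR
MAPPING

# Convert lowerCamelCase to snake_case (startIntensity -> start_intensity).
def camel_to_snake(key)
  key.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Hypothetical lookup: prefer an explicit mapping entry for the parent item,
# otherwise fall back to the naming convention.
def json_key_for(parent, po_key, mapping = KEY_MAPPING)
  entry = mapping.fetch(parent, []).find { |item| item['po_key'] == po_key }
  entry ? entry['json_key'] : camel_to_snake(po_key)
end

puts json_key_for('Chromatogram', 'SNR')    # explicit mapping entry
puts json_key_for('Peak', 'startIntensity') # convention-based conversion
```

Without the mapping entry, SNR would convert to snr, which does not match the .json key. That is why the exception has to be listed explicitly.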

Unless specified otherwise within the key_mapping.yaml file, each pair of values is compared using the simple relative-difference formula ( (a-b)/b or (b-a)/a ).
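
The two forms of the formula can be sketched as below. The helper `relative_differences` is a hypothetical name. This README does not say which of the two forms the tool picks for a given pair, so the sketch simply returns both.

```ruby
# Hypothetical helper, NOT part of the gem: both forms of the simple formula.
def relative_differences(a, b)
  a = a.to_f
  b = b.to_f
  # Float division is used, so a zero denominator yields Infinity or NaN
  # rather than raising an error.
  { relative_to_b: (a - b) / b, relative_to_a: (b - a) / a }
end

diffs = relative_differences(105, 100)
puts diffs[:relative_to_b] # (105 - 100) / 100
puts diffs[:relative_to_a] # (100 - 105) / 105
```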

In some cases, it is more accurate to compare on a proportionate scale. When that applies, the scale is indicated in the key_mapping.yaml file:

Peak:
- json_key: start_intensity
  po_key: startIntensity
  scale: 'relative_to_apex_intensity'

There are 2 options for scale:

  1. relative_to_peak_start_and_end_time
  2. relative_to_apex_intensity
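
To show why a proportionate scale can be more meaningful, here is a sketch of one plausible reading of the two options. It is an assumption, not the gem's documented calculation: it takes relative_to_apex_intensity to mean "divide the difference by the peak's apex intensity" and relative_to_peak_start_and_end_time to mean "divide the difference by the peak's width in time". The helper `scaled_difference` and the `peak` hash layout are hypothetical.

```ruby
# Hypothetical sketch, NOT the gem's code: the real calculation may differ.
def scaled_difference(a, b, scale, peak)
  denominator =
    case scale
    when 'relative_to_apex_intensity'
      peak[:apex_intensity]               # assumed: scale by apex height
    when 'relative_to_peak_start_and_end_time'
      peak[:end_time] - peak[:start_time] # assumed: scale by peak width
    else
      raise ArgumentError, "unknown scale: #{scale}"
    end
  (a - b).to_f / denominator
end

# Made-up peak values for illustration.
peak = { apex_intensity: 2000.0, start_time: 1.0, end_time: 1.5 }

# Start intensities of 30 vs 20 differ by 50% under the simple formula,
# but by only 0.5% of the apex intensity.
puts scaled_difference(30.0, 20.0, 'relative_to_apex_intensity', peak)
puts scaled_difference(1.02, 1.0, 'relative_to_peak_start_and_end_time', peak)
```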