Class: BioDSL::MeanScores
- Inherits: Object
- Defined in: lib/BioDSL/commands/mean_scores.rb
Overview
Calculate the mean or local mean of quality SCORES in the stream.
mean_scores calculates either the global or the local mean value of quality SCORES in the stream. The quality SCORES are encoded Phred style in a character string, with an ASCII offset of 33 as in the example below (B decodes to 33, I to 40).
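As a minimal sketch, assuming the ASCII offset of 33 implied by the example below (B is 66 in ASCII and decodes to 33), a SCORES string can be decoded into integer scores in plain Ruby. The helper name decode_scores and the constant PHRED_OFFSET are hypothetical and not part of BioDSL.

```ruby
# Hypothetical helper (not part of BioDSL): decode a Phred-style
# quality string into integer scores, assuming an ASCII offset of 33.
PHRED_OFFSET = 33

def decode_scores(scores)
  scores.each_char.map { |char| char.ord - PHRED_OFFSET }
end

p decode_scores("BCDEFGHI,")
# => [33, 34, 35, 36, 37, 38, 39, 40, 11]
```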
The global (default) behaviour calculates SCORES_MEAN as the sum of all the scores divided by the length of the SCORES string.
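A minimal sketch of the global mean, assuming the ASCII offset of 33 used in the example below: decode each character, sum the scores, and divide by the string length. Rounding to two decimals mirrors the SCORES_MEAN value shown in the example; the name global_mean is hypothetical and this is not BioDSL's own code.

```ruby
# Hypothetical helper (not BioDSL's implementation): global mean of a
# Phred+33 quality string, i.e. sum of scores / length of the string.
def global_mean(scores, offset = 33)
  values = scores.each_char.map { |char| char.ord - offset }
  (values.sum.to_f / values.length).round(2)
end

p global_mean("BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII")
# => 34.58
```

Like the command itself, which skips records whose SCORES are empty, the helper assumes a non-empty string.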
The local mean, SCORES_MEAN_LOCAL, is calculated by computing the mean within a sliding window along the scores and returning the smallest of these window means.
Thus, low-quality records, with either a low overall mean quality or a local dip in quality, can be filtered using grab.
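The sliding-window minimum can be sketched as follows. It assumes the window advances one position at a time and that a string shorter than the window falls back to the global mean; both are assumptions, since this page does not say how BioDSL handles them. The name local_mean is hypothetical, and the final lines show the kind of filtering described above using plain Ruby select rather than grab, with a hypothetical threshold of 20.

```ruby
# Hypothetical helper (not BioDSL's implementation): smallest mean over
# all windows of window_size consecutive scores (Phred+33 assumed).
def local_mean(scores, window_size = 5, offset = 33)
  values = scores.each_char.map { |char| char.ord - offset }
  # Assumption: strings shorter than the window use the global mean.
  return (values.sum.to_f / values.length).round(2) if values.length < window_size

  values.each_cons(window_size)
        .map { |window| window.sum.to_f / window_size }
        .min
        .round(2)
end

records = [
  { SCORES: "BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII" },
  { SCORES: "IIIIIIIIII" }
]

# Keep only records whose worst window reaches a hypothetical threshold of 20.
good = records.select { |record| local_mean(record[:SCORES]) >= 20 }
p good.length
# => 1
```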
Usage
mean_scores([local: <bool>[, window_size: <uint>]])
Options
- local: <bool> - Calculate local mean score (default=false).
- window_size: <uint> - Size of sliding window (default=5).
Examples
Consider the following FASTQ entry in the file test.fq:
@HWI-EAS157_20FFGAAXX:2:1:888:434
TTGGTCGCTCGCTCGACCTCAGATCAGACGTGG
+
BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII
The values of the scores in decimal are:
SCORES: 33;34;35;36;37;38;39;40;40;40;40;40;40;40;11;11;11;11;11;40;37;37;40;40;40;40;40;40;40;40;40;40;40;
To calculate the mean score, do:
BD.new.read_fastq(input: "test.fq").mean_scores.dump.run
{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434",
:SEQ=>"TTGGTCGCTCGCTCGACCTCAGATCAGACGTGG",
:SEQ_LEN=>33,
:SCORES=>"BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII",
:SCORES_MEAN=>34.58}
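As a worked check of the SCORES_MEAN value above, the decoded scores listed earlier sum and average as follows:

```latex
\begin{aligned}
\text{sum} &= (33+34+35+36+37+38+39) + 7 \cdot 40 + 5 \cdot 11 + (40+37+37) + 11 \cdot 40 \\
           &= 252 + 280 + 55 + 114 + 440 = 1141 \\
\text{SCORES\_MEAN} &= 1141 / 33 \approx 34.58
\end{aligned}
```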
To calculate local means for a sliding window, do:
BD.new.read_fastq(input: "test.fq").mean_scores(local: true).dump.run
{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434",
:SEQ=>"TTGGTCGCTCGCTCGACCTCAGATCAGACGTGG",
:SEQ_LEN=>33,
:SCORES=>"BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII",
:SCORES_MEAN_LOCAL=>11.0}
This indicates that the local minimum lies at the stretch of ,,,,, where (11 + 11 + 11 + 11 + 11) / 5 = 11.0.
Constant Summary
- STATS = %i(records_in records_out sequences_in sequences_out residues_in residues_out min_mean max_mean mean_mean)
Instance Method Summary
- #initialize(options) ⇒ MeanScores (constructor)
  Constructor for MeanScores.
- #lmb ⇒ Proc
  Return command lambda for mean_scores.
Constructor Details
#initialize(options) ⇒ MeanScores
Constructor for MeanScores.
# File 'lib/BioDSL/commands/mean_scores.rb', line 100
def initialize(options)
  @options = options
  @min     = Float::INFINITY
  @max     = 0
  @sum     = 0
  @count   = 0

  defaults
end
Instance Method Details
#lmb ⇒ Proc
Return command lambda for mean_scores.
# File 'lib/BioDSL/commands/mean_scores.rb', line 114
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      calc_mean(record) if record[:SCORES] && record[:SCORES].length > 0

      output << record

      @status[:records_out] += 1
    end

    @status[:mean_mean] = (@sum.to_f / @count).round(2)
  end
end
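The constructor initializes running totals, and lmb returns a lambda that reads records from input, annotates those with SCORES, and writes every record to output. The stand-alone sketch below mimics that pattern with plain arrays and a plain Hash so it runs without BioDSL. It is not the BioDSL source: calc_mean is not shown on this page, so the version here is a hypothetical reconstruction (global mean only, Phred+33 assumed), the class name MeanScoresSketch is hypothetical, and the loop over STATS stands in for status_init.

```ruby
# Stand-alone sketch of the MeanScores pattern; NOT the BioDSL source.
class MeanScoresSketch
  STATS = %i(records_in records_out min_mean max_mean mean_mean)

  def initialize(options = {})
    @options = options
    @min     = Float::INFINITY
    @max     = 0
    @sum     = 0
    @count   = 0
  end

  def lmb
    lambda do |input, output, status|
      STATS.each { |key| status[key] = 0 } # stand-in for status_init

      input.each do |record|
        status[:records_in] += 1
        calc_mean(record) if record[:SCORES] && !record[:SCORES].empty?
        output << record
        status[:records_out] += 1
      end

      status[:min_mean]  = @min
      status[:max_mean]  = @max
      status[:mean_mean] = (@sum.to_f / @count).round(2)
    end
  end

  private

  # Hypothetical: adds SCORES_MEAN and updates the running statistics.
  def calc_mean(record)
    values = record[:SCORES].each_char.map { |char| char.ord - 33 }
    mean   = (values.sum.to_f / values.length).round(2)

    record[:SCORES_MEAN] = mean
    @min    = mean if mean < @min
    @max    = mean if mean > @max
    @sum   += mean
    @count += 1
  end
end

output = []
status = {}
input  = [{ SCORES: "BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII" }, { SEQ: "ACGT" }]
MeanScoresSketch.new.lmb.call(input, output, status)
p status[:mean_mean]
# => 34.58
```

Records without SCORES pass through unchanged, matching the guard in the original lmb.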