Class: BioDSL::MeanScores

Inherits:
Object
Defined in:
lib/BioDSL/commands/mean_scores.rb

Overview

Calculate the mean or local mean of quality SCORES in the stream.

mean_scores calculates either the global or the local mean value of the quality SCORES in the stream. The quality SCORES are encoded Phred style in a character string.

The global (default) behaviour calculates the SCORES_MEAN as the sum of all the scores over the length of the SCORES string.

The local mean SCORES_MEAN_LOCAL is calculated by taking the mean of the scores at each position of a sliding window and returning the smallest of these means.

Thus, subquality records, with either a low overall mean quality or a local dip in quality, can be filtered using grab.

Usage

mean_scores([local: <bool>[, window_size: <uint>]])

Options

  • local: <bool> - Calculate local mean score (default=false).

  • window_size: <uint> - Size of sliding window (default=5).

Examples

Consider the following FASTQ entry in the file test.fq:

@HWI-EAS157_20FFGAAXX:2:1:888:434
TTGGTCGCTCGCTCGACCTCAGATCAGACGTGG
+
BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII

The values of the scores in decimal are:

SCORES: 33;34;35;36;37;38;39;40;40;40;40;40;40;40;11;11;11;11;11;40;37;
        37;40;40;40;40;40;40;40;40;40;40;40;
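
The decimal values above are consistent with subtracting an ASCII offset of 33 from each character code (for instance, `B` is ASCII 66, which gives 33). The following is a minimal sketch in plain Ruby, independent of BioDSL, of that decoding step. The offset of 33 is an assumption inferred from this example, and `decode_scores` is a hypothetical helper name used only for illustration:

```ruby
# Hypothetical helper: convert a Phred-style SCORES string into integers.
# The default offset of 33 is an assumption inferred from the example values.
def decode_scores(scores, offset = 33)
  scores.each_byte.map { |byte| byte - offset }
end

scores = 'BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII'
puts decode_scores(scores).join(';')
```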

To calculate the mean score do:

BD.new.read_fastq(input: "test.fq").mean_scores.dump.run

{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434",
 :SEQ=>"TTGGTCGCTCGCTCGACCTCAGATCAGACGTGG",
 :SEQ_LEN=>33,
 :SCORES=>"BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII",
 :SCORES_MEAN=>34.58}
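
The SCORES_MEAN value can be checked by hand: the 33 decoded scores sum to 1141, and 1141 / 33 rounds to 34.58. A minimal sketch of that arithmetic in plain Ruby follows. It is not BioDSL's implementation; `global_mean` is a hypothetical helper, and the ASCII offset of 33 is an assumption inferred from the example:

```ruby
# Hypothetical helper: mean of all scores in a Phred-style SCORES string,
# rounded to two decimals. The offset of 33 is an assumption.
def global_mean(scores, offset = 33)
  values = scores.each_byte.map { |byte| byte - offset }
  (values.sum.to_f / values.length).round(2)
end

puts global_mean('BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII') # 1141 / 33 → 34.58
```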

To calculate local means for a sliding window, do:

BD.new.read_fastq(input: "test.fq").mean_scores(local: true).dump.run

{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434",
 :SEQ=>"TTGGTCGCTCGCTCGACCTCAGATCAGACGTGG",
 :SEQ_LEN=>33,
 :SCORES=>"BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII",
 :SCORES_MEAN_LOCAL=>11.0}

This indicates that a local minimum was located at the stretch of ,,,,, = (11+11+11+11+11) / 5 = 11.0
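
The sliding-window calculation can be sketched in plain Ruby as below. This is an illustration under stated assumptions rather than the code BioDSL runs: `local_mean_min` is a hypothetical helper, the ASCII offset of 33 is inferred from the example, and returning `nil` for strings shorter than the window is a choice made for this sketch only.

```ruby
# Hypothetical helper: smallest mean over all sliding windows of the given
# size. The offset of 33 is an assumption inferred from the example.
def local_mean_min(scores, window_size = 5, offset = 33)
  values = scores.each_byte.map { |byte| byte - offset }

  # Sketch-only choice: no full window fits, so there is no local mean.
  return nil if values.length < window_size

  # each_cons yields every run of window_size consecutive scores.
  means = values.each_cons(window_size).map do |window|
    window.sum.to_f / window_size
  end

  means.min.round(2)
end

puts local_mean_min('BCDEFGHIIIIIII,,,,,IFFIIIIIIIIIII') # → 11.0
```

The minimum is reached only when the window covers exactly the five `,` characters; any window overlapping a neighbouring higher score has a larger mean.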

Constant Summary

STATS =
%i(records_in records_out sequences_in sequences_out residues_in
residues_out min_mean max_mean mean_mean)

Instance Method Summary

Constructor Details

#initialize(options) ⇒ MeanScores

Constructor for MeanScores.

Parameters:

  • options (Hash)

    Options hash.

Options Hash (options):

  • :local (Boolean)
  • :window_size (Fixnum)


# File 'lib/BioDSL/commands/mean_scores.rb', line 100

def initialize(options)
  @options = options
  @min     = Float::INFINITY
  @max     = 0
  @sum     = 0
  @count   = 0

  check_options
  defaults
end

Instance Method Details

#lmb ⇒ Proc

Return command lambda for mean_scores.

Returns:

  • (Proc)

    Command lambda.



# File 'lib/BioDSL/commands/mean_scores.rb', line 114

def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      calc_mean(record) if record[:SCORES] && record[:SCORES].length > 0

      output << record

      @status[:records_out] += 1
    end

    @status[:mean_mean] = (@sum.to_f / @count).round(2)
  end
end