Class: BioDSL::PlotScores

Inherits:
Object
Includes:
AuxHelper
Defined in:
lib/BioDSL/commands/plot_scores.rb

Overview

Create a histogram with mean sequence quality scores.

plot_scores creates a histogram of the mean values per base of the quality scores from sequence data.

Plotting is done using GNUplot, which allows for different types of output, the default being crufty ASCII graphics.

If plotting scores from sequences of variable length, you can use the count option to co-plot the relative count at each base position. This allows you to detect areas with a low relative count showing a high mean score.
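The idea behind the count option can be sketched in plain Ruby. This is an illustration only, not BioDSL's implementation: the helper name mean_scores_and_counts is hypothetical, the quality strings are assumed to be Phred+33 encoded, and the count is assumed to be scaled against the most-covered position.

```ruby
# Hypothetical helper, not part of BioDSL: per-position mean scores and
# relative counts for quality strings of varying length.
# Assumes Phred+33 encoded quality strings (an assumption for illustration).
def mean_scores_and_counts(quality_strings, base: 33)
  sums   = []
  counts = []

  quality_strings.each do |qual|
    qual.each_byte.with_index do |byte, pos|
      sums[pos]   = (sums[pos] || 0) + (byte - base)
      counts[pos] = (counts[pos] || 0) + 1
    end
  end

  # Scale counts against the position covered by the most sequences.
  max_count = counts.max.to_f
  means     = sums.zip(counts).map { |sum, count| sum.to_f / count }
  relative  = counts.map { |count| count / max_count }

  [means, relative]
end

means, relative = mean_scores_and_counts(%w[IIII II5 I])
# means    => [40.0, 40.0, 30.0, 40.0]
# relative => position 3 is covered by only one of three sequences
```

In this toy input the last position shows a mean of 40, but only a third of the sequences reach it, which is the situation the count line is meant to expose.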

GNUplot must be installed for plot_scores to work. Read more here:

www.gnuplot.info/

Usage

plot_scores([count: <bool>[, output: <file>[, force: <bool>
            [, terminal: <string>[, title: <string>
            [, xlabel: <string>[, ylabel: <string>
            [, test: <bool>]]]]]]]])

Options

  • count: <bool> - Add line plot of relative counts.

  • output: <file> - Output file.

  • force: <bool> - Force overwrite existing output file.

  • terminal: <string> - Terminal for output: dumb|post|svg|x11|aqua|png|pdf (default=dumb).
    
  • title: <string> - Plot title (default=“Histogram”).

  • xlabel: <string> - X-axis label (default=<key>).

  • ylabel: <string> - Y-axis label (default=“n”).

  • test: <bool> - Output Gnuplot script instead of plot.
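
The option names and terminal values listed above can be checked before plotting. The following is a minimal sketch under stated assumptions: validate_plot_options is a hypothetical helper, and BioDSL's own option checking may behave differently.

```ruby
# Hypothetical helper for illustration; BioDSL's own option checking may differ.
def validate_plot_options(options)
  # Keys and terminals are taken from the Options list in this document.
  allowed_keys      = %i[count output force terminal title xlabel ylabel test]
  allowed_terminals = %i[dumb post svg x11 aqua png pdf]

  unknown = options.keys - allowed_keys
  raise ArgumentError, "unknown options: #{unknown.inspect}" unless unknown.empty?

  terminal = options.fetch(:terminal, :dumb) # documented default is dumb
  unless allowed_terminals.include?(terminal)
    raise ArgumentError, "unsupported terminal: #{terminal.inspect}"
  end

  # Return a copy with the terminal default filled in.
  options.merge(terminal: terminal)
end

validate_plot_options(terminal: :png, output: "plot.png")
```

Failing early on an unknown key or terminal gives a clearer error than one raised later by Gnuplot itself.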

Examples

Here we plot the mean quality scores from a FASTQ file:

read_fastq(input: "test.fq").plot_scores.run

                             Mean Quality Scores
    +             +            +             +             +            +
40 ++-------------+------------+-------------+-------------+------------+++
    |  *****************                               mean score ****** |
35 ++ ***********************                                            ++
    ****************************** **                                    |
30 +*********************************   *                                ++
    ************************************* *                              |
25 +*************************************** *                            ++
    ****************************************** *****                     |
20 +****************************************************  ** * *         ++
    ******************************************************************** *
15 +**********************************************************************+
    **********************************************************************
10 +**********************************************************************+
    **********************************************************************
 5 +**********************************************************************+
    **********************************************************************
 0 +**********************************************************************+
    +             +            +             +             +            +
    0             50          100           150           200          250
                              Sequence position

To render X11 output (i.e. instant view) use the terminal option:

read_fastq(input: "test.fq").
plot_scores(terminal: :x11).run

To generate a PNG image and save to file:

read_fastq(input: "test.fq").
plot_scores(terminal: :png, output: "plot.png").run


Constant Summary

STATS =
%i(records_in records_out sequences_in sequences_out residues_in
residues_out)
SCORES_MAX =

Maximum score string length.

100_000

Instance Method Summary

Methods included from AuxHelper

#aux_exist

Constructor Details

#initialize(options) ⇒ PlotScores

Constructor for PlotScores.

Parameters:

  • options (Hash)

    Options hash.

Options Hash (options):

  • :count (Boolean)
  • :output (String)
  • :force (Boolean)
  • :terminal (Symbol)
  • :title (String)
  • :xlabel (String)
  • :ylabel (String)
  • :ylogscale (Boolean)
  • :test (Boolean)


# File 'lib/BioDSL/commands/plot_scores.rb', line 132

def initialize(options)
  @options    = options
  @scores_vec = NArray.int(SCORES_MAX)
  @count_vec  = NArray.int(SCORES_MAX)
  @max        = 0

  aux_exist('gnuplot')
  check_options
  default
end
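
The constructor preallocates two fixed-size vectors and tracks the longest score string seen. A rough sketch of that bookkeeping follows, using plain Ruby arrays in place of NArray. The class ScoreAccumulator and its methods are hypothetical and only illustrate the pattern; they are not BioDSL's code.

```ruby
# Hypothetical class for illustration only; uses Array instead of NArray.
class ScoreAccumulator
  def initialize(capacity)
    @capacity   = capacity
    @scores_vec = Array.new(capacity, 0) # running sum of scores per position
    @count_vec  = Array.new(capacity, 0) # number of sequences per position
    @max        = 0                      # longest score list seen so far
  end

  # Add one sequence's scores (an Array of Integers) to the running totals.
  def add(scores)
    raise ArgumentError, 'too many scores' if scores.length > @capacity

    scores.each_with_index do |score, i|
      @scores_vec[i] += score
      @count_vec[i]  += 1
    end

    @max = scores.length if scores.length > @max
    self
  end

  # Mean score per position, trimmed to the longest sequence seen.
  def means
    (0...@max).map { |i| @scores_vec[i].to_f / @count_vec[i] }
  end
end

acc = ScoreAccumulator.new(10)
acc.add([40, 40]).add([20])
acc.means # => [30.0, 40.0]
```

Preallocating to a fixed capacity avoids resizing while records stream through, at the cost of rejecting score strings longer than that capacity.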

Instance Method Details

#lmb ⇒ Proc

Return command lambda for plot_scores.

Returns:

  • (Proc)

    Command lambda.



# File 'lib/BioDSL/commands/plot_scores.rb', line 146

def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      collect_plot_data(record)

      write_output(output, record)
    end

    prepare_plot_data

    plot_defaults
    plot_scores
    plot_count
    plot_output
  end
end
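
The command lambda above follows a common shape: read each record from the input, collect what is needed, and pass the record on to the output. A self-contained sketch of that shape is shown below. The method pass_through_command is hypothetical, and plain arrays and a plain hash stand in for BioDSL's input, output, and status objects.

```ruby
# Hypothetical sketch of a pass-through command lambda; arrays and a hash
# stand in for BioDSL's stream and status objects.
def pass_through_command
  lambda do |input, output, status|
    status[:records_in]  = 0
    status[:records_out] = 0
    seen = []

    input.each do |record|
      status[:records_in] += 1
      seen << record # stand-in for collecting plot data

      next unless output # output may be absent at the end of a pipeline

      output << record
      status[:records_out] += 1
    end

    seen.size # work that needs all records would happen after the loop
  end
end

status = {}
output = []
pass_through_command.call([{ SEQ: 'ATCG' }, { SEQ: 'GG' }], output, status)
status # => {:records_in=>2, :records_out=>2}
```

Because every record is forwarded unchanged, a command built this way can sit in the middle of a pipeline without affecting the commands downstream.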