Class: BioDSL::AnalyzeResidueDistribution

Inherits:
Object
Defined in:
lib/BioDSL/commands/analyze_residue_distribution.rb

Overview

Analyze the residue distribution from sequences in the stream.

analyze_residue_distribution determines, for each position, how residues are distributed across the sequences in the stream, and outputs one record per observed residue with its counts at the different positions. With the percent option, the counts are output as percentages of all residues observed at each position.

The output records look like this:

   {:RECORD_TYPE=>"residue distribution",
    :V0=>"A",
    :V1=>5,
    :V2=>0,
    :V3=>0,
    :V4=>0}

These records are ready for write_table; see the examples.
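As a rough illustration of the per-position counting and the record layout described above, here is a minimal Ruby sketch. The helper residue_distribution_records is hypothetical and is not BioDSL's implementation. It assumes residues are reported in the order they are first seen and that positions past the end of a shorter sequence count as 0, which is consistent with the example tables.

```ruby
# Hypothetical helper, not part of BioDSL: tally residues per position and
# build one record per observed residue, in first-seen order.
def residue_distribution_records(seqs)
  counts   = Hash.new { |h, k| h[k] = Hash.new(0) } # residue => { pos => count }
  residues = []                                      # residues in first-seen order
  width    = 0                                       # longest sequence length

  seqs.each do |seq|
    width = [width, seq.length].max
    seq.each_char.with_index do |res, pos|
      residues << res unless counts.key?(res)
      counts[res][pos] += 1
    end
  end

  residues.map do |res|
    record = { RECORD_TYPE: 'residue distribution', V0: res }
    # V1..Vn hold the counts at positions 0..n-1 (unseen positions read as 0).
    width.times { |pos| record[:"V#{pos + 1}"] = counts[res][pos] }
    record
  end
end

# Usage with the four sequences from test.fna in the examples:
records = residue_distribution_records(%w[AGCT AGCU FLS* -.~])
records.each { |r| puts(r.values.drop(1).join("\t")) }
```

Run on those four sequences, the sketch prints the same counts as the first example table, one tab-separated row per residue.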

Usage

analyze_residue_distribution([percent: <bool>])

Options

  • percent: <bool> - Output distributions in percent (default=false).

Examples

Consider the following entries in the file `test.fna`:

    >DNA
    AGCT
    >RNA
    AGCU
    >Protein
    FLS*
    >Gaps
    -.~

Now we run the data through the following pipeline and get the resulting table:

    BD.new.
    read_fasta(input: "test.fna").
    analyze_residue_distribution.
    grab(select: "residue").
    write_table(skip: [:RECORD_TYPE]).
    run

    A 2 0 0 0
    G 0 2 0 0
    C 0 0 2 0
    T 0 0 0 1
    U 0 0 0 1
    F 1 0 0 0
    L 0 1 0 0
    S 0 0 1 0
    * 0 0 0 1
    - 1 0 0 0
    . 0 1 0 0
    ~ 0 0 1 0

Here we do the same as above, but output percentages instead of absolute counts:

    BD.new.
    read_fasta(input: "test.fna").
    analyze_residue_distribution(percent: true).
    grab(select: "residue").
    write_table(skip: [:RECORD_TYPE]).
    run

    A  50   0   0   0
    G   0  50   0   0
    C   0   0  50   0
    T   0   0   0  33
    U   0   0   0  33
    F  25   0   0   0
    L   0  25   0   0
    S   0   0  25   0
    *   0   0   0  33
    -  25   0   0   0
    .   0  25   0   0
    ~   0   0  25   0
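The percent option can be sketched the same way: each count is divided by the total number of residues seen at that position. counts_to_percent is a hypothetical helper, and rounding to whole numbers is an assumption that happens to reproduce the table above (for example, 1 of the 3 residues at the last position gives 33).

```ruby
# Hypothetical helper, not part of BioDSL: turn per-position counts
# (residue => [count at pos 0, count at pos 1, ...]) into percentages of all
# residues observed at each position. Rounding is an assumption.
def counts_to_percent(rows)
  width  = rows.values.map(&:length).max || 0
  # Total number of residues seen at each position, summed over all residues.
  totals = Array.new(width) { |pos| rows.values.sum { |counts| counts[pos] || 0 } }

  rows.transform_values do |counts|
    counts.each_with_index.map do |count, pos|
      totals[pos].zero? ? 0 : (100.0 * count / totals[pos]).round
    end
  end
end

# Usage with the counts from the first example table:
counts = {
  'A' => [2, 0, 0, 0], 'G' => [0, 2, 0, 0], 'C' => [0, 0, 2, 0],
  'T' => [0, 0, 0, 1], 'U' => [0, 0, 0, 1], 'F' => [1, 0, 0, 0],
  'L' => [0, 1, 0, 0], 'S' => [0, 0, 1, 0], '*' => [0, 0, 0, 1],
  '-' => [1, 0, 0, 0], '.' => [0, 1, 0, 0], '~' => [0, 0, 1, 0]
}
counts_to_percent(counts).each { |res, pcts| puts([res, *pcts].join("\t")) }
```

Because the totals are computed per position, the percentages in each column sum to roughly 100, while a row need not.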

Constant Summary

STATS = %i(records_in records_out sequences_in sequences_out residues_in residues_out)


Constructor Details

#initialize(options) ⇒ AnalyzeResidueDistribution

Constructor for the AnalyzeResidueDistribution class.

Options Hash (options):

  • :percent (Boolean)

    Output distribution in percent.



# File 'lib/BioDSL/commands/analyze_residue_distribution.rb', line 123

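def initialize(options)
  @options = options

  check_options

  # Nested counters: each missing key gets its own Hash whose values default to 0.
  @counts        = Hash.new { |h, k| h[k] = Hash.new(0) }
  # Running totals; missing keys read as 0.
  @total         = Hash.new(0)
  # Distinct residues seen (Set is in Ruby's 'set' library, loaded by default since Ruby 3.2).
  @residues      = Set.new
end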
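The constructor relies on Ruby's default-value hashes. The sketch below shows how such structures behave when filled; keying the counters by position and then residue is an assumption, since analyze_residues is not shown on this page. It also shows why the block form of Hash.new is the right choice for nested counters.

```ruby
require 'set'

# Block form: each missing key gets its *own* inner Hash.new(0), which is
# stored in the outer hash the first time that key is accessed.
counts = Hash.new { |h, k| h[k] = Hash.new(0) }
total  = Hash.new(0) # missing keys read as 0; nothing is stored until assigned
seen   = Set.new     # distinct residues, enumerated in insertion order

# Assumption: keyed as position => residue => count.
'AGCT'.each_char.with_index do |res, pos|
  counts[pos][res] += 1
  total[pos]       += 1
  seen << res
end

# Pitfall: the value form returns ONE shared default object and never stores
# it, so every "missing" key silently shares the same inner counter.
shared = Hash.new(Hash.new(0))
shared[0]['A'] += 1 # mutates the shared default; shared itself stays empty
```

With the value form, shared is still empty afterwards, yet shared[99]['A'] reads 1, which is why nested counters need the block form.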

Instance Method Details

#lmb ⇒ Proc

Return a lambda for the analyze_residue_distribution command.



# File 'lib/BioDSL/commands/analyze_residue_distribution.rb', line 136

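def lmb
  require 'set'

  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      # Tally residues only for records that carry a sequence.
      analyze_residues(record[:SEQ]) if record[:SEQ]

      # Pass every record through unchanged to the next command.
      if output
        output << record
        @status[:records_out] += 1
      end
    end

    # Once the input is exhausted, output the residue distribution records.
    calc_dist(output)
  end
end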
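The lambda follows a pass-through pattern: every input record is forwarded unchanged, and the distribution records are written only after the input is exhausted. Below is a self-contained sketch of that pattern. ResidueDistSketch is hypothetical; it replaces status_init, analyze_residues and calc_dist with inline code, tracks only a subset of the real STATS keys, and counting the emitted distribution records in records_out is an assumption.

```ruby
# Hypothetical stand-in for the command object, not BioDSL's implementation.
class ResidueDistSketch
  STATS = %i(records_in records_out sequences_in residues_in).freeze

  def initialize
    @counts = Hash.new { |h, k| h[k] = Hash.new(0) } # residue => pos => count
  end

  def lmb
    lambda do |input, output, status|
      STATS.each { |key| status[key] = 0 }

      input.each do |record|
        status[:records_in] += 1

        if (seq = record[:SEQ])
          status[:sequences_in] += 1
          status[:residues_in]  += seq.length
          seq.each_char.with_index { |res, pos| @counts[res][pos] += 1 }
        end

        # Pass every record through unchanged.
        next unless output

        output << record
        status[:records_out] += 1
      end

      # Only after the whole stream is consumed can the totals be emitted.
      return unless output

      width = @counts.values.map { |by_pos| by_pos.keys.max + 1 }.max || 0
      @counts.each do |res, by_pos|
        record = { RECORD_TYPE: 'residue distribution', V0: res }
        width.times { |pos| record[:"V#{pos + 1}"] = by_pos[pos] }
        output << record
        status[:records_out] += 1
      end
    end
  end
end

# Usage: two sequence records plus one record without :SEQ.
output = []
status = {}
ResidueDistSketch.new.lmb.call([{ SEQ: 'AGCT' }, { SEQ: 'AGCU' }, { ID: 1 }], output, status)
p status
```

Deferring the summary to the end is what lets the command see every sequence before computing per-position counts, at the cost of holding the counters in memory for the whole run.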