Class: BioDSL::AnalyzeResidueDistribution
- Inherits: Object
- Defined in: lib/BioDSL/commands/analyze_residue_distribution.rb
Overview
Analyze the residue distribution from sequences in the stream.
analyze_residue_distribution determines the per-position distribution of residues in the sequences and outputs one record per observed residue, holding that residue's count at each position. With the percent option, each count is output as a percentage of the residues observed at that position.
An output record looks like this:
{:RECORD_TYPE=>"residue distribution",
:V0=>"A",
:V1=>5,
:V2=>0,
:V3=>0,
:V4=>0}
These records are ready for write_table. See the examples below.
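The per-position counting described above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the BioDSL implementation: the method name residue_distribution is hypothetical, and the record order (residues in the order they are first seen) is inferred from the example tables rather than confirmed by the source.

```ruby
require 'set'

# Hypothetical helper, not part of BioDSL: count residues per position
# and build one record per observed residue.
def residue_distribution(seqs)
  counts   = Hash.new { |h, k| h[k] = Hash.new(0) } # position => residue => count
  residues = Set.new                                # keeps first-seen order

  seqs.each do |seq|
    seq.each_char.with_index do |res, pos|
      counts[pos][res] += 1
      residues << res
    end
  end

  positions = counts.keys.sort

  residues.map do |res|
    record = { RECORD_TYPE: 'residue distribution', V0: res }

    # V1 holds the count at position 0, V2 at position 1, and so on.
    positions.each { |pos| record[:"V#{pos + 1}"] = counts[pos][res] }

    record
  end
end

records = residue_distribution(%w[AGCT AGCU FLS* -.~])
records.each { |r| puts r.values_at(:V0, :V1, :V2, :V3, :V4).join("\t") }
```

Each record uses V0 for the residue and V1 onward for the positions, which is why the records can be written directly as table rows.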
Usage
analyze_residue_distribution([percent: <bool>])
Options
- percent: <bool> - Output distributions in percent (default=false).
Examples
Consider the following entries in the file `test.fna`:
>DNA
AGCT
>RNA
AGCU
>Protein
FLS*
>Gaps
-.~
Now we run the data through the following pipeline and get the resulting table:
BD.new.
read_fasta(input: "test.fna").
analyze_residue_distribution.
grab(select: "residue").
write_table(skip: [:RECORD_TYPE]).
run
A 2 0 0 0
G 0 2 0 0
C 0 0 2 0
T 0 0 0 1
U 0 0 0 1
F 1 0 0 0
L 0 1 0 0
S 0 0 1 0
* 0 0 0 1
- 1 0 0 0
. 0 1 0 0
~ 0 0 1 0
Here we do the same as above, but output percentages instead of absolute counts:
BD.new.
read_fasta(input: "test.fna").
analyze_residue_distribution(percent: true).
grab(select: "residue").
write_table(skip: [:RECORD_TYPE]).
run
A 50 0 0 0
G 0 50 0 0
C 0 0 50 0
T 0 0 0 33
U 0 0 0 33
F 25 0 0 0
L 0 25 0 0
S 0 0 25 0
* 0 0 0 33
- 25 0 0 0
. 0 25 0 0
~ 0 0 25 0
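The percentages in this table follow from dividing each count by the number of residues observed at that position. The last position has only three residues because the Gaps entry is three residues long, which is why T, U and * each show 33 rather than 25. The sketch below reproduces that arithmetic; the helper name to_percent is hypothetical, and rounding to the nearest integer is an assumption (truncation would give the same values for this data).

```ruby
# Hypothetical helper: express a count as an integer percentage of a total.
def to_percent(count, total)
  (100.0 * count / total).round
end

seqs = %w[AGCT AGCU FLS* -.~]

# Number of sequences long enough to contribute a residue at each position.
totals = (0...4).map { |pos| seqs.count { |seq| seq.length > pos } }

puts totals.join(' ')           # 4 4 4 3
puts to_percent(2, totals[0])   # A at the first position: 50
puts to_percent(1, totals[3])   # T at the last position: 33
```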
Constant Summary
- STATS = %i(records_in records_out sequences_in sequences_out residues_in residues_out)
Instance Method Summary
- #initialize(options) ⇒ AnalyzeResidueDistribution (constructor)
  Constructor for the AnalyzeResidueDistribution class.
- #lmb ⇒ Proc
  Return a lambda for the analyze_residue_distribution command.
Constructor Details
#initialize(options) ⇒ AnalyzeResidueDistribution
Constructor for the AnalyzeResidueDistribution class.
# File 'lib/BioDSL/commands/analyze_residue_distribution.rb', line 123
def initialize(options)
  @options  = options
  @counts   = Hash.new { |h, k| h[k] = Hash.new(0) } # position => residue => count
  @total    = Hash.new(0)                            # position => residues seen
  @residues = Set.new                                # all observed residues
end
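The constructor's counts hash relies on two standard Ruby Hash defaults: a default block that creates an inner hash on first access, and an inner default value of 0 so that counters can be incremented without initialization. The short sketch below shows that behavior in isolation; the local variable name counts and the position and residue keys are illustrative only.

```ruby
# Outer hash: a missing key creates and stores a fresh inner hash.
# Inner hash: a missing key reads as 0 but is not stored.
counts = Hash.new { |h, k| h[k] = Hash.new(0) }

counts[0]['A'] += 1 # inner hash for position 0 is created here
counts[0]['A'] += 1
counts[3]['T'] += 1

puts counts[0]['A']        # 2
puts counts[0]['G']        # 0, residue never seen at position 0
puts counts[0].key?('G')   # false, reading the default stores nothing
puts counts.keys.join(' ') # 0 3
```

Because reading a missing position through the outer hash stores a new inner hash, code that only inspects the counts may prefer key? or fetch to avoid creating empty entries.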
Instance Method Details
#lmb ⇒ Proc
Return a lambda for the analyze_residue_distribution command.
# File 'lib/BioDSL/commands/analyze_residue_distribution.rb', line 136
def lmb
  require 'set'

  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      analyze_residues(record[:SEQ]) if record[:SEQ]

      if output
        output << record
        @status[:records_out] += 1
      end
    end

    calc_dist(output)
  end
end