Class: BioDSL::TrimSeq
- Inherits: Object
  - Object
    - BioDSL::TrimSeq
- Defined in: lib/BioDSL/commands/trim_seq.rb
Overview
Trim sequence ends removing residues with a low quality score.
trim_seq removes low-quality residues from the ends of sequences in the stream, based on the quality scores in SCORES, a FASTQ-type quality score string. Trimming proceeds inward from an end until a stretch of consecutive residues with acceptable quality is found; the length of that stretch is set with the length_min option. This prevents trimming from terminating prematurely on, e.g., a single good-quality residue near the end. Use the mode option to trim from the left end, the right end, or both (default=:both).
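The trimming rule described above can be sketched in plain Ruby. This is a minimal illustration, not BioDSL's implementation: the helper names first_good_stretch and trim_ends are hypothetical, and the score offset of 64 is an assumption inferred from the examples on this page (where T is the first residue kept at quality_min 20).

```ruby
# Hypothetical sketch of end trimming; not BioDSL's actual code.
# Assumption: scores are ASCII characters with an offset of 64.
SCORE_BASE = 64

# Index of the first position (scanning left to right) that starts a run of
# length_min residues all scoring at least quality_min, or nil if none exists.
def first_good_stretch(quals, quality_min, length_min)
  (0..quals.size - length_min).find do |i|
    quals[i, length_min].all? { |q| q >= quality_min }
  end
end

# Returns [trimmed_seq, trimmed_scores] for the given mode.
def trim_ends(seq, scores, quality_min: 20, length_min: 3, mode: :both)
  quals = scores.each_char.map { |c| c.ord - SCORE_BASE }
  left  = 0
  right = seq.length # exclusive end position

  if %i[left both].include?(mode)
    # No good stretch at all means the whole sequence is trimmed away.
    left = first_good_stretch(quals, quality_min, length_min) || seq.length
  end

  if %i[right both].include?(mode)
    # Scan the reversed scores to find the good stretch nearest the right end.
    pos   = first_good_stretch(quals.reverse, quality_min, length_min)
    right = pos ? seq.length - pos : 0
  end

  len = [right - left, 0].max
  [seq[left, len], scores[left, len]]
end
```

Because a whole stretch of length_min residues must pass, a single good residue surrounded by poor ones does not stop the scan.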
Usage
trim_seq([quality_min: <uint>[, length_min: <uint>
[, mode: <:left|:right|:both>]]])
Options
- quality_min: <uint> - Minimum quality (default=20).
- length_min: <uint> - Minimum stretch length (default=3).
- mode: <string> - Trim mode :left|:right|:both (default=:both).
Examples
Consider the following FASTQ entry in the file test.fq:
@test
gatcgatcgtacgagcagcatctgacgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
+
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
To trim both ends simply do:
BD.new.read_fastq(input: "test.fq").trim_seq.dump.run
SEQ_NAME: test
SEQ: tctgacgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
SEQ_LEN: 62
SCORES: TUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
---
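To see why the trimmed SCORES string starts at T, it helps to decode the score characters into numbers. The sketch below assumes an ASCII offset of 64, which is inferred from this example rather than stated by the documentation; decode_scores is a hypothetical helper, not part of BioDSL.

```ruby
# Hypothetical helper: decode a FASTQ score string into integer qualities.
# Assumption: ASCII offset of 64, inferred from the example above.
def decode_scores(scores, base = 64)
  scores.each_char.map { |c| c.ord - base }
end

decode_scores("@ST") # => [0, 19, 20]
```

Under that assumption S decodes to 19 and T to 20, so with the default quality_min of 20 the residue scored T is the first one kept on the left.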
Use the quality_min option to change the minimum quality score; residues scoring below it are trimmed:
BD.new.
read_fastq(input: "test.fq").
trim_seq(quality_min: 25).
dump.
run
SEQ_NAME: test
SEQ: cgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
SEQ_LEN: 57
SCORES: YZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
---
To trim the left end only (use :right for the right end only), do:
BD.new.read_fastq(input: "test.fq").trim_seq(mode: :left).dump.run
SEQ_NAME: test
SEQ: tctgacgtatcgatcgttgattagttgctagctatgcagtctacgacgagcatgctagctag
SEQ_LEN: 62
SCORES: TUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDChhh
---
To require a longer stretch of good-quality residues before trimming stops, use the length_min option:
BD.new.read_fastq(input: "test.fq").trim_seq(length_min: 4).dump.run
SEQ_NAME: test
SEQ: tctgacgtatcgatcgttgattagttgctagctatgcagtct
SEQ_LEN: 42
SCORES: TUVWXYZ[\]^_`abcdefghhgfedcba`_^]\[ZYXWVUT
---
Constant Summary
- STATS = %i(records_in records_out sequences_in sequences_out residues_in residues_out)
Instance Method Summary
- #initialize(options) ⇒ Proc, TrimSeq (constructor)
  Constructor for the TrimSeq class.
- #lmb ⇒ Proc
  Return a lambda for the trim_seq command.
Constructor Details
#initialize(options) ⇒ Proc, TrimSeq
Constructor for the TrimSeq class.
# File 'lib/BioDSL/commands/trim_seq.rb', line 123
def initialize(options)
  @options = options
  defaults
  @mode = @options[:mode].to_sym
  @min = @options[:quality_min]
  @len = @options[:length_min]
end
Instance Method Details
#lmb ⇒ Proc
Return a lambda for the trim_seq command.
# File 'lib/BioDSL/commands/trim_seq.rb', line 137
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      trim_seq(record) if record[:SEQ] && record[:SCORES]

      output << record

      @status[:records_out] += 1
    end
  end
end
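The lambda returned by #lmb takes an input stream, an output stream, and a status hash. The following sketch imitates that calling convention with plain Ruby arrays and a hash so the record flow can be followed without BioDSL. The UpcaseSeq class and its behaviour are hypothetical stand-ins, and the real command receives stream objects from the pipeline rather than arrays.

```ruby
# Hypothetical stand-in for a command class; not BioDSL code.
# It follows the same (input, output, status) lambda convention as #lmb.
class UpcaseSeq
  def lmb
    lambda do |input, output, status|
      status[:records_in]  = 0
      status[:records_out] = 0

      input.each do |record|
        status[:records_in] += 1

        # Only records carrying a sequence are modified ...
        record[:SEQ] = record[:SEQ].upcase if record[:SEQ]

        # ... but every record is passed on downstream.
        output << record
        status[:records_out] += 1
      end
    end
  end
end

input  = [{ SEQ_NAME: "test", SEQ: "gatc" }, { FOO: "bar" }]
output = []
status = {}

UpcaseSeq.new.lmb.call(input, output, status)
```

As in #lmb above, records that lack the required keys are not dropped; they are written to the output unchanged and still counted.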