Class: BioDSL::TrimPrimer
- Inherits: Object
  - Object
  - BioDSL::TrimPrimer
- Defined in: lib/BioDSL/commands/trim_primer.rb
Overview
Trim sequence ends in the stream matching a specified primer.
trim_primer can trim a full or partial primer sequence from sequence ends. This is done by matching the primer at the end specified by the direction option:
Forward clip:
sequence ATCGACTGCATCACGACG
primer CATGAATCGA
result CTGCATCACGACG
Reverse clip:
sequence ATCGACTGCATCACGACG
primer GACGATAGCA
result ATCGACTGCATCAC
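The two clips above can be reproduced with a short plain-Ruby sketch. It assumes exact matching only and takes the longest overlap between primer and sequence end; the helper names (forward_overlap, reverse_overlap, trim_exact) are hypothetical and are not part of BioDSL.

```ruby
# Length of the longest suffix of `primer` that is also a prefix of `seq`.
# Returns 0 when no overlap of at least `overlap_min` residues exists.
def forward_overlap(seq, primer, overlap_min = 1)
  max = [seq.length, primer.length].min
  max.downto(overlap_min) do |len|
    return len if seq.start_with?(primer[-len, len])
  end
  0
end

# Length of the longest prefix of `primer` that is also a suffix of `seq`.
def reverse_overlap(seq, primer, overlap_min = 1)
  max = [seq.length, primer.length].min
  max.downto(overlap_min) do |len|
    return len if seq.end_with?(primer[0, len])
  end
  0
end

# Remove the overlapping primer residues from the chosen end (exact match only).
def trim_exact(seq, primer, direction, overlap_min = 1)
  case direction
  when :forward
    seq[forward_overlap(seq, primer, overlap_min)..-1]
  when :reverse
    seq[0, seq.length - reverse_overlap(seq, primer, overlap_min)]
  else
    raise ArgumentError, "unknown direction: #{direction.inspect}"
  end
end

puts trim_exact('ATCGACTGCATCACGACG', 'CATGAATCGA', :forward) # CTGCATCACGACG
puts trim_exact('ATCGACTGCATCACGACG', 'GACGATAGCA', :reverse) # ATCGACTGCATCAC
```

In the forward case only the last five primer residues (ATCGA) overlap the sequence start; in the reverse case only the first four (GACG) overlap the sequence end.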
The primer sequence can be reverse complemented using the reverse_complement option. Also, a minimum overlap for trimming can be specified using the overlap_min option (default=1).
Non-perfect matching can be allowed by setting the allowed mismatch_percent, insertion_percent and deletion_percent.
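As a rough illustration of these options, the sketch below reverse complements a primer and checks a candidate match against a mismatch budget. The helper names are hypothetical, and the budget rule (mismatches as a percentage of the pattern length must not exceed mismatch_percent) is an assumption; how BioDSL rounds the percentage is not stated here. Insertions and deletions are not modelled.

```ruby
# Reverse complement of a DNA primer (A<->T, C<->G); other characters pass through.
def reverse_complement(primer)
  primer.reverse.tr('ACGTacgt', 'TGCAtgca')
end

# Number of positions at which two equal-length strings differ.
def mismatches(a, b)
  a.chars.zip(b.chars).count { |x, y| x != y }
end

# ASSUMPTION: a candidate passes when its mismatches, as a percentage of the
# pattern length, do not exceed mismatch_percent. Integer arithmetic avoids
# any rounding decision.
def within_mismatch_budget?(candidate, pattern, mismatch_percent)
  return false unless candidate.length == pattern.length
  mismatches(candidate, pattern) * 100 <= mismatch_percent * pattern.length
end

puts reverse_complement('ATCG')                          # CGAT
puts within_mismatch_budget?('ACTGAT', 'ACTGAC', 0)      # false
puts within_mismatch_budget?('ACTGAT', 'ACTGAC', 20)     # true
```

With a six-residue pattern, one mismatch is about 16.7 percent, so it passes a budget of 20 but not one of 10.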
The following keys are added to clipped records:
- TRIM_PRIMER_DIR - Direction of clip.
- TRIM_PRIMER_POS - Sequence position of clip (0 based).
- TRIM_PRIMER_LEN - Length of clip match.
- TRIM_PRIMER_PAT - Clip match pattern.
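To show how these keys relate to the untrimmed sequence, here is a small sketch that rebuilds the original sequence from a clipped record. It assumes exact matching, so that TRIM_PRIMER_PAT equals the residues that were removed; restore_sequence is a hypothetical helper, not a BioDSL method. The record values are taken from the Examples section of this page.

```ruby
# Rebuild the untrimmed sequence from a clipped record.
# ASSUMPTION: the match was exact, so the pattern equals the removed residues.
def restore_sequence(record)
  case record[:TRIM_PRIMER_DIR]
  when 'FORWARD' then record[:TRIM_PRIMER_PAT] + record[:SEQ]
  when 'REVERSE' then record[:SEQ] + record[:TRIM_PRIMER_PAT]
  else record[:SEQ] # record was not clipped
  end
end

forward = { SEQ: 'TGATGACTACGACTACGACTACTACTACGT', SEQ_LEN: 30,
            TRIM_PRIMER_DIR: 'FORWARD', TRIM_PRIMER_POS: 0,
            TRIM_PRIMER_LEN: 6, TRIM_PRIMER_PAT: 'ACTGAC' }
reverse = { SEQ: 'ACTGACTGATGACTACGACTACGACTACT', SEQ_LEN: 29,
            TRIM_PRIMER_DIR: 'REVERSE', TRIM_PRIMER_POS: 29,
            TRIM_PRIMER_LEN: 7, TRIM_PRIMER_PAT: 'ACTACGT' }

puts restore_sequence(forward) # ACTGACTGATGACTACGACTACGACTACTACTACGT
puts restore_sequence(reverse) # ACTGACTGATGACTACGACTACGACTACTACTACGT
```

In both example records TRIM_PRIMER_POS is the 0-based start of the match in the untrimmed sequence: 0 for the forward clip, and 29 (equal to the trimmed SEQ_LEN) for the reverse clip.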
Usage
trim_primer(<primer: <string>>, <direction: <:forward|:reverse>>
[, reverse_complement: <bool>[, overlap_min: <uint>
[, mismatch_percent: <uint>
[, insertion_percent: <uint>
[, deletion_percent: <uint>]]]]])
Options
- primer: <string> - Primer sequence to search for.
- direction: <:forward|:reverse> - Clip direction.
- reverse_complement: <bool> - Reverse complement primer (default=false).
- overlap_min: <uint> - Minimum primer overlap required for trimming (default=1).
- mismatch_percent: <uint> - Allowed percent mismatches (default=0).
- insertion_percent: <uint> - Allowed percent insertions (default=0).
- deletion_percent: <uint> - Allowed percent deletions (default=0).
Examples
Consider the following FASTA entry in the file test.fna:
>test
ACTGACTGATGACTACGACTACGACTACTACTACGT
The forward end can be trimmed like this:
BD.new.
read_fasta(input: "test.fna").
trim_primer(primer: "ATAGAACTGAC", direction: :forward).
dump.
run
{:SEQ_NAME=>"test",
:SEQ=>"TGATGACTACGACTACGACTACTACTACGT",
:SEQ_LEN=>30,
:TRIM_PRIMER_DIR=>"FORWARD",
:TRIM_PRIMER_POS=>0,
:TRIM_PRIMER_LEN=>6,
:TRIM_PRIMER_PAT=>"ACTGAC"}
And trimming a reverse primer:
BD.new.
read_fasta(input: "test.fna").
trim_primer(primer: "ACTACGTGCGGAT", direction: :reverse).
dump.
run
{:SEQ_NAME=>"test",
:SEQ=>"ACTGACTGATGACTACGACTACGACTACT",
:SEQ_LEN=>29,
:TRIM_PRIMER_DIR=>"REVERSE",
:TRIM_PRIMER_POS=>29,
:TRIM_PRIMER_LEN=>7,
:TRIM_PRIMER_PAT=>"ACTACGT"}
Constant Summary
- STATS = %i(records_in records_out sequences_in sequences_out pattern_hits pattern_misses residues_in residues_out)
Instance Method Summary
- #initialize(options) ⇒ TrimPrimer (constructor): Constructor for TrimPrimer.
- #lmb ⇒ Proc: Return command lambda for trim_primer.
Constructor Details
#initialize(options) ⇒ TrimPrimer
Constructor for TrimPrimer.
# File 'lib/BioDSL/commands/trim_primer.rb', line 132
def initialize(options)
  @options = options
  # Fill in defaults for any options the caller left unset.
  @options[:overlap_min]       ||= 1
  @options[:mismatch_percent]  ||= 0
  @options[:insertion_percent] ||= 0
  @options[:deletion_percent]  ||= 0
  @pattern = pattern
  @hit     = false
end
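The `||=` assignments in the constructor fill in defaults only where an option is missing, nil or false. A minimal sketch of that behaviour follows; TRIM_DEFAULTS and apply_defaults are hypothetical names, and copying the hash before modifying it is a choice made for this example only.

```ruby
# Default values as listed under Options.
TRIM_DEFAULTS = {
  overlap_min: 1,
  mismatch_percent: 0,
  insertion_percent: 0,
  deletion_percent: 0
}.freeze

# Return a copy of `options` with unset (nil or false) values replaced by defaults.
def apply_defaults(options)
  opts = options.dup
  TRIM_DEFAULTS.each { |key, value| opts[key] ||= value }
  opts
end

p apply_defaults(primer: 'ACTGAC', direction: :forward)[:overlap_min] # 1
p apply_defaults(overlap_min: 5)[:overlap_min]                        # 5
p apply_defaults(overlap_min: nil)[:overlap_min]                      # 1
```

Because `||=` treats nil like a missing key, passing an explicit nil for an option still yields its default.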
Instance Method Details
#lmb ⇒ Proc
Return command lambda for trim_primer.
# File 'lib/BioDSL/commands/trim_primer.rb', line 147
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      if record[:SEQ] && record[:SEQ].length > 0
        @status[:sequences_in] += 1
        @status[:sequences_out] += 1

        case @options[:direction]
        when :forward then trim_forward(record)
        when :reverse then trim_reverse(record)
        end
      end

      output << record
      @status[:records_out] += 1
    end
  end
end
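The lambda follows a simple read, modify, emit pattern: every record is passed on, and only records carrying a non-empty sequence are trimmed. The sketch below imitates that pattern outside BioDSL. Plain arrays stand in for the input and output streams, a hash stands in for the status object, and build_command is a hypothetical helper that takes the trimming step as a block.

```ruby
COUNTERS = %i[records_in records_out sequences_in sequences_out].freeze

# Build a command lambda that applies `trim` to every record with a sequence.
def build_command(&trim)
  lambda do |input, output, status|
    COUNTERS.each { |key| status[key] = 0 }

    input.each do |record|
      status[:records_in] += 1

      if record[:SEQ] && !record[:SEQ].empty?
        status[:sequences_in] += 1
        trim.call(record)
        status[:sequences_out] += 1
      end

      output << record # records without a sequence pass through unchanged
      status[:records_out] += 1
    end
  end
end

input   = [{ SEQ_NAME: 'test', SEQ: 'ACTGACTG' }, { NOTE: 'no sequence' }]
output  = []
status  = {}
command = build_command { |record| record[:SEQ] = record[:SEQ][2..-1] }
command.call(input, output, status)

p output # both records, the first with SEQ "TGACTG"
p status # records_in and records_out are 2, sequences_in and sequences_out are 1
```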