Class: BioDSL::SplitPairSeq
- Inherits: Object
- Defined in: lib/BioDSL/commands/split_pair_seq.rb
Overview
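Split paired-end sequences in the stream.
split_pair_seq splits sequences in the stream that were previously merged with merge_pair_seq. Sequence names must be in either Illumina 1.3/1.5 format, ending with /1 or /2, or Illumina 1.8 format, containing 1: or 2:. Each merged sequence is output as two records: the first is named with 1 and the second with 2. The sequence and its quality scores are cut at the position given by SEQ_LEN_LEFT. Records lacking any of SEQ_NAME, SEQ, SEQ_LEN_LEFT, or SEQ_LEN_RIGHT are passed through unchanged.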
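The naming rule can be illustrated with a small Ruby sketch. The helper mate_names is hypothetical and not part of BioDSL; it assumes that an Illumina 1.8 name carries its read number right after the single space (as in "...:1868 1:N:0:14") and that an Illumina 1.3/1.5 name ends in /1 or /2.

```ruby
# Hypothetical helper (not part of BioDSL): returns the two mate names
# for a merged record, following the naming rules described above.
def mate_names(name)
  if name.match?(%r{/[12]\z})
    # Illumina 1.3/1.5: the read number is the trailing /1 or /2.
    base = name.sub(%r{/[12]\z}, '')
    ["#{base}/1", "#{base}/2"]
  elsif name.match?(/ [12]:/)
    # Illumina 1.8: the read number follows the space, e.g. " 1:N:0:14".
    [name.sub(/ [12]:/, ' 1:'), name.sub(/ [12]:/, ' 2:')]
  else
    raise ArgumentError, "unrecognized paired-end name: #{name}"
  end
end

mate_names('M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14')
# => ["M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14",
#     "M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14"]
```

Anchoring the 1.8 pattern on the leading space avoids matching fields such as ":1:1101" earlier in the name.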
Usage
split_pair_seq
Options
Examples
Consider the following records created with merge_pair_seq:
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14",
:SEQ=>"TGGGGAATATTGGACAATGGCCTGTTTGCTACCCACGCTT",
:SEQ_LEN=>40,
:SCORES=>"<??????BDDDDDDDDGGGG?????BB<-<BDDDDDFEEF",
:SEQ_LEN_LEFT=>20,
:SEQ_LEN_RIGHT=>20}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14",
:SEQ=>"TAGGGAATCTTGCACAATGGACTCTTCGCTACCCATGCTT",
:SEQ_LEN=>40,
:SCORES=>"<???9?BBBDBDDBDDFFFF,5<??BB?DDABDBDDFFFF",
:SEQ_LEN_LEFT=>20,
:SEQ_LEN_RIGHT=>20}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14",
:SEQ=>"TAGGGAATCTTGCACAATGGCCTCTTCGCTACCCATGCTT",
:SEQ_LEN=>40,
:SCORES=>"?????BBBBBDDBDDBFFFF??,<??B?BB?BBBBBFF?F",
:SEQ_LEN_LEFT=>20,
:SEQ_LEN_RIGHT=>20}
These can be split using split_pair_seq:
BD.new.
read_fastq(input: "test.fq", encoding: :base_33).
merge_pair_seq.
split_pair_seq.
dump.
run
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14",
:SEQ=>"TGGGGAATATTGGACAATGG",
:SEQ_LEN=>20,
:SCORES=>"<??????BDDDDDDDDGGGG"}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14",
:SEQ=>"CCTGTTTGCTACCCACGCTT",
:SEQ_LEN=>20,
:SCORES=>"?????BB<-<BDDDDDFEEF"}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14",
:SEQ=>"TAGGGAATCTTGCACAATGG",
:SEQ_LEN=>20,
:SCORES=>"<???9?BBBDBDDBDDFFFF"}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:13906:2139 2:N:0:14",
:SEQ=>"ACTCTTCGCTACCCATGCTT",
:SEQ_LEN=>20,
:SCORES=>",5<??BB?DDABDBDDFFFF"}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14",
:SEQ=>"TAGGGAATCTTGCACAATGG",
:SEQ_LEN=>20,
:SCORES=>"?????BBBBBDDBDDBFFFF"}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14865:2158 2:N:0:14",
:SEQ=>"CCTCTTCGCTACCCATGCTT",
:SEQ_LEN=>20,
:SCORES=>"??,<??B?BB?BBBBBFF?F"}
Constant Summary
- STATS = %i(records_in records_out sequences_in sequences_out residues_in residues_out)
Instance Method Summary
- #initialize(options) ⇒ SplitPairSeq (constructor): Constructor for SplitPairSeq.
- #lmb ⇒ Proc: Return command lambda for split_pair_seq.
Constructor Details
#initialize(options) ⇒ SplitPairSeq
Constructor for SplitPairSeq.
# File 'lib/BioDSL/commands/split_pair_seq.rb', line 108
def initialize(options)
  @options = options
end
Instance Method Details
#lmb ⇒ Proc
Return command lambda for split_pair_seq.
# File 'lib/BioDSL/commands/split_pair_seq.rb', line 117
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      if record[:SEQ_NAME] && record[:SEQ] && record[:SEQ_LEN_LEFT] &&
         record[:SEQ_LEN_RIGHT]
        split_pair_seq(output, record)
      else
        output << record
        @status[:records_out] += 1
      end
    end
  end
end
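The dispatch in #lmb can be tried outside BioDSL with plain arrays standing in for the input and output streams. In this sketch split_lambda and REQUIRED_KEYS are hypothetical, a plain Hash replaces status_init, only records_in and records_out are tracked, names and scores are not carried over, and each emitted mate is assumed to count toward records_out.

```ruby
# Hypothetical sketch: mirrors the branching in #lmb with plain Ruby objects.
REQUIRED_KEYS = %i[SEQ_NAME SEQ SEQ_LEN_LEFT SEQ_LEN_RIGHT].freeze

def split_lambda
  lambda do |input, output, status|
    # Stand-in for status_init: only two counters are tracked here.
    status[:records_in]  = 0
    status[:records_out] = 0

    input.each do |record|
      status[:records_in] += 1

      if REQUIRED_KEYS.all? { |key| record[key] }
        left  = record[:SEQ_LEN_LEFT]
        right = record[:SEQ_LEN_RIGHT]
        # Assumption: each emitted mate counts as one output record.
        [record[:SEQ][0, left], record[:SEQ][left, right]].each do |seq|
          output << { SEQ: seq, SEQ_LEN: seq.length }
          status[:records_out] += 1
        end
      else
        output << record # records without pair info pass through unchanged
        status[:records_out] += 1
      end
    end
  end
end

input = [
  { SEQ_NAME: 'r1 1:N:0:1', SEQ: 'AAAACCCC', SEQ_LEN_LEFT: 4, SEQ_LEN_RIGHT: 4 },
  { SEQ_NAME: 'r2', SEQ: 'GGGG' }
]
output = []
status = {}
split_lambda.call(input, output, status)
# output holds three records: AAAA, CCCC, then the untouched r2 record.
```

Passing input, output, and status as lambda arguments keeps the command independent of how the pipeline wires its streams together.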