Class: BioDSL::MergePairSeq
- Inherits: Object
- Defined in: lib/BioDSL/commands/merge_pair_seq.rb
Overview
Merge paired-end sequences in the stream.
merge_pair_seq merges paired sequences in the stream if they are interleaved. Sequence names must be in either Illumina 1.3/1.5 format, ending with /1 or /2, or Illumina 1.8 format, containing 1: or 2:. The names of the two mates must match accordingly for the sequences to be merged.
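To illustrate the two naming conventions, here is a minimal Ruby sketch of how a pair of names could be checked. The helpers parse_mate and mates? and both regular expressions are hypothetical, not BioDSL API; BioDSL's own check is not listed on this page and may differ. The sketch also assumes that mate 1 must come before mate 2 in the interleaved stream.

```ruby
# Hypothetical helpers for illustration only, not part of BioDSL.
# Assumption: mate 1 precedes mate 2 in the interleaved stream.

# Illumina 1.8: "<base> 1:N:0:14" or "<base> 2:N:0:14"
ILLUMINA_18 = /\A(\S+) ([12]):/
# Illumina 1.3/1.5: "<base>/1" or "<base>/2"
ILLUMINA_13 = %r{\A(.+)/([12])\z}

# Return [base_name, mate_number], or nil if the name matches neither format.
def parse_mate(name)
  m = ILLUMINA_18.match(name) || ILLUMINA_13.match(name)
  m && [m[1], m[2].to_i]
end

# True if name1 and name2 look like mate 1 and mate 2 of the same read.
def mates?(name1, name2)
  a = parse_mate(name1)
  b = parse_mate(name2)
  return false unless a && b

  a[0] == b[0] && a[1] == 1 && b[1] == 2
end
```

For example, mates?("READ/1", "READ/2") is true, while mates?("READ/2", "READ/1") is false under the ordering assumption.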
Usage
merge_pair_seq
Options
Examples
Consider the following FASTQ entries in the file test.fq:
@M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14
TGGGGAATATTGGACAATGG
+
<??????BDDDDDDDDGGGG
@M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14
CCTGTTTGCTACCCACGCTT
+
?????BB<-<BDDDDDFEEF
@M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
TAGGGAATCTTGCACAATGG
+
<???9?BBBDBDDBDDFFFF
@M01168:16:000000000-A1R9L:1:1101:13906:2139 2:N:0:14
ACTCTTCGCTACCCATGCTT
+
,5<??BB?DDABDBDDFFFF
@M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14
TAGGGAATCTTGCACAATGG
+
?????BBBBBDDBDDBFFFF
@M01168:16:000000000-A1R9L:1:1101:14865:2158 2:N:0:14
CCTCTTCGCTACCCATGCTT
+
??,<??B?BB?BBBBBFF?F
To merge these interleaved paired-end sequences, use merge_pair_seq:
BD.new.
read_fastq(input: "test.fq", encoding: :base_33).
merge_pair_seq.
dump.
run
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14",
:SEQ=>"TGGGGAATATTGGACAATGGCCTGTTTGCTACCCACGCTT",
:SEQ_LEN=>40,
:SCORES=>"<??????BDDDDDDDDGGGG?????BB<-<BDDDDDFEEF",
:SEQ_LEN_LEFT=>20,
:SEQ_LEN_RIGHT=>20}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14",
:SEQ=>"TAGGGAATCTTGCACAATGGACTCTTCGCTACCCATGCTT",
:SEQ_LEN=>40,
:SCORES=>"<???9?BBBDBDDBDDFFFF,5<??BB?DDABDBDDFFFF",
:SEQ_LEN_LEFT=>20,
:SEQ_LEN_RIGHT=>20}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14",
:SEQ=>"TAGGGAATCTTGCACAATGGCCTCTTCGCTACCCATGCTT",
:SEQ_LEN=>40,
:SCORES=>"?????BBBBBDDBDDBFFFF??,<??B?BB?BBBBBFF?F",
:SEQ_LEN_LEFT=>20,
:SEQ_LEN_RIGHT=>20}
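Judging from the output above, each merged record keeps the first mate's SEQ_NAME, concatenates SEQ and SCORES without reverse-complementing mate 2, and stores each mate's length in SEQ_LEN_LEFT and SEQ_LEN_RIGHT. The Ruby sketch below reproduces that shape for a single pair. merge_records is a hypothetical name, not BioDSL's internal merge method, and skipping SCORES when a mate has none is an assumption.

```ruby
# Hypothetical re-creation of the merged record shape seen in the output
# above; not BioDSL's own implementation.
def merge_records(record1, record2)
  seq1 = record1[:SEQ]
  seq2 = record2[:SEQ]

  # Keep the first mate's name and concatenate the sequences as-is.
  merged = {
    SEQ_NAME: record1[:SEQ_NAME],
    SEQ: seq1 + seq2,
    SEQ_LEN: seq1.length + seq2.length
  }

  # Assumption: only concatenate quality scores when both mates have them.
  if record1[:SCORES] && record2[:SCORES]
    merged[:SCORES] = record1[:SCORES] + record2[:SCORES]
  end

  # Remember where the left mate ends and the right mate begins.
  merged[:SEQ_LEN_LEFT] = seq1.length
  merged[:SEQ_LEN_RIGHT] = seq2.length
  merged
end
```

Applied to the first two entries of test.fq, this yields the same key/value pairs as the first record in the output above.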
Constant Summary
- STATS = %i(records_in records_out sequences_in sequences_out residues_in residues_out)
Instance Method Summary
- #initialize(options) ⇒ MergePairSeq (constructor)
  Constructor for MergePairSeq.
- #lmb ⇒ Proc
  Return the command lambda for merge_pair_seq.
Constructor Details
#initialize(options) ⇒ MergePairSeq
Constructor for MergePairSeq.
# File 'lib/BioDSL/commands/merge_pair_seq.rb', line 106
def initialize(options)
  @options = options
end
Instance Method Details
#lmb ⇒ Proc
Return the command lambda for merge_pair_seq.
# File 'lib/BioDSL/commands/merge_pair_seq.rb', line 115
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    # Read the interleaved stream two records at a time (mate 1, mate 2).
    input.each_slice(2) do |record1, record2|
      @status[:records_in] += record2 ? 2 : 1

      if record1[:SEQ] && record2[:SEQ]
        # Both records hold sequences: emit a single merged record.
        output << merge_pair_seq(record1, record2)

        @status[:sequences_in] += 2
        @status[:sequences_out] += 1
        @status[:records_out] += 1
      else
        # Otherwise pass both records through unchanged.
        output.puts record1, record2
        @status[:records_out] += 2
      end
    end
  end
end
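The control flow of #lmb can be tried outside BioDSL with plain arrays. In this minimal sketch, merge_pairs is a hypothetical name, the block stands in for the merge step, and a Hash plays the role of the status counters. It also guards against a trailing unpaired record (record2 is nil when the stream holds an odd number of records), a case the listing above does not check before reading record2[:SEQ].

```ruby
# Minimal sketch of the pairing loop in #lmb using plain arrays instead of
# BioDSL streams. merge_pairs is hypothetical; the block performs the merge.
def merge_pairs(records)
  stats = Hash.new(0)
  output = []

  records.each_slice(2) do |record1, record2|
    stats[:records_in] += record2 ? 2 : 1

    if record2 && record1[:SEQ] && record2[:SEQ]
      # Two sequence records: replace them with one merged record.
      output << yield(record1, record2)
      stats[:sequences_in] += 2
      stats[:sequences_out] += 1
      stats[:records_out] += 1
    else
      # Non-sequence records, or a trailing unpaired record, pass through.
      passed = [record1, record2].compact
      output.concat(passed)
      stats[:records_out] += passed.size
    end
  end

  [output, stats]
end
```

For example, calling merge_pairs on [{ SEQ: "ACGT" }, { SEQ: "TTGG" }, { SEQ: "CC" }] with a block that concatenates SEQ returns the merged record for "ACGTTTGG" followed by the unpaired "CC" record, with records_in 3, records_out 2, sequences_in 2 and sequences_out 1.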