Class: BioDSL::MergePairSeq

Inherits:
Object
  • Object
Defined in:
lib/BioDSL/commands/merge_pair_seq.rb

Overview

Merge paired-end sequences in the stream.

merge_pair_seq merges paired sequences in the stream, provided they are interleaved. Sequence names must be in either Illumina 1.3/1.5 format, with a trailing /1 or /2, or Illumina 1.8 format, containing 1: or 2:. The names of the two reads in a pair must match accordingly for the sequences to be merged.
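The naming rule above can be illustrated with a minimal sketch. The helpers pair_key and mates? are hypothetical names introduced here for illustration; they are not part of BioDSL, and the library's own name check may differ.

```ruby
# Hypothetical helper, not part of BioDSL: reduce a read name to the part
# shared by both mates so that the two names can be compared.
def pair_key(seq_name)
  case seq_name
  when %r{\A(.+)/[12]\z}      # Illumina 1.3/1.5: name ends in /1 or /2
    Regexp.last_match(1)
  when /\A(\S+) [12]:(.*)\z/  # Illumina 1.8: " 1:" or " 2:" follows the name
    "#{Regexp.last_match(1)} #{Regexp.last_match(2)}"
  end
end

# Hypothetical helper: true when two names differ only in the read number.
def mates?(name1, name2)
  key1 = pair_key(name1)
  !key1.nil? && name1 != name2 && key1 == pair_key(name2)
end

old_style = mates?('read7/1', 'read7/2')
new_style = mates?('M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14',
                   'M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14')
mismatch  = mates?('read7/1', 'read8/2')
```

A name that fits neither format yields a nil key, so mates? returns false for it.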

Usage

merge_pair_seq

Options

Examples

Consider the following FASTQ entries in the file test.fq:

@M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14
TGGGGAATATTGGACAATGG
+
<??????BDDDDDDDDGGGG
@M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14
CCTGTTTGCTACCCACGCTT
+
?????BB<-<BDDDDDFEEF
@M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14
TAGGGAATCTTGCACAATGG
+
<???9?BBBDBDDBDDFFFF
@M01168:16:000000000-A1R9L:1:1101:13906:2139 2:N:0:14
ACTCTTCGCTACCCATGCTT
+
,5<??BB?DDABDBDDFFFF
@M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14
TAGGGAATCTTGCACAATGG
+
?????BBBBBDDBDDBFFFF
@M01168:16:000000000-A1R9L:1:1101:14865:2158 2:N:0:14
CCTCTTCGCTACCCATGCTT
+
??,<??B?BB?BBBBBFF?F

To merge these interleaved paired-end sequences, use merge_pair_seq:

BD.new.
read_fastq(input: "test.fq", encoding: :base_33).
merge_pair_seq.
dump.
run

{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14",
 :SEQ=>"TGGGGAATATTGGACAATGGCCTGTTTGCTACCCACGCTT",
 :SEQ_LEN=>40,
 :SCORES=>"<??????BDDDDDDDDGGGG?????BB<-<BDDDDDFEEF",
 :SEQ_LEN_LEFT=>20,
 :SEQ_LEN_RIGHT=>20}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:13906:2139 1:N:0:14",
 :SEQ=>"TAGGGAATCTTGCACAATGGACTCTTCGCTACCCATGCTT",
 :SEQ_LEN=>40,
 :SCORES=>"<???9?BBBDBDDBDDFFFF,5<??BB?DDABDBDDFFFF",
 :SEQ_LEN_LEFT=>20,
 :SEQ_LEN_RIGHT=>20}
{:SEQ_NAME=>"M01168:16:000000000-A1R9L:1:1101:14865:2158 1:N:0:14",
 :SEQ=>"TAGGGAATCTTGCACAATGGCCTCTTCGCTACCCATGCTT",
 :SEQ_LEN=>40,
 :SCORES=>"?????BBBBBDDBDDBFFFF??,<??B?BB?BBBBBFF?F",
 :SEQ_LEN_LEFT=>20,
 :SEQ_LEN_RIGHT=>20}
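Judging from the output above, a merged record keeps the name of the first read, concatenates the sequences and scores, and records the length of each half. The sketch below reproduces that shape on plain hashes; merge_records is a hypothetical name, and this is an illustration under those assumptions rather than BioDSL's implementation.

```ruby
# Hypothetical helper, not part of BioDSL: build a merged record from two
# mate records, mirroring the keys seen in the example output.
def merge_records(left, right)
  merged = {
    SEQ_NAME: left[:SEQ_NAME],                       # name of the first read
    SEQ: left[:SEQ] + right[:SEQ],                   # sequences joined end to end
    SEQ_LEN: left[:SEQ].length + right[:SEQ].length
  }
  # Scores are only joined when both records carry them.
  merged[:SCORES] = left[:SCORES] + right[:SCORES] if left[:SCORES] && right[:SCORES]
  merged[:SEQ_LEN_LEFT]  = left[:SEQ].length
  merged[:SEQ_LEN_RIGHT] = right[:SEQ].length
  merged
end

read1 = { SEQ_NAME: 'M01168:16:000000000-A1R9L:1:1101:14862:1868 1:N:0:14',
          SEQ: 'TGGGGAATATTGGACAATGG', SCORES: '<??????BDDDDDDDDGGGG' }
read2 = { SEQ_NAME: 'M01168:16:000000000-A1R9L:1:1101:14862:1868 2:N:0:14',
          SEQ: 'CCTGTTTGCTACCCACGCTT', SCORES: '?????BB<-<BDDDDDFEEF' }

merged = merge_records(read1, read2)
```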

Constant Summary

STATS =
%i(records_in records_out sequences_in sequences_out residues_in
residues_out)

Instance Method Summary

Constructor Details

#initialize(options) ⇒ MergePairSeq

Constructor for MergePairSeq.

Parameters:

  • options (Hash)

    Options hash.



# File 'lib/BioDSL/commands/merge_pair_seq.rb', line 106

def initialize(options)
  @options = options

  check_options
end

Instance Method Details

#lmb ⇒ Proc

Return the command lambda for merge_pair_seq.

Returns:

  • (Proc)

    Command lambda for merge_pair_seq.



# File 'lib/BioDSL/commands/merge_pair_seq.rb', line 115

def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each_slice(2) do |record1, record2|
      @status[:records_in] += record2 ? 2 : 1

      if record1[:SEQ] && record2[:SEQ]
        output << merge_pair_seq(record1, record2)

        @status[:sequences_in] += 2
        @status[:sequences_out] += 1
        @status[:records_out] += 1
      else
        output.puts record1, record2

        @status[:records_out] += 2
      end
    end
  end
end
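The pairing in the listing above relies on each_slice(2), which yields nil as the second element when the number of records is odd. Indexing nil with [:SEQ] raises NoMethodError, so a trailing unpaired sequence record needs a guard. The sketch below shows that behaviour on a plain array; the record contents and the :merge / :pass_through labels are made up for illustration and are not part of BioDSL.

```ruby
# Three records: one complete pair followed by an unpaired read.
records = [
  { SEQ_NAME: 'r1/1', SEQ: 'ACGT' },
  { SEQ_NAME: 'r1/2', SEQ: 'TTGA' },
  { SEQ_NAME: 'r2/1', SEQ: 'GGCC' }
]

pairs = records.each_slice(2).map do |record1, record2|
  # record2 is nil for the last slice when the record count is odd,
  # so test it before indexing into it.
  paired = record1[:SEQ] && record2 && record2[:SEQ]

  [record1[:SEQ_NAME], record2 && record2[:SEQ_NAME], paired ? :merge : :pass_through]
end
```

With this guard, the unpaired record falls through to the pass-through branch instead of raising.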