Class: BioDSL::ReadFastq

Inherits:
Object
Defined in:
lib/BioDSL/commands/read_fastq.rb

Overview

Read FASTQ entries from one or more files.

read_fastq reads sequence entries from FASTQ files. In its common form, each entry spans four lines: a header line consisting of ‘@’ followed by the sequence name, a line of sequence, a separator line starting with ‘+’, and a line of quality scores of the same length as the sequence. The resulting Biopiece record has the following form:

{:SEQ_NAME=>"test",
 :SEQ=>"AGCATCGACTAGCAGCATTT",
 :SEQ_LEN=>20}
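
As a rough illustration of how an entry maps to such a record, here is a minimal sketch that parses four-line FASTQ entries (‘@’ header, sequence, ‘+’ separator, quality scores) from an IO object. The helper name each_fastq_record and the :SCORES key are assumptions made for this example; this is not BioDSL's own parser.

```ruby
require 'stringio'

# Hypothetical helper, not part of BioDSL: parse four-line FASTQ entries
# from any IO-like object and yield one record hash per entry.
def each_fastq_record(io)
  return enum_for(:each_fastq_record, io) unless block_given?

  until io.eof?
    header = io.gets.to_s.chomp
    seq    = io.gets.to_s.chomp
    plus   = io.gets.to_s.chomp
    scores = io.gets.to_s.chomp

    raise ArgumentError, "bad header: #{header}" unless header.start_with?('@')
    raise ArgumentError, "bad separator: #{plus}" unless plus.start_with?('+')
    raise ArgumentError, 'length mismatch' unless seq.length == scores.length

    # :SCORES is an assumption of this sketch; the record shown above
    # lists only :SEQ_NAME, :SEQ and :SEQ_LEN.
    record = { SEQ_NAME: header[1..-1], SEQ: seq, SEQ_LEN: seq.length,
               SCORES: scores }
    yield record
  end
end

data    = StringIO.new("@test\nAGCATCGACTAGCAGCATTT\n+\n#{'I' * 20}\n")
records = each_fastq_record(data).to_a
p records.first
```

The sketch rejects entries whose sequence and quality lines differ in length, which is a cheap way to catch truncated files.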

Paired-end data can be read interleaved by using the input2 option: entries are then read alternately from input and input2. If the reverse_complement option is set, the input2 reads are reverse-complemented.
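
The interleaving and reverse-complementing described above can be sketched as follows. The helpers reverse_complement and interleave are hypothetical names for this example and operate on plain hashes; they are not BioDSL's implementation, and a full version would also reverse the quality scores of the reverse-complemented read.

```ruby
# Hypothetical helpers for illustration; not BioDSL's implementation.

# Reverse-complement a DNA string. A, C, G and T are swapped in either case;
# other characters such as N are left unchanged.
def reverse_complement(seq)
  seq.reverse.tr('ACGTacgt', 'TGCAtgca')
end

# Interleave two equally long lists of records: read1, read2, read1, read2, ...
def interleave(reads1, reads2, revcomp: false)
  raise ArgumentError, 'unequal number of reads' unless reads1.size == reads2.size

  reads1.zip(reads2).flat_map do |r1, r2|
    r2 = r2.merge(SEQ: reverse_complement(r2[:SEQ])) if revcomp
    [r1, r2]
  end
end

reads1 = [{ SEQ_NAME: 'pair1/1', SEQ: 'AACG' }]
reads2 = [{ SEQ_NAME: 'pair1/2', SEQ: 'TTGC' }]
interleave(reads1, reads2, revcomp: true).each { |r| puts r[:SEQ] }
```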

Input files may be compressed with gzip or bzip2.
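
One common way to recognise compressed input is to inspect the leading "magic" bytes of a file: gzip streams begin with 0x1f 0x8b and bzip2 streams with "BZh". The sketch below shows that idea with a hypothetical compression_of helper; how BioDSL itself detects compression is not stated here. Ruby's standard library can read gzip via Zlib but has no built-in bzip2 reader.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: guess the compression of some data from its
# leading bytes. Returns :gzip, :bzip2 or :none.
def compression_of(bytes)
  head = bytes.byteslice(0, 3).to_s.b
  return :gzip  if head.start_with?("\x1f\x8b".b)
  return :bzip2 if head.start_with?('BZh'.b)

  :none
end

plain   = "@test\nACGT\n+\nIIII\n"
gzipped = Zlib.gzip(plain) # Zlib.gzip is available in Ruby 2.4 and later

puts compression_of(plain)
puts compression_of(gzipped)

# Reading the gzipped data back with the standard library:
Zlib::GzipReader.wrap(StringIO.new(gzipped)) { |gz| puts gz.read == plain }
```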

For more about the FASTQ format:

en.wikipedia.org/wiki/FASTQ_format

Usage

read_fastq(input: <glob>[, input2: <glob>[, first: <uint>|last: <uint>
           [, reverse_complement: <bool>]]])

Options

  • input <glob> - Input file or file glob expression.

  • input2 <glob> - Second input file or file glob expression, for paired-end data.

  • first <uint> - Only read the first <uint> entries.

  • last <uint> - Only read the last <uint> entries.

  • reverse_complement <bool> - Reverse-complement the input2 reads.
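
The semantics of first and last on a stream of entries can be sketched like this. The helpers take_first and take_last are hypothetical and only illustrate the idea; they are not taken from the BioDSL source.

```ruby
# Hypothetical helpers showing the semantics of `first` and `last` on a
# stream of entries.

# Stop reading as soon as n entries have been seen.
def take_first(entries, n)
  entries.first(n)
end

# Keep only the n most recent entries in a bounded buffer, so memory use
# does not grow with the size of the input.
def take_last(entries, n)
  buffer = []
  entries.each do |entry|
    buffer << entry
    buffer.shift if buffer.size > n
  end
  buffer
end

entries = (1..100).each # stand-in for an enumerator over FASTQ entries
p take_first(entries, 3)
p take_last(entries, 3)
```

Note that first can stop early, whereas last has to scan the whole input before anything can be emitted.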

Examples

To read all FASTQ entries from a file:

BD.new.read_fastq(input: "test.fq").dump.run

To read all FASTQ entries from a gzipped file:

BD.new.read_fastq(input: "test.fq.gz").dump.run

To read only the first 10 records from a FASTQ file:

BD.new.read_fastq(input: "test.fq", first: 10).dump.run

To read in the last 10 records from a FASTQ file:

BD.new.read_fastq(input: "test.fq", last: 10).dump.run

To read all FASTQ entries from multiple files:

BD.new.read_fastq(input: "test1.fq,test2.fq").dump.run

To read FASTQ entries from multiple files using a glob expression:

BD.new.read_fastq(input: "*.fq").dump.run
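
To make the comma-separated and glob forms of input concrete, here is a minimal sketch of how such an expression could be expanded into file paths with Dir.glob. The helper expand_globs is hypothetical, and the exact expansion and ordering rules used by BioDSL are an assumption here.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: split a comma-separated string into glob expressions
# and expand each one to the matching file paths, sorted for a stable order.
def expand_globs(expr)
  expr.split(',').flat_map { |glob| Dir.glob(glob.strip).sort }
end

Dir.mktmpdir do |dir|
  %w[test1.fq test2.fq other.txt].each do |name|
    FileUtils.touch(File.join(dir, name))
  end

  Dir.chdir(dir) do
    p expand_globs('test1.fq,test2.fq')
    p expand_globs('*.fq')
  end
end
```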

To read FASTQ entries from paired-end data:

BD.new.read_fastq(input: "file1.fq", input2: "file2.fq").dump.run

To read FASTQ entries from paired-end data and reverse-complement the input2 reads:

BD.new.
  read_fastq(input: "file1.fq", input2: "file2.fq",
             reverse_complement: true).
  dump.run

Constant Summary

MAX_TEST = 1_000
STATS = %i(records_in records_out sequences_in sequences_out residues_in
           residues_out)

Constructor Details

#initialize(options) ⇒ ReadFastq

Constructor for ReadFastq.

Parameters:

  • options (Hash)

    Options hash.

Options Hash (options):

  • :encoding (Symbol, String)
  • :input (String)
  • :input2 (String)
  • :first (Integer)
  • :last (Integer)
  • :reverse_complement (Boolean)


# File 'lib/BioDSL/commands/read_fastq.rb', line 124

def initialize(options)
  @options  = options
  @encoding = options[:encoding] ? options[:encoding].to_sym : :auto
  @pair     = options[:input2]
  @buffer   = []
  @type     = nil

  check_options
end

Instance Method Details

#lmb ⇒ Proc

Return command lambda for ReadFastq.

Returns:

  • (Proc)

    Command lambda.



# File 'lib/BioDSL/commands/read_fastq.rb', line 137

def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    process_input(input, output)

    case
    when @options[:first] && @pair then read_first_pair(output)
    when @options[:first]          then read_first_single(output)
    when @options[:last] && @pair  then read_last_pair(output)
    when @options[:last]           then read_last_single(output)
    when @pair                     then read_all_pair(output)
    else
      read_all_single(output)
    end
  end
end
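
The order of the branches in #lmb matters: first takes precedence over last, and within each mode the paired reader is chosen when input2 is given. The standalone sketch below mirrors that dispatch with a hypothetical reader_for function that returns the name of the reader rather than calling it; it is an illustration, not part of the class.

```ruby
# Hypothetical stand-in for the dispatch in #lmb: returns the name of the
# reader that would be chosen for a given options hash.
def reader_for(options)
  pair = options[:input2]

  if options[:first]
    pair ? :read_first_pair : :read_first_single
  elsif options[:last]
    pair ? :read_last_pair : :read_last_single
  else
    pair ? :read_all_pair : :read_all_single
  end
end

p reader_for(input: 'test.fq')
p reader_for(input: 'test.fq', first: 10)
p reader_for(input: 'file1.fq', input2: 'file2.fq', last: 10)
```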