Class: BioDSL::ReadFastq
- Inherits: Object
- Defined in: lib/BioDSL/commands/read_fastq.rb
Overview
Read FASTQ entries from one or more files.
read_fastq reads in sequence entries from FASTQ files. Each entry typically consists of four lines: a sequence name prefixed by '@', the sequence itself, a separator line starting with '+', and a line of quality scores with the same length as the sequence. Each entry is converted to a Biopiece record of the following form:
{:SEQ_NAME=>"test",
:SEQ=>"AGCATCGACTAGCAGCATTT",
:SEQ_LEN=>20}
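To make the mapping from a FASTQ entry to a record concrete, here is a minimal sketch in plain Ruby. It is not the BioDSL implementation: `parse_fastq` is a hypothetical helper, it assumes strict four-line entries, and the `SCORES` key for the quality string is an assumption that does not appear in the record shown above.

```ruby
require 'stringio'

# Hypothetical helper (not part of BioDSL): parse four-line FASTQ entries
# from any IO-like object and return an array of record hashes.
def parse_fastq(io)
  records = []
  io.each_line.each_slice(4) do |name, seq, _plus, scores|
    break if scores.nil? # ignore a truncated trailing entry
    seq = seq.chomp
    records << { SEQ_NAME: name.chomp.sub(/\A@/, ''), # strip the leading '@'
                 SEQ: seq,
                 SEQ_LEN: seq.length,
                 SCORES: scores.chomp } # assumed key, see lead-in
  end
  records
end

fastq = "@test\nAGCATCGACTAGCAGCATTT\n+\nIIIIIIIIIIIIIIIIIIII\n"
p parse_fastq(StringIO.new(fastq)).first
```

Slicing the input four lines at a time avoids misreading a quality line that happens to start with '@' as the start of a new entry.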
Paired-end data can be read interleaved by using the input2 option:
reads are then taken alternately from input and input2. If the
reverse_complement option is set, the input2 reads are reverse-complemented.
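The interleaving described above can be sketched as follows. This is an illustration only: `interleave` and `revcomp_record` are hypothetical helpers, not BioDSL API, and reversing the quality string along with the sequence is an assumption about how paired quality data would be kept in step.

```ruby
# Hypothetical helper: reverse-complement the sequence of one record hash.
def revcomp_record(record)
  flipped = record.merge(SEQ: record[:SEQ].reverse.tr('ACGTacgt', 'TGCAtgca'))
  # Assumption: quality scores, if present, are reversed to stay aligned.
  flipped[:SCORES] = record[:SCORES].reverse if record[:SCORES]
  flipped
end

# Hypothetical helper: alternate records from two equally long read lists.
def interleave(reads1, reads2, reverse_complement: false)
  raise ArgumentError, 'read lists differ in length' unless reads1.size == reads2.size

  reads1.zip(reads2).flat_map do |r1, r2|
    [r1, reverse_complement ? revcomp_record(r2) : r2]
  end
end

reads1 = [{ SEQ_NAME: 'pair1/1', SEQ: 'AACG', SEQ_LEN: 4 }]
reads2 = [{ SEQ_NAME: 'pair1/2', SEQ: 'TTGC', SEQ_LEN: 4 }]
interleave(reads1, reads2, reverse_complement: true).each { |r| p r }
```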
Input files may be compressed with gzip or bzip2.
For more about the FASTQ format:
en.wikipedia.org/wiki/FASTQ_format
Usage
read_fastq(input: <glob>[, input2: <glob>[, first: <uint>|last: <uint>
[, reverse_complement: <bool>]]])
Options
- input <glob> - Input file or file glob expression.
- input2 <glob> - Input file or file glob expression.
- first <uint> - Only read in the first number of entries.
- last <uint> - Only read in the last number of entries.
- reverse_complement <bool> - Reverse-complements input2 reads.
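The effect of the first and last options can be pictured with a small sketch over any enumerable of entries. `take_first` and `take_last` are hypothetical names, not BioDSL methods, and the bounded buffer is only one possible way to keep the last entries without holding the whole file in memory.

```ruby
# Hypothetical helper: stop reading as soon as n entries have been seen.
def take_first(entries, n)
  entries.first(n)
end

# Hypothetical helper: keep a sliding window holding at most n entries,
# so only the final n remain once the input is exhausted.
def take_last(entries, n)
  buffer = []
  entries.each do |entry|
    buffer << entry
    buffer.shift if buffer.size > n
  end
  buffer
end

names = %w[read1 read2 read3 read4 read5]
p take_first(names, 2)
p take_last(names, 2)
```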
Examples
To read all FASTQ entries from a file:
BD.new.read_fastq(input: "test.fq").dump.run
To read all FASTQ entries from a gzipped file:
BD.new.read_fastq(input: "test.fq.gz").dump.run
To read in only the first 10 records from a FASTQ file:
BD.new.read_fastq(input: "test.fq", first: 10).dump.run
To read in the last 10 records from a FASTQ file:
BD.new.read_fastq(input: "test.fq", last: 10).dump.run
To read all FASTQ entries from multiple files:
BD.new.read_fastq(input: "test1.fq,test2.fq").dump.run
To read FASTQ entries from multiple files using a glob expression:
BD.new.read_fastq(input: "*.fq").dump.run
To read FASTQ entries from paired-end data:
BD.new.read_fastq(input: "file1.fq", input2: "file2.fq").dump.run
To read FASTQ entries from paired-end data and reverse-complement read2:
BD.new.
read_fastq(input: "file1.fq", input2: "file2.fq",
reverse_complement: true)
.dump.run
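The multi-file examples above pass either a comma-separated list or a glob expression as input. How such a value might be expanded into file names can be sketched as below. `expand_input` is a hypothetical helper, and splitting on commas before globbing is an assumption about the behaviour, not a description of the BioDSL internals.

```ruby
# Hypothetical helper (not BioDSL API): expand a comma-separated list of
# paths or glob patterns into file names, keeping the order of the list
# and sorting the matches of each individual pattern.
def expand_input(spec)
  spec.split(',').flat_map { |pattern| Dir.glob(pattern.strip).sort }
end

# Prints the matching files in the current directory, if any.
p expand_input('test1.fq,test2.fq')
p expand_input('*.fq')
```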
Constant Summary
- MAX_TEST = 1_000
- STATS = %i(records_in records_out sequences_in sequences_out residues_in residues_out)
Instance Method Summary
- #initialize(options) ⇒ ReadFastq (constructor)
  Constructor for ReadFastq.
- #lmb ⇒ Proc
  Return command lambda for ReadFastq.
Constructor Details
#initialize(options) ⇒ ReadFastq
Constructor for ReadFastq.
# File 'lib/BioDSL/commands/read_fastq.rb', line 124
def initialize(options)
  @options  = options
  @encoding = options[:encoding] ? options[:encoding].to_sym : :auto
  @pair     = options[:input2]
  @buffer   = []
  @type     = nil
end
Instance Method Details
#lmb ⇒ Proc
Return command lambda for ReadFastq.
# File 'lib/BioDSL/commands/read_fastq.rb', line 137
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    process_input(input, output)

    case
    when @options[:first] && @pair then read_first_pair(output)
    when @options[:first]          then read_first_single(output)
    when @options[:last] && @pair  then read_last_pair(output)
    when @options[:last]           then read_last_single(output)
    when @pair                     then read_all_pair(output)
    else                                read_all_single(output)
    end
  end
end