Class: BioDSL::AssembleSeqRay

Inherits:
Object
Includes:
AuxHelper
Defined in:
lib/BioDSL/commands/assemble_seq_ray.rb

Overview

Assemble sequences in the stream using Ray.

assemble_seq_ray is a wrapper around the de Bruijn graph assembler Ray:

denovoassembler.sourceforge.net/

Any records containing sequence information will be included in the assembly, but only the assembled contig sequences will be output to the stream.

The sequence records may contain quality scores, and if the sequence names indicate that the sequence order is interleaved, paired-end assembly will be performed.
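How the command decides that names indicate interleaved order is not documented here. As a rough illustration only, the sketch below assumes the common /1 and /2 name suffix convention; interleaved_pairs? is a hypothetical helper and not part of BioDSL.

```ruby
# Hypothetical check, NOT BioDSL's implementation: assumes mates are named
# "<name>/1" and "<name>/2" and appear directly after each other.
def interleaved_pairs?(names)
  return false if names.empty? || names.size.odd?

  names.each_slice(2).all? do |first, second|
    first.end_with?('/1') && second.end_with?('/2') &&
      first.chomp('/1') == second.chomp('/2')
  end
end

paired   = interleaved_pairs?(%w[read1/1 read1/2 read2/1 read2/2])
unpaired = interleaved_pairs?(%w[read1 read2])
```

Other naming schemes exist, so a real check would likely need to accept more than this one pattern.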

Kmer values must be odd.

Usage

assemble_seq_ray([kmer_min: <uint>[, kmer_max: <uint>
                 [, contig_min: <uint>[, cpus: <uint>]]]])

Options

  • kmer_min: <uint> - Minimum k-mer value (default: 21).

  • kmer_max: <uint> - Maximum k-mer value (default: 49).

  • contig_min: <uint> - Minimum contig size (default: 500).

  • cpus: <uint> - Number of CPUs to use (default: 1).
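The odd k-mer requirement and the defaults above can be sketched as a small validation routine. This is an illustration under assumptions: odd_kmers is a hypothetical helper, not the command's own check_options, and stepping through the range by 2 is assumed purely for the example.

```ruby
# Hypothetical helper, NOT part of BioDSL: validates a k-mer range and lists
# the odd values in it. Defaults mirror the documented option defaults.
def odd_kmers(kmer_min: 21, kmer_max: 49)
  [kmer_min, kmer_max].each do |k|
    raise ArgumentError, "k-mer value must be odd: #{k}" if k.even?
  end

  raise ArgumentError, 'kmer_min must not exceed kmer_max' if kmer_min > kmer_max

  # Assumed step of 2 so that every value in the list stays odd.
  kmer_min.step(kmer_max, 2).to_a
end

kmers = odd_kmers
```

Calling odd_kmers(kmer_min: 20) raises ArgumentError in this sketch, since 20 is even.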

Examples

If you have two paired-end sequence files with Illumina data, you can assemble them using assemble_seq_ray like this:

BD.new.
read_fastq(input: "file1.fq", input2: "file2.fq").
assemble_seq_ray.
write_fasta(output: "contigs.fna").
run

Defined Under Namespace

Classes: N50

Constant Summary

STATS =
%i(records_in records_out sequences_in sequences_out residues_in
residues_out n50 contig_min contig_max kmer)

Instance Method Summary

Methods included from AuxHelper

#aux_exist

Constructor Details

#initialize(options) ⇒ AssembleSeqRay

Constructor for the AssembleSeqRay class.

Parameters:

  • options (Hash)

    Options hash.

Options Hash (options):

  • :kmer_min (Integer)

    Minimum kmer value.

  • :kmer_max (Integer)

    Maximum kmer value.

  • :cpus (Integer)

    CPUs to use.



# File 'lib/BioDSL/commands/assemble_seq_ray.rb', line 86

def initialize(options)
  @options = options
  @lengths = []
  @paired  = nil

  aux_exist('Ray')
  aux_exist('mpiexec')
  defaults
  check_options
end

Instance Method Details

#lmb ⇒ Proc

Return a lambda for the AssembleSeqRay command.

Returns:

  • (Proc)

    Returns the command lambda.



# File 'lib/BioDSL/commands/assemble_seq_ray.rb', line 100

def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    TmpDir.create('reads.fa') do |fa_in, tmp_dir|
      process_input(input, output, fa_in)
      @paired = paired?(fa_in)

      n50s = run_assemblies(fa_in, tmp_dir)

      best_kmer = n50s.sort_by(&:n50).reverse.first.kmer

      process_output(output, tmp_dir, best_kmer)
    end
  end
end
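The lambda picks the k-mer whose assembly has the highest N50. The sketch below illustrates that selection with the standard N50 definition (the length of the contig at which the running total of lengths, sorted longest first, reaches half of the total). N50Result, the n50 method, and the contig lengths are hypothetical stand-ins; the nested N50 class and run_assemblies are not shown in this page.

```ruby
# Hypothetical stand-in for the nested N50 class; only the two fields that
# lmb reads (kmer and n50) are modelled.
N50Result = Struct.new(:kmer, :n50)

# Standard N50: walk the contigs from longest to shortest and return the
# length at which the running total reaches half of the summed length.
def n50(lengths)
  half    = lengths.sum / 2.0
  running = 0

  lengths.sort.reverse.each do |len|
    running += len
    return len if running >= half
  end

  0 # No contigs at all.
end

# Made-up contig lengths per k-mer, for illustration only.
contigs_by_kmer = {
  21 => [500, 600, 700],
  31 => [900, 1500, 2000],
  41 => [800, 1000]
}

n50s = contigs_by_kmer.map { |kmer, lengths| N50Result.new(kmer, n50(lengths)) }

# Same selection expression as in lmb.
best_kmer = n50s.sort_by(&:n50).reverse.first.kmer
```

With these made-up lengths the N50 values are 600, 1500 and 1000, so best_kmer is 31.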