Class: BioDSL::Uclust

Inherits:
Object
Includes:
AuxHelper
Defined in:
lib/BioDSL/commands/uclust.rb

Overview

Run uclust on sequences in the stream.

This is a wrapper for the usearch tool to run the program uclust. Sequence-type records are clustered de novo, and records containing sequence and cluster information are output. If the align option is given, the sequences will be aligned.

Please refer to the manual:

www.drive5.com/usearch/manual/cmd_cluster_smallmem.html

Usearch 7.0 must be installed for uclust to work. Read more here:

www.drive5.com/usearch/

Usage

uclust(<identity: float>, <strand: "plus|both">[, align: <bool>
       [, cpus: <uint>]])

Options

  • identity: <float> - Similarity threshold for matching, given as a fraction between 0.0 and 1.0.

  • strand: <string> - For nucleotide search, report hits from the plus strand or from both strands.
    
  • align: <bool> - Align sequences.

  • cpus: <uint> - Number of CPU cores to use (default=1).
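
The constraints listed above can be sketched as a small standalone check. This is only an illustration: `validate_uclust_options` and `VALID_STRANDS` are hypothetical names, not part of BioDSL, and the class's own `check_options` may behave differently. The sketch assumes that the bounds 0.0 and 1.0 are inclusive and that `cpus` must be at least 1.

```ruby
# Hypothetical helper, not part of BioDSL: checks an options hash against
# the constraints listed in the Options section.
VALID_STRANDS = %w[plus both].freeze

def validate_uclust_options(options)
  identity = options[:identity]
  strand   = options[:strand].to_s   # accept "plus" as well as :plus
  cpus     = options.fetch(:cpus, 1) # documented default is 1

  # Assumption: the bounds 0.0 and 1.0 are inclusive.
  unless identity.is_a?(Numeric) && identity.between?(0.0, 1.0)
    raise ArgumentError, "identity must be between 0.0 and 1.0, got #{identity.inspect}"
  end

  unless VALID_STRANDS.include?(strand)
    raise ArgumentError, "strand must be \"plus\" or \"both\", got #{strand.inspect}"
  end

  # Assumption: at least one CPU core is required.
  unless cpus.is_a?(Integer) && cpus >= 1
    raise ArgumentError, "cpus must be a positive integer, got #{cpus.inspect}"
  end

  # Return a normalized copy and leave the caller's hash untouched.
  options.merge(strand: strand, cpus: cpus)
end

checked = validate_uclust_options(identity: 0.97, strand: :both)
puts checked.inspect
```

Passing an identity such as 97 (a percentage rather than a fraction) raises an ArgumentError in this sketch, which is the mistake the 0.0 to 1.0 range is most likely to catch.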

Examples

Constant Summary

STATS =
%i(records_in records_out sequences_in sequences_out residues_in
residues_out clusters_out)

Instance Method Summary

Methods included from AuxHelper

#aux_exist

Constructor Details

#initialize(options) ⇒ Uclust

Constructor for Uclust.

Parameters:

  • options (Hash)

    Options hash.

Options Hash (options):

  • :identity (Float)
  • :strand (String, Symbol)
  • :align (Boolean)
  • :cpus (Integer)


# File 'lib/BioDSL/commands/uclust.rb', line 78

def initialize(options)
  @options = options
  @options[:cpus] ||= 1

  aux_exist('usearch')
  check_options
end
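
One consequence of the constructor above is worth spelling out: because `@options` refers to the same hash the caller passed in, the `||=` default is written into the caller's hash as well. The sketch below uses a stand-in class, `DefaultsDemo` (a hypothetical name), that copies only the two assignment lines and leaves out `aux_exist` and `check_options`.

```ruby
# Stand-in class for illustration only; it reproduces just the defaulting
# lines of the constructor and skips aux_exist and check_options.
class DefaultsDemo
  attr_reader :options

  def initialize(options)
    @options = options      # same object as the caller's hash, not a copy
    @options[:cpus] ||= 1   # assigns only when :cpus is missing, nil, or false
  end
end

caller_options = { identity: 0.97, strand: 'plus' }
DefaultsDemo.new(caller_options)
puts caller_options.key?(:cpus) # the caller's hash now carries the default

explicit = DefaultsDemo.new({ identity: 0.97, strand: 'plus', cpus: 4 })
puts explicit.options[:cpus]    # an explicit value is left untouched
```

A caller that needs its hash unchanged can pass a copy, for example `options.dup`.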

Instance Method Details

#lmb ⇒ Proc

Return command lambda for uclust.

Returns:

  • (Proc)

    Command lambda.



# File 'lib/BioDSL/commands/uclust.rb', line 89

def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    TmpDir.create('rec', 'in', 'out') do |tmp_rec, tmp_in, tmp_out|
      process_input(input, output, tmp_rec, tmp_in)

      run_uclust(tmp_in, tmp_out)

      if @options[:align]
        process_output_align(output, tmp_out)
      else
        process_output(output, tmp_rec, tmp_out)
      end
    end
  end
end
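
The shape of this lambda (spool the stream to a temporary file, run an external step, read the results back) can be sketched without BioDSL or usearch. Everything in the sketch is a stand-in: `demo_lmb` is a hypothetical name, `Dir.mktmpdir` from the standard library takes the place of `TmpDir.create`, an upcase step takes the place of the usearch call, and records are assumed to be hashes with a `:SEQ` key.

```ruby
require 'tmpdir'

# Hypothetical stand-in for the command lambda. It follows the same three
# steps as #lmb but uses only the Ruby standard library.
def demo_lmb
  lambda do |input, output, status|
    status[:records_in]  = 0
    status[:records_out] = 0

    # The directory and its files are removed when the block exits.
    Dir.mktmpdir('demo') do |dir|
      tmp_in  = File.join(dir, 'in')
      tmp_out = File.join(dir, 'out')

      # Step 1: spool records from the stream to a temporary file.
      File.open(tmp_in, 'w') do |io|
        input.each do |record|
          status[:records_in] += 1
          io.puts(record[:SEQ])
        end
      end

      # Step 2: placeholder for the external tool, which would read
      # tmp_in and write tmp_out.
      File.write(tmp_out, File.read(tmp_in).upcase)

      # Step 3: read the results back and emit them as records.
      File.foreach(tmp_out) do |line|
        output << { SEQ: line.chomp }
        status[:records_out] += 1
      end
    end
  end
end

input  = [{ SEQ: 'acgt' }, { SEQ: 'ttga' }]
output = []
status = {}
demo_lmb.call(input, output, status)
```

Returning a lambda rather than doing the work directly lets the caller decide when the command runs and which input, output, and status objects it is given.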