Class: BioDSL::ClusterOtus

Inherits:
Object
Includes:
AuxHelper
Defined in:
lib/BioDSL/commands/cluster_otus.rb

Overview

Create OTUs from sequences in the stream.

Use the usearch program cluster_otus to cluster sequences in the stream and output a representative sequence from each cluster. Sequences must be dereplicated and sorted according to SEQ_COUNT in decreasing order.
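
The input requirement (dereplicated, then sorted by SEQ_COUNT in decreasing order) can be pictured with a minimal plain-Ruby sketch. The helper dereplicate_and_sort and the sample records are hypothetical and only illustrate the expected record order; in a real pipeline the dereplicate_seq and sort commands shown in the example on this page do this work.

```ruby
# Hypothetical helper, not part of BioDSL: collapses identical sequences and
# orders the result by abundance, most frequent first.
def dereplicate_and_sort(records)
  records
    .group_by { |record| record[:SEQ] }                       # one group per distinct sequence
    .map { |seq, group| { SEQ: seq, SEQ_COUNT: group.size } } # keep one record plus its count
    .sort_by { |record| -record[:SEQ_COUNT] }                 # decreasing SEQ_COUNT
end

# Made-up sample data for illustration only.
sample = [
  { SEQ: 'ACGT' }, { SEQ: 'TTGA' }, { SEQ: 'ACGT' },
  { SEQ: 'ACGT' }, { SEQ: 'TTGA' }, { SEQ: 'GGCC' }
]

dereplicate_and_sort(sample).each do |record|
  puts "#{record[:SEQ]}\t#{record[:SEQ_COUNT]}"
end
```

The most abundant sequence comes first, which is the order cluster_otus expects its input to arrive in.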

Please refer to the manual:

drive5.com/usearch/manual/cluster_otus.html

Usearch 7.0 must be installed for cluster_otus to work. Read more here:

www.drive5.com/usearch/

Usage

cluster_otus([identity: <float>])

Options

* identity: <float> - OTU cluster identity between 0.0 and 1.0
                      (Default 0.97).
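
As a rough illustration of how the identity option behaves, the sketch below applies the documented default of 0.97 and rejects values outside 0.0 to 1.0. The helper resolve_identity is hypothetical and is not BioDSL's own option check, whose implementation is not shown on this page.

```ruby
# Hypothetical helper, not BioDSL code: returns the identity to use,
# falling back to the documented default of 0.97.
def resolve_identity(options = {})
  identity = options.fetch(:identity, 0.97)

  # The option list documents the valid range as 0.0 to 1.0.
  unless identity.is_a?(Numeric) && identity.between?(0.0, 1.0)
    raise ArgumentError,
          "identity must be between 0.0 and 1.0, got #{identity.inspect}"
  end

  identity.to_f
end

puts resolve_identity                   # no option given, default applies
puts resolve_identity(identity: 0.99)   # explicit, stricter clustering
```

In a pipeline the option is passed directly to the command, for example cluster_otus(identity: 0.99).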

Examples

To create OTU clusters do:

BD.new.
read_fasta(input: "in.fna").
dereplicate_seq.
sort(key: :SEQ_COUNT, reverse: true).
cluster_otus.
run

Constant Summary

STATS =
  %i(records_in records_out sequences_in sequences_out residues_in
     residues_out)

Instance Method Summary

Methods included from AuxHelper

#aux_exist

Constructor Details

#initialize(options) ⇒ ClusterOtus

Constructor for ClusterOtus.

Parameters:

  • options (Hash)

    Options hash.

Options Hash (options):

  • :identity (Float)

    Cluster identity.



# File 'lib/BioDSL/commands/cluster_otus.rb', line 76

def initialize(options)
  @options = options

  aux_exist('usearch')
  check_options
  defaults
end

Instance Method Details

#lmb ⇒ Object



# File 'lib/BioDSL/commands/cluster_otus.rb', line 84

def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    TmpDir.create('tmp.fa', 'tmp.uc') do |tmp_in, tmp_out|
      process_input(input, output, tmp_in)

      BioDSL::Usearch.cluster_otus(input: tmp_in, output: tmp_out,
                                   identity: @options[:identity],
                                   verbose: @options[:verbose])

      process_output(output, tmp_out)
    end
  end
end
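
The lambda returned by #lmb follows a temp-file round trip: write the input to a temporary file, let the external program produce an output file, then read that file back. The sketch below shows the same pattern in plain Ruby under stated assumptions: it uses the standard library's Dir.mktmpdir rather than BioDSL's TmpDir, the helper with_temp_round_trip is hypothetical, and a block stands in for the usearch call. What process_input and process_output actually do is not shown on this page.

```ruby
require 'tmpdir'

# Hypothetical stand-in for the round trip in #lmb; the block plays the role
# of the external clustering program.
def with_temp_round_trip(sequences)
  Dir.mktmpdir('cluster_otus') do |dir|
    tmp_in  = File.join(dir, 'tmp.fa')
    tmp_out = File.join(dir, 'tmp.uc')

    # Step 1: write the incoming sequences to the input file as FASTA.
    File.open(tmp_in, 'w') do |io|
      sequences.each_with_index { |seq, i| io.puts ">seq#{i}", seq }
    end

    # Step 2: the "tool" (here: the block) turns tmp_in into tmp_out.
    yield tmp_in, tmp_out

    # Step 3: read the result back before the directory is removed.
    File.readlines(tmp_out, chomp: true)
  end
end

result = with_temp_round_trip(%w[ACGT TTGA]) do |tmp_in, tmp_out|
  # Placeholder tool: copies only the FASTA header lines to the output file.
  headers = File.readlines(tmp_in, chomp: true).select { |l| l.start_with?('>') }
  File.write(tmp_out, headers.join("\n") + "\n")
end

puts result
```

Scoping the temporary files to a block means they are cleaned up even if the external step raises.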