Class: BioDSL::DereplicateSeq
- Inherits: Object
- Defined in: lib/BioDSL/commands/dereplicate_seq.rb
Overview
Dereplicate sequences in the stream.
dereplicate_seq removes all duplicate sequence records. Each unique sequence is output once, as the first record carrying it, with the number of replicates added under the SEQ_COUNT key. By default, sequences are matched case-sensitively; the ignore_case option disables this.
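The behaviour described above amounts to grouping records by sequence while preserving first-seen order. A minimal plain-Ruby sketch, assuming records are hashes with a :SEQ key as in the Examples section; the method name dereplicate is hypothetical and this is not BioDSL's implementation:

```ruby
# Hypothetical sketch of dereplication over an array of record hashes.
# Assumes every record has a :SEQ key; not the BioDSL implementation.
def dereplicate(records, ignore_case: false)
  groups = {} # Ruby hashes keep insertion order, so first-seen order survives

  records.each do |record|
    # With ignore_case, case variants of a sequence share one key.
    key = ignore_case ? record[:SEQ].downcase : record[:SEQ]

    if groups.key?(key)
      groups[key][:SEQ_COUNT] += 1
    else
      # Keep a copy of the first record seen for this sequence.
      groups[key] = record.merge(SEQ_COUNT: 1)
    end
  end

  groups.values
end

records = [
  { SEQ_NAME: 'test1', SEQ: 'ATGC' },
  { SEQ_NAME: 'test2', SEQ: 'ATGC' },
  { SEQ_NAME: 'test3', SEQ: 'GCAT' }
]

dereplicate(records).each { |r| p r }
```

Because the first record is copied with merge, the output keeps that record's original SEQ (and case) and the input hashes are left unchanged.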
Usage
dereplicate_seq([ignore_case: <bool>])
Options
- ignore_case: <bool> - Ignore sequence case.
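The option only changes how sequences are compared. A minimal sketch, assuming case folding with String#downcase; the helper name dereplication_key is hypothetical, not part of BioDSL (Array#tally needs Ruby 2.7 or later):

```ruby
# Hypothetical helper: sequences with equal keys are treated as replicates.
def dereplication_key(seq, ignore_case: false)
  ignore_case ? seq.downcase : seq
end

seqs = %w[ATGC atgc AtGc GCAT]

# Default, case-sensitive matching: every spelling is a distinct key.
case_sensitive = seqs.map { |s| dereplication_key(s) }.tally

# ignore_case: true folds the three ATGC variants into one key.
case_insensitive = seqs.map { |s| dereplication_key(s, ignore_case: true) }.tally

puts case_sensitive.size   # prints 4
puts case_insensitive.size # prints 2
```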
Examples
Consider the following FASTA file test.fna:
>test1
ATGC
>test2
ATGC
>test3
GCAT
To dereplicate all sequences we use read_fasta and dereplicate_seq:
BD.new.read_fasta(input: "test.fna").dereplicate_seq.dump.run
{:SEQ_NAME=>"test1", :SEQ=>"ATGC", :SEQ_LEN=>4, :SEQ_COUNT=>2}
{:SEQ_NAME=>"test3", :SEQ=>"GCAT", :SEQ_LEN=>4, :SEQ_COUNT=>1}
Constant Summary
- STATS = %i(records_in records_out sequences_in sequences_out residues_in residues_out)
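The STATS keys name the counters that status_init sets up for this command. How BioDSL updates them is not documented here; the following hedged sketch assumes "sequences" are records with a :SEQ key and "residues" are the characters of SEQ, and computes the counters for the example input and output. count_stats is a hypothetical name:

```ruby
STATS = %i(records_in records_out sequences_in sequences_out residues_in residues_out)

# Hypothetical: tally the STATS counters for one run of the command.
# Assumes a "sequence" is a record with :SEQ and a "residue" is one SEQ character.
def count_stats(input, output)
  stats = STATS.to_h { |key| [key, 0] } # every counter starts at zero

  { in: input, out: output }.each do |dir, records|
    records.each do |r|
      stats[:"records_#{dir}"] += 1
      next unless r.key?(:SEQ)

      stats[:"sequences_#{dir}"] += 1
      stats[:"residues_#{dir}"] += r[:SEQ].length
    end
  end

  stats
end

input = [
  { SEQ_NAME: 'test1', SEQ: 'ATGC' },
  { SEQ_NAME: 'test2', SEQ: 'ATGC' },
  { SEQ_NAME: 'test3', SEQ: 'GCAT' }
]
output = [
  { SEQ_NAME: 'test1', SEQ: 'ATGC', SEQ_COUNT: 2 },
  { SEQ_NAME: 'test3', SEQ: 'GCAT', SEQ_COUNT: 1 }
]

p count_stats(input, output)
```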
Instance Method Summary
- #initialize(options) ⇒ DereplicateSeq (constructor)
  Constructor for the DereplicateSeq class.
- #lmb ⇒ Proc
  Return the command lambda for DereplicateSeq.
Constructor Details
#initialize(options) ⇒ DereplicateSeq
Constructor for the DereplicateSeq class.
# File 'lib/BioDSL/commands/dereplicate_seq.rb', line 70
def initialize(options)
  @options = options # command options, e.g. ignore_case
  @lookup  = {}      # start with an empty lookup table
end
Instance Method Details
#lmb ⇒ Proc
Return the command lambda for DereplicateSeq.
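# File 'lib/BioDSL/commands/dereplicate_seq.rb', line 80
def lmb
  lambda do |input, output, status|
    # Set up the status counters listed in STATS.
    status_init(status, STATS)

    # Stage data in a temporary file; the second block argument is unused.
    TmpDir.create('dereplicate_seq') do |tmp_file, _|
      process_input(input, output, tmp_file) # first pass over the input stream
      process_output(output, tmp_file)       # second pass: emit the results
    end
  end
end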
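#lmb delegates to the private helpers process_input and process_output inside TmpDir.create, which suggests a two-pass design; the helpers' bodies are not documented here. The following is a hypothetical sketch of such a command lambda in plain Ruby, using the standard library's Tempfile and JSON in place of BioDSL's TmpDir. It assumes input responds to each, output to <<, and status is a Hash; the name DEREPLICATE is illustrative only.

```ruby
require 'json'
require 'tempfile'

# Hypothetical two-pass dereplication lambda; not BioDSL's code.
# Pass 1 writes the first record of each sequence to a temp file and counts
# replicates in memory; pass 2 reads those records back and adds SEQ_COUNT.
DEREPLICATE = lambda do |input, output, status|
  counts = Hash.new(0) # sequence => number of replicates
  status[:records_in]  = 0
  status[:records_out] = 0

  Tempfile.create('dereplicate_seq') do |tmp| # file is deleted after the block
    input.each do |record|
      status[:records_in] += 1
      seq = record[:SEQ]
      tmp.puts(JSON.generate(record)) if counts[seq].zero? # first occurrence only
      counts[seq] += 1
    end

    tmp.rewind
    tmp.each_line do |line|
      record = JSON.parse(line, symbolize_names: true)
      output << record.merge(SEQ_COUNT: counts[record[:SEQ]])
      status[:records_out] += 1
    end
  end
end

input = [
  { SEQ_NAME: 'test1', SEQ: 'ATGC', SEQ_LEN: 4 },
  { SEQ_NAME: 'test2', SEQ: 'ATGC', SEQ_LEN: 4 },
  { SEQ_NAME: 'test3', SEQ: 'GCAT', SEQ_LEN: 4 }
]
output = []
status = {}

DEREPLICATE.call(input, output, status)
output.each { |r| p r }
```

Staging the unique records on disk keeps full records out of memory; in this sketch only the counts hash, keyed by the raw sequence, grows with the number of distinct sequences.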