Class: BioDSL::DegapSeq

Inherits:
Object
Defined in:
lib/BioDSL/commands/degap_seq.rb

Overview

Remove gaps from sequences or gap only columns in alignments.

degap_seq removes gaps (the characters ~, -, _ and .) from sequences. If the columns_only option is used, gaps are removed from aligned sequences if and only if the entire column consists of gaps.
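Below is a minimal sketch of the default mode in plain Ruby, independent of BioDSL. It assumes the gap set is exactly the four characters listed above and removes them with a regular-expression character class. It illustrates the behaviour and is not the command's actual implementation.

```ruby
# Gap characters assumed from the description above: ~ - _ .
# Inside the character class, '-' is escaped and '.' is literal.
GAP_PATTERN = /[~\-_.]/

# Return a copy of seq with every gap character removed.
def degap(seq)
  seq.gsub(GAP_PATTERN, '')
end

puts degap('A-G~T.C_') # prints AGTC
puts degap('AGG_T-C~') # prints AGGTC
```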

Usage

degap_seq([columns_only: <bool>])

Options

  • columns_only: <bool> - Remove gap columns only (default=false).

Examples

Consider the following FASTA entries in the file `test.fna`:

>test1
A-G~T.C_
>test2
AGG_T-C~

To remove all gaps from all sequences do:

BD.new.read_fasta(input: "test.fna").degap_seq.dump.run

{:SEQ_NAME=>"test1", :SEQ=>"AGTC", :SEQ_LEN=>4}
{:SEQ_NAME=>"test2", :SEQ=>"AGGTC", :SEQ_LEN=>5}

To remove only gap-only columns, use the columns_only option:

BD.new.
read_fasta(input: "test.fna").
degap_seq(columns_only: true).
dump.
run

{:SEQ_NAME=>"test1", :SEQ=>"A-GTC", :SEQ_LEN=>5}
{:SEQ_NAME=>"test2", :SEQ=>"AGGTC", :SEQ_LEN=>5}
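The columns_only result above can be reproduced with a small sketch that builds a keep-mask over the alignment columns. A column is kept if any sequence has a non-gap character in it. The helper name drop_gap_columns is hypothetical, and the sketch holds all sequences in memory. A column can only be judged once every sequence has been read, which is presumably why the command tracks a mask (@na_mask) and a maximum length (@max_len).

```ruby
# Gap characters assumed from the description: ~ - _ .
GAP_PATTERN = /[~\-_.]/

# Hypothetical helper: drop every column in which all sequences have a gap.
# Returns the degapped sequences and the number of columns removed.
def drop_gap_columns(seqs)
  len = seqs.map(&:length).max
  # keep[i] is true when at least one sequence has a residue in column i.
  keep = Array.new(len) do |i|
    seqs.any? { |s| s[i] && !s[i].match?(GAP_PATTERN) }
  end
  degapped = seqs.map do |s|
    s.chars.each_with_index.select { |_, i| keep[i] }.map(&:first).join
  end
  [degapped, keep.count(false)]
end

seqs, removed = drop_gap_columns(%w(A-G~T.C_ AGG_T-C~))
p seqs    # prints ["A-GTC", "AGGTC"]
p removed # prints 3
```

The count of false entries in the mask plays the same role as the columns_removed value that lmb stores in the status hash.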

Constant Summary

STATS =
%i(records_in records_out sequences_in sequences_out residues_in
residues_out)

Instance Method Summary

Constructor Details

#initialize(options) ⇒ DegapSeq

Constructor for DegapSeq.

Parameters:

  • options (Hash)

    Options Hash.

Options Hash (options):

  • :columns_only (Boolean)

    Flag indicating that only gap-only columns should be removed.



# File 'lib/BioDSL/commands/degap_seq.rb', line 85

def initialize(options)
  @options = options
  @indels  = BioDSL::Seq::INDELS.sort.join('')
  @na_mask = nil
  @max_len = nil
  @count   = 0

  check_options
end
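The constructor sorts BioDSL::Seq::INDELS before joining it into a string. The sketch below shows one plausible reason, assuming the joined gap set is later passed to String#delete (that code is not shown on this page). String#delete reads x-y as a character range, and a descending range such as ~-_ raises ArgumentError. After sorting, - comes first and is taken literally. The INDELS array here is a hypothetical stand-in; the real constant's contents and order may differ.

```ruby
# Hypothetical stand-in for BioDSL::Seq::INDELS (real order may differ).
INDELS = %w(~ - _ .).freeze

unsorted = INDELS.join      # "~-_." : "~-_" is parsed as a descending range
sorted   = INDELS.sort.join # "-._~" : a leading '-' is taken literally

begin
  'A-G~T.C_'.delete(unsorted)
rescue ArgumentError => e
  puts "unsorted set rejected: #{e.message}"
end

puts 'A-G~T.C_'.delete(sorted) # prints AGTC
```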

Instance Method Details

#lmb ⇒ Proc

Return the command lambda for DegapSeq.

Returns:

  • (Proc)

    Command lambda.



# File 'lib/BioDSL/commands/degap_seq.rb', line 98

def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    if @options[:columns_only]
      degap_columns(input, output)
      status[:columns_removed] = @na_mask.count_false
    else
      degap_all(input, output)
    end
  end
end
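Below is a sketch of the lambda contract for the default mode. It assumes records are Hashes like the dump output above, and that input and output behave like plain Arrays (BioDSL actually streams records between commands). Resetting each STATS key to zero is an assumed stand-in for status_init. Counting residues as raw sequence length is also an assumption, since the real command's counting rules are not shown here.

```ruby
# Copied from the STATS constant documented above.
STATS = %i(records_in records_out sequences_in sequences_out residues_in
           residues_out).freeze
GAP_PATTERN = /[~\-_.]/ # assumed gap set: ~ - _ .

# Hypothetical stand-in for the command lambda in degap-all mode.
def degap_all_lmb
  lambda do |input, output, status|
    STATS.each { |k| status[k] = 0 } # assumed behaviour of status_init

    input.each do |record|
      status[:records_in] += 1
      if record[:SEQ]
        status[:sequences_in] += 1
        status[:residues_in]  += record[:SEQ].length
        seq = record[:SEQ].gsub(GAP_PATTERN, '')
        # Return a new record with the degapped sequence and updated length.
        record = record.merge(SEQ: seq, SEQ_LEN: seq.length)
        status[:sequences_out] += 1
        status[:residues_out]  += seq.length
      end
      output << record
      status[:records_out] += 1
    end
  end
end

records = [
  { SEQ_NAME: 'test1', SEQ: 'A-G~T.C_' },
  { SEQ_NAME: 'test2', SEQ: 'AGG_T-C~' }
]
out    = []
status = {}
degap_all_lmb.call(records, out, status)
out.each { |r| p r }
p status
```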