Class: BioDSL::ClipPrimer

Inherits:
Object
Defined in:
lib/BioDSL/commands/clip_primer.rb

Overview

Clip sequences in the stream at a specified primer location.

clip_primer locates a specified primer in each sequence in the stream. If the direction is forward, the sequence is clipped after the match, so the primer and everything before it are removed. If the direction is reverse, the sequence is clipped before the match, so the primer and everything after it are removed. With the reverse_complement option, the primer sequence is reverse complemented before matching. The search_distance option limits the primer search to the beginning of the sequence when the direction is forward, and to the end of the sequence when the direction is reverse.
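A minimal Ruby sketch of the clipping rule described above, restricted to exact matches. It assumes case-insensitive matching, that the primer must lie entirely inside the search_distance window, and that the reverse direction uses the last hit in its window. The helpers revcomp and clip_exact are hypothetical and are not part of BioDSL.

```ruby
# Hypothetical helper: reverse complement of a DNA string.
# Only A/C/G/T are handled; IUPAC ambiguity codes are passed through unchanged.
def revcomp(dna)
  dna.reverse.tr("ACGTacgt", "TGCAtgca")
end

# Hypothetical exact-match sketch of the clip rule.
# Returns [clipped_sequence, match_position] or nil if the primer is not found.
def clip_exact(seq, primer, direction, reverse_complement: false, search_distance: nil)
  primer   = revcomp(primer) if reverse_complement
  haystack = seq.upcase    # assumption: matching ignores case
  needle   = primer.upcase

  case direction
  when :forward
    # Search only the first search_distance residues (whole sequence if nil).
    limit = search_distance ? [search_distance, seq.length].min : seq.length
    pos = haystack[0, limit].index(needle)
    pos && [seq[(pos + needle.length)..], pos] # keep residues after the match
  when :reverse
    # Search only the last search_distance residues (whole sequence if nil).
    offset = search_distance ? [seq.length - search_distance, 0].max : 0
    rel = haystack[offset..].rindex(needle)
    return nil unless rel

    pos = offset + rel
    [seq[0, pos], pos] # keep residues before the match
  else
    raise ArgumentError, "direction must be :forward or :reverse"
  end
end
```

For the test sequence used in the Examples section, clip_exact(seq, "TCGTATGCCGTCTTCTGCTT", :forward) returns ["actacgt", 9].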

Approximate (non-perfect) matching can be allowed by setting mismatch_percent, insertion_percent and deletion_percent.
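The approximate matching mentioned above can be illustrated, for mismatches only, with a plain Hamming-distance scan. This is a hypothetical sketch: find_with_mismatches takes an absolute mismatch count rather than a percentage, ignores insertions and deletions (which need an edit-distance style matcher), and is not BioDSL's matcher.

```ruby
# Hypothetical sketch: return the 0-based start of the first window of seq
# that differs from primer in at most max_mis positions, or nil if none.
# Insertions and deletions are NOT considered.
def find_with_mismatches(seq, primer, max_mis)
  s   = seq.upcase
  pat = primer.upcase

  # Empty range (no iterations) when the primer is longer than the sequence.
  (0..(s.length - pat.length)).each do |i|
    mis = 0
    pat.each_char.with_index do |c, j|
      mis += 1 if s[i + j] != c
      break if mis > max_mis # stop early once the budget is exceeded
    end
    return i if mis <= max_mis
  end
  nil
end
```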

The following keys are added to clipped records:

  • CLIP_PRIMER_DIR - Direction of clip.

  • CLIP_PRIMER_POS - Sequence position of clip (0 based).

  • CLIP_PRIMER_LEN - Length of clip match.

  • CLIP_PRIMER_PAT - Clip match pattern.
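One way to picture how these keys end up in a record is the hypothetical apply_clip helper below. Given a match position and length, it updates :SEQ and :SEQ_LEN and writes the four CLIP_PRIMER_* keys. It is modeled on the dumped records in the Examples section and is a sketch, not BioDSL's code.

```ruby
# Hypothetical helper modeled on the documented output keys; not BioDSL code.
# Clips record[:SEQ] and stores where and how the clip was made.
def apply_clip(record, direction, pos, len, pattern)
  seq = record[:SEQ]

  record[:SEQ] =
    case direction
    when :forward then seq[(pos + len)..] # keep residues after the match
    when :reverse then seq[0, pos]        # keep residues before the match
    else raise ArgumentError, "direction must be :forward or :reverse"
    end

  record[:SEQ_LEN]         = record[:SEQ].length
  record[:CLIP_PRIMER_DIR] = direction.to_s.upcase # "FORWARD" or "REVERSE"
  record[:CLIP_PRIMER_POS] = pos                   # 0-based start of the match
  record[:CLIP_PRIMER_LEN] = len                   # length of the match
  record[:CLIP_PRIMER_PAT] = pattern               # matched pattern
  record
end
```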

Usage

clip_primer(<primer: <string>>, <direction: <:forward|:reverse>>
            [, reverse_complement: <bool>[, search_distance: <uint>
            [, mismatch_percent: <uint>
            [, insertion_percent: <uint>
            [, deletion_percent: <uint>]]]]])

Options

  • primer: <string> - Primer sequence to search for.

  • direction: <:forward|:reverse> - Clip direction.

  • reverse_complement: <bool> - Reverse complement primer (default=false).

  • search_distance: <uint> - Limit the primer search to this many residues from the start (forward) or end (reverse) of the sequence.

  • mismatch_percent: <uint> - Allowed percent mismatches (default=0).

  • insertion_percent: <uint> - Allowed percent insertions (default=0).

  • deletion_percent: <uint> - Allowed percent deletions (default=0).
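The option list can be summarized as a defaults-plus-validation step. In the hypothetical normalize_clip_options below, the default values come from the list above, while the validation rules (for example, that search_distance must be a positive Integer) are assumptions rather than BioDSL's actual checks.

```ruby
# Defaults taken from the documented option list.
DEFAULTS = {
  reverse_complement: false,
  mismatch_percent: 0,
  insertion_percent: 0,
  deletion_percent: 0
}.freeze

PERCENT_KEYS = %i[mismatch_percent insertion_percent deletion_percent].freeze

# Hypothetical: merge defaults and validate; raises ArgumentError on bad input.
def normalize_clip_options(options)
  opts = DEFAULTS.merge(options)

  unless opts[:primer].is_a?(String) && !opts[:primer].empty?
    raise ArgumentError, "primer must be a non-empty String"
  end

  unless %i[forward reverse].include?(opts[:direction])
    raise ArgumentError, "direction must be :forward or :reverse"
  end

  unless [true, false].include?(opts[:reverse_complement])
    raise ArgumentError, "reverse_complement must be true or false"
  end

  # Percent options are documented as unsigned integers.
  PERCENT_KEYS.each do |key|
    value = opts[key]
    unless value.is_a?(Integer) && value >= 0
      raise ArgumentError, "#{key} must be a non-negative Integer"
    end
  end

  # Assumption: a zero search distance is rejected.
  if opts.key?(:search_distance)
    sd = opts[:search_distance]
    unless sd.is_a?(Integer) && sd > 0
      raise ArgumentError, "search_distance must be a positive Integer"
    end
  end

  opts
end
```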

Examples

Consider the following FASTA entry in the file test.fna:

>test
actgactgaTCGTATGCCGTCTTCTGCTTactacgt

To clip this sequence in the forward direction with the primer ‘TCGTATGCCGTCTTCTGCTT’ do:

BD.new.
read_fasta(input: "test.fna").
clip_primer(primer: "TCGTATGCCGTCTTCTGCTT", direction: :forward).
dump.
run

{:SEQ_NAME=>"test",
 :SEQ=>"actacgt",
 :SEQ_LEN=>7,
 :CLIP_PRIMER_DIR=>"FORWARD",
 :CLIP_PRIMER_POS=>9,
 :CLIP_PRIMER_LEN=>20,
 :CLIP_PRIMER_PAT=>"TCGTATGCCGTCTTCTGCTT"}

Or in the reverse direction:

BD.new.
read_fasta(input: "test.fna").
clip_primer(primer: "TCGTATGCCGTCTTCTGCTT", direction: :reverse).
dump.
run

{:SEQ_NAME=>"test",
 :SEQ=>"actgactga",
 :SEQ_LEN=>9,
 :CLIP_PRIMER_DIR=>"REVERSE",
 :CLIP_PRIMER_POS=>9,
 :CLIP_PRIMER_LEN=>20,
 :CLIP_PRIMER_PAT=>"TCGTATGCCGTCTTCTGCTT"}


Constant Summary

STATS =
%i(records_in records_out sequences_in sequences_out
residues_in residues_out pattern_hits pattern_misses)

Instance Method Summary

Constructor Details

#initialize(options) ⇒ ClipPrimer

Constructor for ClipPrimer.

Parameters:

  • options (Hash)

    Options hash.

Options Hash (options):

  • :primer (String)

    Primer used for matching.

  • :direction (Symbol)

    Direction for clipping.

  • :search_distance (Integer)

    Search distance.

  • :reverse_complement (Boolean)

    Flag indicating that primer should be reverse complemented.



# File 'lib/BioDSL/commands/clip_primer.rb', line 122

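def initialize(options)
  @options = options
  defaults       # fill in default option values
  check_options  # validate the options hash

  @primer  = primer    # primer, reverse complemented if requested
  @mis     = calc_mis  # allowed mismatches
  @ins     = calc_ins  # allowed insertions
  @del     = calc_del  # allowed deletions
end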
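The constructor stores @mis, @ins and @del alongside percentage options, which suggests the percentages are turned into absolute counts for the primer. The sketch below assumes each count is the primer length times the percentage, rounded to the nearest integer; edit_limits is a hypothetical name and the rounding rule is a guess, not taken from BioDSL.

```ruby
# Hypothetical: convert *_percent options into absolute edit limits.
# Missing options count as 0; rounding to nearest is an assumption.
def edit_limits(primer, options)
  len = primer.length
  {
    mis: (len * options.fetch(:mismatch_percent, 0) / 100.0).round,
    ins: (len * options.fetch(:insertion_percent, 0) / 100.0).round,
    del: (len * options.fetch(:deletion_percent, 0) / 100.0).round
  }
end
```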

Instance Method Details

#lmb ⇒ Proc

Lambda for ClipPrimer command.

Returns:

  • (Proc)

    Lambda for command.



# File 'lib/BioDSL/commands/clip_primer.rb', line 136

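def lmb
  lambda do |input, output, status|
    # Initialize the status counters named in STATS.
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      # Only records with a non-empty sequence are clipped.
      clip_primer(record) if record[:SEQ] && !record[:SEQ].empty?

      # Every record is passed on, clipped or not.
      output << record
      @status[:records_out] += 1
    end
  end
end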
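The lambda above follows a simple streaming pattern: every record is counted on the way in, clipped if it has a sequence, and passed on. The self-contained sketch below mimics that flow with a plain Hash for status and an Array for output. make_clip_lambda is hypothetical, supports only exact forward clipping, and its pattern_hits/pattern_misses bookkeeping is an assumption about what those STATS keys count.

```ruby
# Counter names copied from the STATS constant above.
STATS = %i(records_in records_out sequences_in sequences_out
           residues_in residues_out pattern_hits pattern_misses).freeze

# Hypothetical: build a streaming lambda that clips after an exact primer hit.
def make_clip_lambda(primer)
  lambda do |input, output, status|
    STATS.each { |key| status[key] = 0 } # start every counter at zero

    input.each do |record|
      status[:records_in] += 1
      seq = record[:SEQ]

      if seq && !seq.empty?
        pos = seq.upcase.index(primer.upcase)
        if pos
          record[:SEQ]     = seq[(pos + primer.length)..] # forward clip
          record[:SEQ_LEN] = record[:SEQ].length
          status[:pattern_hits] += 1
        else
          status[:pattern_misses] += 1
        end
      end

      output << record # records without a hit pass through unchanged
      status[:records_out] += 1
    end
  end
end
```

Because the lambda only needs an input that responds to each and an output that responds to <<, it can be exercised with plain Arrays.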