Module: BioDSL::BackTrack

Extended by:
Ambiguity
Included in:
Seq
Defined in:
lib/BioDSL/seq/backtrack.rb

Overview

Module containing code to locate nucleotide patterns in sequences allowing for ambiguity codes and a given maximum mismatches, insertions, and deletions. The pattern match engine is based on a backtrack algorithm. Insertions are nucleotides found in the pattern but not in the sequence. Deletions are nucleotides found in the sequence but not in the pattern. Algorithm based on code kindly provided by j_random_hacker @ Stackoverflow: stackoverflow.com/questions/7557017/approximate-string-matching-using-backtracking/

Defined Under Namespace

Classes: Match

Constant Summary collapse

OK_PATTERN =
Regexp.new('^[bflsycwphqrimtnkvadegu]+$')
MAX_MIS =

Maximum number of mismatches allowed

5
MAX_INS =

Maximum number of insertions allowed

5
MAX_DEL =

Maximum number of deletions allowed

5

Instance Method Summary collapse

Methods included from Ambiguity

add_ambiguity_macro

Instance Method Details

#patmatch(pattern, options = {}) ⇒ Object


str.patmatch(pattern[, options])
-> Match
str.patmatch(pattern[, options]) { |match|
  block
}
-> Match

options:
  :start
  :stop
  :max_mismatches
  :max_insertions
  :max_deletions

Method to iterate through a sequence from a given start position to the end of the sequence or to a given stop position to locate a pattern allowing for a maximum number of mismatches, insertions, and deletions. Insertions are nucleotides found in the pattern but not in the sequence. Deletions are nucleotides found in the sequence but not in the pattern.



68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
# File 'lib/BioDSL/seq/backtrack.rb', line 68

def patmatch(pattern, options = {})
  options[:start] ||= 0
  options[:stop] ||= length - 1
  options[:max_mismatches] ||= 0
  options[:max_insertions] ||= 0
  options[:max_deletions] ||= 0

  patscan(pattern, options) do |m|
    if block_given?
      yield m
    else
      return m
    end
  end
end

#patscan(pattern, options = {}) ⇒ Object


str.patscan(pattern[, options])
-> Array
str.patscan(pattern[, options]) { |match|
  block
}
-> Match

options:
  :start
  :stop
  :max_mismatches
  :max_insertions
  :max_deletions

Method to iterate through a sequence from a given start position to the end of the sequence or to a given stop position to locate a pattern allowing for a maximum number of mismatches, insertions, and deletions. Insertions are nucleotides found in the pattern but not in the sequence. Deletions are nucleotides found in the sequence but not in the pattern. Matches found in block context return the Match object. Otherwise matches are returned in an Array of Match objects.



107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
# File 'lib/BioDSL/seq/backtrack.rb', line 107

def patscan(pattern, options = {})
  options[:start] ||= 0
  options[:stop] ||= length - 1
  options[:max_mismatches] ||= 0
  options[:max_insertions] ||= 0
  options[:max_deletions] ||= 0

  unless pattern.downcase =~ OK_PATTERN
    fail BackTrackError, "Bad pattern: #{pattern}"
  end

  unless (0...length).include? options[:start]
    fail BackTrackError, "start: #{options[:start]} out of range " \
                         "(0..#{length - 1})"
  end

  unless (0...length).include? options[:stop]
    fail BackTrackError, "stop: #{options[:stop]} out of range " \
                         "(0..#{length - 1})"
  end

  unless (0..MAX_MIS).include? options[:max_mismatches]
    fail BackTrackError, "max_mismatches: #{options[:max_mismatches]} " \
                         "out of range (0..#{MAX_MIS})"
  end

  unless (0..MAX_INS).include? options[:max_insertions]
    fail BackTrackError, "max_insertions: #{options[:max_insertions]} " \
                         "out of range (0..#{MAX_INS})"
  end

  unless (0..MAX_DEL).include? options[:max_deletions]
    fail BackTrackError, "max_deletions: #{options[:max_deletions]} " \
                         "out of range (0..#{MAX_DEL})"
  end

  matches = []

  while (result = scan_C(@seq, pattern, options[:start], options[:stop],
                         options[:max_mismatches], options[:max_insertions],
                         options[:max_deletions]))
    match = Match.new(result.first, result.last,
                      @seq[result.first...result.first + result.last])

    if block_given?
      yield match
    else
      matches << match
    end

    options[:start] = result.first + 1
  end

  return matches unless block_given?
end