Module: BioDSL::BackTrack
Overview
Module containing code to locate nucleotide patterns in sequences allowing for ambiguity codes and a given maximum mismatches, insertions, and deletions. The pattern match engine is based on a backtrack algorithm. Insertions are nucleotides found in the pattern but not in the sequence. Deletions are nucleotides found in the sequence but not in the pattern. Algorithm based on code kindly provided by j_random_hacker @ Stackoverflow: stackoverflow.com/questions/7557017/approximate-string-matching-using-backtracking/
Defined Under Namespace
Classes: Match
Constant Summary collapse
- OK_PATTERN =
Regexp.new('^[bflsycwphqrimtnkvadegu]+$')
- MAX_MIS =
Maximum number of mismatches allowed
5
- MAX_INS =
Maximum number of insertions allowed
5
- MAX_DEL =
Maximum number of deletions allowed
5
Instance Method Summary collapse
-
#patmatch(pattern, options = {}) ⇒ Object
————————————————————————– str.patmatch(pattern[, options]) -> Match str.patmatch(pattern[, options]) { |match| block } -> Match.
-
#patscan(pattern, options = {}) ⇒ Object
————————————————————————– str.patscan(pattern[, options]) -> Array str.patscan(pattern[, options]) { |match| block } -> Match.
Methods included from Ambiguity
Instance Method Details
#patmatch(pattern, options = {}) ⇒ Object
str.patmatch(pattern[, options])
-> Match
str.patmatch(pattern[, options]) { |match|
block
}
-> Match
options:
:start
:stop
:max_mismatches
:max_insertions
:max_deletions
Method to iterate through a sequence from a given start position to the end of the sequence or to a given stop position to locate a pattern allowing for a maximum number of mismatches, insertions, and deletions. Insertions are nucleotides found in the pattern but not in the sequence. Deletions are nucleotides found in the sequence but not in the pattern.
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 |
# File 'lib/BioDSL/seq/backtrack.rb', line 68 def patmatch(pattern, = {}) [:start] ||= 0 [:stop] ||= length - 1 [:max_mismatches] ||= 0 [:max_insertions] ||= 0 [:max_deletions] ||= 0 patscan(pattern, ) do |m| if block_given? yield m else return m end end end |
#patscan(pattern, options = {}) ⇒ Object
str.patscan(pattern[, options])
-> Array
str.patscan(pattern[, options]) { |match|
block
}
-> Match
options:
:start
:stop
:max_mismatches
:max_insertions
:max_deletions
Method to iterate through a sequence from a given start position to the end of the sequence or to a given stop position to locate a pattern allowing for a maximum number of mismatches, insertions, and deletions. Insertions are nucleotides found in the pattern but not in the sequence. Deletions are nucleotides found in the sequence but not in the pattern. Matches found in block context return the Match object. Otherwise matches are returned in an Array of Match objects.
107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 |
# File 'lib/BioDSL/seq/backtrack.rb', line 107 def patscan(pattern, = {}) [:start] ||= 0 [:stop] ||= length - 1 [:max_mismatches] ||= 0 [:max_insertions] ||= 0 [:max_deletions] ||= 0 unless pattern.downcase =~ OK_PATTERN fail BackTrackError, "Bad pattern: #{pattern}" end unless (0...length).include? [:start] fail BackTrackError, "start: #{[:start]} out of range " \ "(0..#{length - 1})" end unless (0...length).include? [:stop] fail BackTrackError, "stop: #{[:stop]} out of range " \ "(0..#{length - 1})" end unless (0..MAX_MIS).include? [:max_mismatches] fail BackTrackError, "max_mismatches: #{[:max_mismatches]} " \ "out of range (0..#{MAX_MIS})" end unless (0..MAX_INS).include? [:max_insertions] fail BackTrackError, "max_insertions: #{[:max_insertions]} " \ "out of range (0..#{MAX_INS})" end unless (0..MAX_DEL).include? [:max_deletions] fail BackTrackError, "max_deletions: #{[:max_deletions]} " \ "out of range (0..#{MAX_DEL})" end matches = [] while (result = scan_C(@seq, pattern, [:start], [:stop], [:max_mismatches], [:max_insertions], [:max_deletions])) match = Match.new(result.first, result.last, @seq[result.first...result.first + result.last]) if block_given? yield match else matches << match end [:start] = result.first + 1 end return matches unless block_given? end |