Class: Ms::InSilico::Digester
- Inherits: Object
- Includes: Constants::Library
- Defined in: lib/ms/in_silico/digester.rb
Overview
Digester splits a protein sequence into peptides at the cleavage sites specified during initialization; in short, Digester models a cleavage enzyme. Digesters support missed cleavages and can return either the peptide strings or the cleavage sites.
Digester includes Constants::Library, allowing access to many common digesters using Digester[]:
trypsin = Digester['Trypsin']

trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG')
# => [
#   'MIVIGR',
#   'SIVHPYITNEYEPFAAEK',
#   'QQILSIMAG'
# ]

trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [
#   'MIVIGR',
#   'MIVIGRSIVHPYITNEYEPFAAEK',
#   'SIVHPYITNEYEPFAAEK',
#   'SIVHPYITNEYEPFAAEKQQILSIMAG',
#   'QQILSIMAG'
# ]

trypsin.site_digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [
#   [0,6],
#   [0,24],
#   [6,24],
#   [6,33],
#   [24,33]
# ]
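The Digester['Trypsin'] lookup comes from Constants::Library. As a rough illustration of the pattern only, here is a minimal name-indexed registry with a class-level [] method. ToyDigester and ToyDigester.register are hypothetical names; the real Constants::Library mechanism may work differently.

```ruby
# Hypothetical sketch of a name-indexed digester library.
# ToyDigester and ToyDigester.register are illustrative, not part of ms-in_silico.
class ToyDigester
  attr_reader :name, :cleave_str, :cterm_exception

  def initialize(name, cleave_str, cterm_exception = nil)
    @name = name
    @cleave_str = cleave_str
    @cterm_exception = cterm_exception
  end

  # Class-level registry (a class instance variable) keyed by enzyme name.
  def self.library
    @library ||= {}
  end

  def self.register(digester)
    library[digester.name] = digester
  end

  # Digester['Trypsin']-style lookup; raises for unknown names.
  def self.[](name)
    library.fetch(name) { raise ArgumentError, "unknown digester: #{name}" }
  end
end

ToyDigester.register(ToyDigester.new('Trypsin', 'KR', 'P'))
trypsin = ToyDigester['Trypsin']
trypsin.cleave_str  # => "KR"
```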
Enzymes
Enzymes in the library were adapted from the default Mascot enzyme list. Currently supported enzymes include:
- Arg-C
- Asp-N
- Asp-N_ambic
- Chymotrypsin
- CNBr
- Lys-C
- Lys-C/P
- PepsinA
- Tryp-CNBr
- TrypChymo
- Trypsin/P
- V8-DE
- V8-E
- Trypsin
- V8-E+Trypsin
- V8-DE+Trypsin
Several enzymes require two or more digesters, or functionality that is not provided by Digester, and so remain unsupported:
- CNBr+Trypsin
- Formic_acid
- LysC+AspN
- semiTrypsin
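Digester itself does not combine enzymes. One plausible approach for a pair such as LysC+AspN, assuming the combined digest cleaves wherever either enzyme would, is to merge the site lists that two separate digesters produce. merge_cleavage_sites is a hypothetical helper, and the input lists below are illustrative values rather than real digests.

```ruby
# Hypothetical helper; Digester does not provide this.
# Assumes a combined digest cleaves at every site of every enzyme, so the
# combined site list is the sorted union of the individual site lists.
def merge_cleavage_sites(*site_lists)
  site_lists.flatten.uniq.sort
end

sites_a = [0, 3, 6]  # illustrative values
sites_b = [0, 2, 6]  # illustrative values
merge_cleavage_sites(sites_a, sites_b)  # => [0, 2, 3, 6]
```

Peptides can then be cut from the merged list exactly as for a single digester.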
Constant Summary
- WHITESPACE = /\s*/m
  A multiline whitespace regexp.
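A regexp like WHITESPACE pairs naturally with StringScanner, which the constructor instantiates. The sketch below steps over the whitespace that follows a cleavage residue; that Digester uses the constant in exactly this way is an assumption. Note that the /m flag only changes how `.` matches, and `\s` already matches newlines.

```ruby
require 'strscan'

# Same pattern as the documented constant: zero or more whitespace characters.
WHITESPACE = /\s*/m

scanner = StringScanner.new("AAR  \n  GGR")
scanner.pos = 3                     # just past the 'R' at index 2
skipped = scanner.skip(WHITESPACE)  # => 5 (two spaces, newline, two spaces)
scanner.pos                         # => 8, where the next peptide would begin
```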
Instance Attribute Summary
- #cleave_str ⇒ Object (readonly)
  A string of residues at which cleavage occurs.
- #cterm_cleavage ⇒ Object (readonly)
  True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.
- #cterm_exception ⇒ Object (readonly)
  A c-terminal restriction residue that prevents cleavage at a potential cleavage site (optional).
- #name ⇒ Object (readonly)
  The name of the digester.
Instance Method Summary
- #cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ Object
  Returns digestion sites in sequence, as determined by the cleave_regexp boundaries.
- #digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object
  Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites.
- #initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ Digester (constructor)
  A new instance of Digester.
- #site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset, &block) ⇒ Object
  Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages.
Constructor Details
#initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ Digester
Returns a new instance of Digester.
# File 'lib/ms/in_silico/digester.rb', line 90

def initialize(name, cleave_str, cterm_exception=nil, cterm_cleavage=true)
  regexp = []
  0.upto(cleave_str.length - 1) {|i| regexp << cleave_str[i, 1] }

  @name = name
  @cleave_str = cleave_str
  @cleave_regexp = Regexp.new(regexp.join('|'))
  @cterm_exception = case
  when cterm_exception == nil || cterm_exception.empty? then nil
  when cterm_exception.length == 1 then cterm_exception[0]
  else raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
  end

  @cterm_cleavage = cterm_cleavage
  @scanner = StringScanner.new('')
end
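The constructor's two normalization steps can be sketched on their own: each residue in cleave_str becomes one alternative of the cleavage regexp, and cterm_exception must be empty or a single residue. The helper names below are hypothetical. One difference is deliberate: the original indexes with cterm_exception[0], which is the one-character string in Ruby 1.9 and later but was the character code in Ruby 1.8; this sketch simply returns the string.

```ruby
# Hypothetical helpers mirroring the constructor's normalization.
# Each residue becomes one regexp alternative ('KR' => /K|R/).
# Regexp.escape is an addition here; it is a no-op for residue letters.
def build_cleave_regexp(cleave_str)
  Regexp.new(cleave_str.chars.map { |c| Regexp.escape(c) }.join('|'))
end

# nil or '' means no restriction; anything longer than one residue is rejected.
def normalize_cterm_exception(cterm_exception)
  return nil if cterm_exception.nil? || cterm_exception.empty?
  unless cterm_exception.length == 1
    raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
  end
  cterm_exception
end

build_cleave_regexp('KR').source  # => "K|R"
normalize_cterm_exception('P')    # => "P"
normalize_cterm_exception('')     # => nil
```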
Instance Attribute Details
#cleave_str ⇒ Object (readonly)
A string of residues at which cleavage occurs
# File 'lib/ms/in_silico/digester.rb', line 76

def cleave_str
  @cleave_str
end
#cterm_cleavage ⇒ Object (readonly)
True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.
# File 'lib/ms/in_silico/digester.rb', line 85

def cterm_cleavage
  @cterm_cleavage
end
#cterm_exception ⇒ Object (readonly)
A c-terminal restriction residue that prevents cleavage at a potential cleavage site (optional).
# File 'lib/ms/in_silico/digester.rb', line 80

def cterm_exception
  @cterm_exception
end
#name ⇒ Object (readonly)
The name of the digester
# File 'lib/ms/in_silico/digester.rb', line 73

def name
  @name
end
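The four hand-written readers above behave like Ruby's attr_reader. The sketch below uses a hypothetical ReaderSketch class to show the equivalent declaration; like the originals, it defines no writer methods, which is what makes the attributes read-only.

```ruby
# Hypothetical class showing the attr_reader equivalent of the readers above.
class ReaderSketch
  attr_reader :name, :cleave_str, :cterm_exception, :cterm_cleavage

  def initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true)
    @name = name
    @cleave_str = cleave_str
    @cterm_exception = cterm_exception
    @cterm_cleavage = cterm_cleavage
  end
end

r = ReaderSketch.new('Trypsin', 'KR', 'P')
r.cleave_str      # => "KR"
r.cterm_cleavage  # => true
```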
Instance Method Details
#cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ Object
Returns digestion sites in sequence, as determined by the cleave_regexp boundaries. The digestion sites are the positions where peptides begin and end, such that [sites[n], sites[n+1] - sites[n]] is the [index, length] of peptide n.
d = Digester.new('Trypsin', 'KR', 'P')
seq = "AARGGR"
sites = d.cleavage_sites(seq) # => [0, 3, 6]
seq[sites[0], sites[0+1] - sites[0]] # => "AAR"
seq[sites[1], sites[1+1] - sites[1]] # => "GGR"
Trailing whitespace is included in the fragment.
seq = "AAR  \n  GGR"
sites = d.cleavage_sites(seq)          # => [0, 8, 11]
seq[sites[0], sites[0+1] - sites[0]]   # => "AAR  \n  "
seq[sites[1], sites[1+1] - sites[1]]   # => "GGR"
The digested section of sequence may be specified using offset and length.
# File 'lib/ms/in_silico/digester.rb', line 130

def cleavage_sites(seq, offset=0, length=seq.length-offset)
  return [0, 1] if seq.size == 1
  # adding exceptions is lame--algorithm should just work

  adjustment = cterm_cleavage ? 0 : 1
  limit = offset + length

  positions = [offset]
  pos = scan(seq, offset, limit) do |pos|
    positions << (pos - adjustment)
  end

  # add the final position
  if (pos < limit) || (positions.length == 1)
    positions << limit
  end

  # adding exceptions is lame.. this code probably needs to be
  # refactored (corrected).
  if !cterm_cleavage && pos == limit
    positions << limit
  end

  positions
end
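The site rules described above can be illustrated with a standalone sketch. It is not the library's implementation, which relies on a scan helper not shown here; sketch_cleavage_sites and its parameters are hypothetical. It assumes the exception residue is compared against the next non-whitespace character, and for simplicity it ignores the exception when cleavage is N-terminal. It reproduces the documented results for "AARGGR", "AAR  \n  GGR", and the overview's trypsin sequence.

```ruby
# Hypothetical standalone sketch, not the Digester implementation.
# Returns absolute cleavage sites within seq[offset, length], always starting
# with offset and ending with offset + length.
def sketch_cleavage_sites(seq, cleave_str, cterm_exception = nil,
                          cterm_cleavage = true, offset = 0,
                          length = seq.length - offset)
  limit = offset + length
  sites = [offset]

  (offset...limit).each do |i|
    next unless cleave_str.include?(seq[i])

    if cterm_cleavage
      # Cut after the residue; trailing whitespace stays with this fragment.
      k = i + 1
      k += 1 while k < limit && seq[k] =~ /\s/
      next if k >= limit  # end of range, added below
      next if cterm_exception && seq[k] == cterm_exception
      sites << k
    else
      # Cut before the residue (the exception is ignored in this mode).
      sites << i if i > offset
    end
  end

  sites << limit
end

sketch_cleavage_sites("AARGGR", "KR", "P")         # => [0, 3, 6]
sketch_cleavage_sites("AARPGR", "KR", "P")         # => [0, 6] (R before P is skipped)
sketch_cleavage_sites("AADGGDA", "D", nil, false)  # => [0, 2, 5, 7]
```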
#digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object
Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.
# File 'lib/ms/in_silico/digester.rb', line 175

def digest(seq, max_misses=0, offset=0, length=seq.length-offset)
  site_digest(seq, max_misses, offset, length).map do |s, e|
    seq[s, e-s]
  end
end
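digest is a thin layer over site_digest: each [start_index, end_index] pair becomes seq[s, e - s]. The sketch below applies that slicing to the pairs from the overview's site_digest example and recovers the overview's one-miss peptides. peptides_from_pairs is a hypothetical helper name.

```ruby
# Hypothetical helper: the same seq[s, e - s] slicing that digest performs.
def peptides_from_pairs(seq, pairs)
  pairs.map { |s, e| seq[s, e - s] }
end

seq = 'MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG'
pairs = [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]  # site_digest(seq, 1) in the overview
peptides_from_pairs(seq, pairs)
# => ["MIVIGR", "MIVIGRSIVHPYITNEYEPFAAEK", "SIVHPYITNEYEPFAAEK",
#     "SIVHPYITNEYEPFAAEKQQILSIMAG", "QQILSIMAG"]
```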
#site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset, &block) ⇒ Object
Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.
Each [start_index, end_index] pair is yielded to the block, if given, and the collected results are returned.
# File 'lib/ms/in_silico/digester.rb', line 160

def site_digest(seq, max_misses=0, offset=0, length=seq.length-offset, &block) # :yields: start_index, end_index
  frag_sites = cleavage_sites(seq, offset, length)

  (frag_sites.length, max_misses, 1) do |start_index, end_index|
    start_index = frag_sites[start_index]
    end_index = frag_sites[end_index]

    block ? block.call(start_index, end_index) : [start_index, end_index]
  end
end
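site_digest delegates the enumeration of index pairs to a helper. The sketch below is a hypothetical stand-in, sketch_site_pairs, that works directly on a site list: each peptide runs from sites[i] to sites[j], where j - i - 1 (the number of skipped sites) is at most max_misses. It reproduces the pair order shown in the overview and, like site_digest, returns block results when a block is given.

```ruby
# Hypothetical stand-in for the pair enumeration used by site_digest.
# For each start site, emit end sites that skip at most max_misses sites.
def sketch_site_pairs(sites, max_misses = 0)
  results = []
  (0...sites.length - 1).each do |i|
    last = [i + 1 + max_misses, sites.length - 1].min
    (i + 1..last).each do |j|
      pair = [sites[i], sites[j]]
      results << (block_given? ? yield(*pair) : pair)
    end
  end
  results
end

sketch_site_pairs([0, 6, 24, 33], 1)
# => [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]
sketch_site_pairs([0, 6, 24, 33]) { |s, e| e - s }
# => [6, 18, 9] (peptide lengths with no missed cleavages)
```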