Class: Ms::InSilico::Digester
- Inherits: Object
- Includes: Constants::Library
- Defined in: lib/ms/in_silico/digester.rb
Overview
Digester splits a protein sequence into peptides at sites specified during initialization; in short, a Digester models a cleavage enzyme. Digesters support missed cleavage sites and can return either the peptide strings or the cleavage sites.
Digester includes Constants::Library, allowing access to many common digesters using Digester[]:
trypsin = Digester['Trypsin']
trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG')
# => [
# 'MIVIGR',
# 'SIVHPYITNEYEPFAAEK',
# 'QQILSIMAG']
trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [
# 'MIVIGR',
# 'MIVIGRSIVHPYITNEYEPFAAEK',
# 'SIVHPYITNEYEPFAAEK',
# 'SIVHPYITNEYEPFAAEKQQILSIMAG',
# 'QQILSIMAG'
# ]
trypsin.site_digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [
# [0,6],
# [0,24],
# [6,24],
# [6,33],
# [24,33]
# ]
Enzymes
Enzymes in the library were adapted from the default Mascot enzyme list. Currently supported enzymes include:
- Arg-C
- Asp-N
- Asp-N_ambic
- Chymotrypsin
- CNBr
- Lys-C
- Lys-C/P
- PepsinA
- Tryp-CNBr
- TrypChymo
- Trypsin/P
- V8-DE
- V8-E
- Trypsin
- V8-E+Trypsin
- V8-DE+Trypsin
Several enzymes require two or more digesters, or functionality that is not provided by Digester, and so remain unsupported:
- CNBr+Trypsin
- Formic_acid
- LysC+AspN
- semiTrypsin
Constant Summary
- WHITESPACE = /\s*/m
  A multiline whitespace regexp.
Instance Attribute Summary
- #cleave_str ⇒ Object (readonly)
  A string of residues at which cleavage occurs.
- #cterm_cleavage ⇒ Object (readonly)
  True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.
- #cterm_exception ⇒ Object (readonly)
  A c-terminal restriction residue that prevents cleavage at a potential cleavage site (optional).
- #name ⇒ Object (readonly)
  The name of the digester.
Instance Method Summary
- #cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ Object
  Returns the digestion sites in sequence, as determined by the cleave_regexp boundaries.
- #digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object
  Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites.
- #initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ Digester (constructor)
  A new instance of Digester.
- #site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object
  Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages.
Constructor Details
#initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ Digester
Returns a new instance of Digester.
# File 'lib/ms/in_silico/digester.rb', line 90
def initialize(name, cleave_str, cterm_exception=nil, cterm_cleavage=true)
  regexp = []
  0.upto(cleave_str.length - 1) {|i| regexp << cleave_str[i, 1] }

  @name = name
  @cleave_str = cleave_str
  @cleave_regexp = Regexp.new(regexp.join('|'))
  @cterm_exception = case
  when cterm_exception == nil || cterm_exception.empty? then nil
  when cterm_exception.length == 1 then cterm_exception[0]
  else raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
  end

  @cterm_cleavage = cterm_cleavage
  @scanner = StringScanner.new('')
end
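The constructor builds its cleavage pattern by joining the residues of cleave_str with `|`. The following is a minimal standalone sketch of that single step, written so it runs without the library; the helper name `build_cleave_regexp` is hypothetical and not part of Ms::InSilico.

```ruby
# Hypothetical helper (not part of Ms::InSilico): turns a string of
# cleavage residues into an alternation regexp, one branch per residue.
def build_cleave_regexp(cleave_str)
  Regexp.new(cleave_str.chars.join('|'))
end

trypsin_like = build_cleave_regexp('KR')
trypsin_like.source          # => "K|R"
'MIVIGR' =~ trypsin_like     # => 5 (index of the first K or R)
```

Because each character becomes its own branch, cleave_str is treated as a set of single residues rather than as a multi-residue motif.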
Instance Attribute Details
#cleave_str ⇒ Object (readonly)
A string of residues at which cleavage occurs.
# File 'lib/ms/in_silico/digester.rb', line 76
def cleave_str
  @cleave_str
end
#cterm_cleavage ⇒ Object (readonly)
True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.
# File 'lib/ms/in_silico/digester.rb', line 85
def cterm_cleavage
  @cterm_cleavage
end
#cterm_exception ⇒ Object (readonly)
A c-terminal restriction residue that prevents cleavage at a potential cleavage site (optional).
# File 'lib/ms/in_silico/digester.rb', line 80
def cterm_exception
  @cterm_exception
end
#name ⇒ Object (readonly)
The name of the digester.
# File 'lib/ms/in_silico/digester.rb', line 73
def name
  @name
end
Instance Method Details
#cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ Object
Returns the digestion sites in sequence, as determined by the cleave_regexp boundaries. The digestion sites correspond to the positions where a peptide begins and ends, such that [sites[n], sites[n+1] - sites[n]] corresponds to the [index, length] for peptide n.
d = Digester.new('Trypsin', 'KR', 'P')
seq = "AARGGR"
sites = d.cleavage_sites(seq) # => [0, 3, 6]
seq[sites[0], sites[0+1] - sites[0]] # => "AAR"
seq[sites[1], sites[1+1] - sites[1]] # => "GGR"
Trailing whitespace is included in the fragment.
seq = "AAR  \n  GGR"
sites = d.cleavage_sites(seq) # => [0, 8, 11]
seq[sites[0], sites[0+1] - sites[0]] # => "AAR  \n  "
seq[sites[1], sites[1+1] - sites[1]] # => "GGR"
The digested section of sequence may be specified using offset and length.
# File 'lib/ms/in_silico/digester.rb', line 130
def cleavage_sites(seq, offset=0, length=seq.length-offset)
  adjustment = cterm_cleavage ? 0 : 1
  limit = offset + length

  positions = [offset]
  pos = scan(seq, offset, limit) do |pos|
    positions << pos - adjustment
  end

  # add the final position
  if pos < limit || positions.length == 1
    positions << limit
  end

  positions
end
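The scan helper called by cleavage_sites is not shown on this page. As an illustration of the documented behavior only (c-terminal cleavage, an optional exception residue, and trailing whitespace kept with the preceding fragment), here is a standalone sketch built on StringScanner. The name `sketch_cleavage_sites` and its internals are assumptions of this sketch, not the library's implementation, and it does not cover n-terminal cleavage or the offset and length arguments.

```ruby
require 'strscan'

# Hypothetical standalone helper (not the library's code): returns the
# boundaries of the fragments produced by cleaving after any residue in
# cleave_str, unless the following residue equals exception.
def sketch_cleavage_sites(seq, cleave_str, exception = nil)
  scanner = StringScanner.new(seq)
  cleave = Regexp.new(cleave_str.chars.join('|'))
  sites = [0]
  while scanner.scan_until(cleave)
    scanner.scan(/\s*/m)   # trailing whitespace stays with this fragment
    next if exception && scanner.peek(1) == exception
    break if scanner.eos?  # the end of the sequence is appended below
    sites << scanner.pos
  end
  sites << seq.length
  sites
end

sketch_cleavage_sites("AARGGR", "KR", "P")        # => [0, 3, 6]
sketch_cleavage_sites("AAR  \n  GGR", "KR", "P")  # => [0, 8, 11]
sketch_cleavage_sites("AARPGGR", "KR", "P")       # => [0, 7]
```

The third call shows the exception residue at work: the R followed by P is not treated as a cleavage site, so the whole sequence is a single fragment.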
#digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object
Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.
# File 'lib/ms/in_silico/digester.rb', line 169
def digest(seq, max_misses=0, offset=0, length=seq.length-offset)
  site_digest(seq, max_misses, offset, length).collect do |s, e|
    seq[s, e-s]
  end
end
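digest turns each [start_index, end_index] pair into a substring with seq[s, e-s]. String#[] takes a start and a length, so the end index has to be converted to a length first. The sketch below applies that conversion, without the library, to the site pairs listed in the Overview example.

```ruby
seq = 'MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG'

# [start_index, end_index] pairs copied from the Overview's
# site_digest example (one missed cleavage allowed).
pairs = [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]

# String#[] expects (start, length), hence e - s.
peptides = pairs.map { |s, e| seq[s, e - s] }
# => ["MIVIGR",
#     "MIVIGRSIVHPYITNEYEPFAAEK",
#     "SIVHPYITNEYEPFAAEK",
#     "SIVHPYITNEYEPFAAEKQQILSIMAG",
#     "QQILSIMAG"]
```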
#site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object
Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.
Each [start_index, end_index] pair is yielded to the block, if given, and the collected results are returned.
# File 'lib/ms/in_silico/digester.rb', line 154
def site_digest(seq, max_misses=0, offset=0, length=seq.length-offset) # :yields: start_index, end_index
  frag_sites = cleavage_sites(seq, offset, length)

  (frag_sites.length, max_misses, 1) do |start_index, end_index|
    start_index = frag_sites[start_index]
    end_index = frag_sites[end_index]

    block_given? ? yield(start_index, end_index) : [start_index, end_index]
  end
end
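One way to picture missed cleavages: every boundary is paired with the next boundary and with up to max_misses boundaries beyond it. The standalone sketch below follows that idea and mirrors the optional-block behavior described above. The helper name `sketch_site_pairs` is hypothetical, and the pair ordering is chosen to match the Overview example rather than taken from the library's internals.

```ruby
# Hypothetical helper (not the library's code): pairs each boundary with
# the boundaries that follow it, skipping at most max_misses of them.
def sketch_site_pairs(sites, max_misses = 0)
  results = []
  last = sites.length - 1
  0.upto(last - 1) do |i|
    (i + 1).upto([i + 1 + max_misses, last].min) do |j|
      pair = [sites[i], sites[j]]
      # Yield to the block if one was given, otherwise collect the pair.
      results << (block_given? ? yield(*pair) : pair)
    end
  end
  results
end

sites = [0, 6, 24, 33]
sketch_site_pairs(sites, 1)
# => [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]

# With a block, the block's return values are collected instead.
sketch_site_pairs(sites) { |s, e| e - s }
# => [6, 18, 9]
```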