Class: Ms::InSilico::Digester

Inherits:
Object
Includes:
Constants::Library
Defined in:
lib/ms/in_silico/digester.rb

Overview

Digester splits a protein sequence into peptides at the residues specified during initialization; in short, a Digester models a cleavage enzyme. Digesters support missed cleavage sites and can return either the peptide strings or the indices of the cleavage sites.

Digester includes Constants::Library, allowing access to many common digesters using Digester[]:

trypsin = Digester['Trypsin']
trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG')
# => [
# 'MIVIGR',
# 'SIVHPYITNEYEPFAAEK',
# 'QQILSIMAG'
# ]

trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [
# 'MIVIGR',
# 'MIVIGRSIVHPYITNEYEPFAAEK',
# 'SIVHPYITNEYEPFAAEK',
# 'SIVHPYITNEYEPFAAEKQQILSIMAG',
# 'QQILSIMAG'
# ]

trypsin.site_digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [
# [0,6],
# [0,24],
# [6,24],
# [6,33],
# [24,33]
# ]

Enzymes

Enzymes in the library were adapted from the default Mascot enzyme list. Currently supported enzymes include:

  • Arg-C

  • Asp-N

  • Asp-N_ambic

  • Chymotrypsin

  • CNBr

  • Lys-C

  • Lys-C/P

  • PepsinA

  • Tryp-CNBr

  • TrypChymo

  • Trypsin/P

  • V8-DE

  • V8-E

  • Trypsin

  • V8-E+Trypsin

  • V8-DE+Trypsin

Several enzymes require two or more digesters, or functionality that is not provided by Digester, and so remain unsupported:

  • CNBr+Trypsin

  • Formic_acid

  • LysC+AspN

  • semiTrypsin

Constant Summary

WHITESPACE = /\s*/m

A multiline whitespace regexp.

Constructor Details

#initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ Digester

Returns a new instance of Digester.



# File 'lib/ms/in_silico/digester.rb', line 90

def initialize(name, cleave_str, cterm_exception=nil, cterm_cleavage=true)
  regexp = []
  0.upto(cleave_str.length - 1) {|i| regexp << cleave_str[i, 1] }

  @name = name
  @cleave_str = cleave_str
  @cleave_regexp = Regexp.new(regexp.join('|'))
  @cterm_exception = case 
  when cterm_exception == nil || cterm_exception.empty? then nil
  when cterm_exception.length == 1 then cterm_exception[0]
  else
    raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
  end

  @cterm_cleavage = cterm_cleavage
  @scanner = StringScanner.new('')
end
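As a rough illustration of how the constructor turns cleave_str into cleave_regexp, the sketch below builds the same kind of alternation from a string of residues. The helper name residues_to_regexp is hypothetical and not part of Digester; unlike the constructor, it also escapes each character, which is an added precaution rather than documented behavior.

```ruby
# Hypothetical helper (not part of Digester): joins each residue in
# cleave_str with '|' to form an alternation regexp, as the constructor does.
# Escaping each character is an extra safeguard assumed for this sketch.
def residues_to_regexp(cleave_str)
  Regexp.new(cleave_str.chars.map { |c| Regexp.escape(c) }.join('|'))
end

regexp = residues_to_regexp('KR')
regexp.source            # => "K|R"
'AARGGK'.scan(regexp)    # => ["R", "K"]
```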

Instance Attribute Details

#cleave_str ⇒ Object (readonly)

A string of residues at which cleavage occurs



# File 'lib/ms/in_silico/digester.rb', line 76

def cleave_str
  @cleave_str
end

#cterm_cleavage ⇒ Object (readonly)

True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.



# File 'lib/ms/in_silico/digester.rb', line 85

def cterm_cleavage
  @cterm_cleavage
end

#cterm_exception ⇒ Object (readonly)

A C-terminal restriction residue which prevents cleavage at a potential cleavage site (optional).



# File 'lib/ms/in_silico/digester.rb', line 80

def cterm_exception
  @cterm_exception
end

#name ⇒ Object (readonly)

The name of the digester



# File 'lib/ms/in_silico/digester.rb', line 73

def name
  @name
end

Instance Method Details

#cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ Object

Returns the digestion sites in sequence, as determined by the cleave_regexp boundaries. The digestion sites correspond to the positions where a peptide begins and ends, such that

[sites[n], sites[n+1] - sites[n]]

corresponds to the [index, length] for peptide n.

d = Digester.new('Trypsin', 'KR', 'P')
seq = "AARGGR"
sites = d.cleavage_sites(seq)                 # => [0, 3, 6]

seq[sites[0], sites[0+1] - sites[0]]          # => "AAR"
seq[sites[1], sites[1+1] - sites[1]]          # => "GGR"

Trailing whitespace is included in the fragment.

seq = "AAR  \n  GGR"
sites = d.cleavage_sites(seq)                 # => [0, 8, 11]

seq[sites[0], sites[0+1] - sites[0]]          # => "AAR  \n  "
seq[sites[1], sites[1+1] - sites[1]]          # => "GGR"

The digested section of sequence may be specified using offset and length.



# File 'lib/ms/in_silico/digester.rb', line 130

def cleavage_sites(seq, offset=0, length=seq.length-offset)
  adjustment = cterm_cleavage ? 0 : 1
  limit = offset + length

  positions = [offset]
  pos = scan(seq, offset, limit) do |pos|
    positions << pos - adjustment
  end

  # add the final position
  if pos < limit || positions.length == 1
    positions << limit
  end

  positions
end
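The scan helper used above is not shown on this page, so the following is only a standalone approximation of the documented behavior, not the library's implementation. The name sketch_cleavage_sites is hypothetical. The sketch assumes C-terminal cleavage, treats whitespace after a cleavage residue as part of the preceding fragment, and skips a site when the next character is the exception residue.

```ruby
require 'strscan'

# Hypothetical standalone sketch (not the library's code).
# residues:  string of residues at which cleavage occurs, e.g. 'KR'
# exception: optional single residue that blocks cleavage when it follows a site
def sketch_cleavage_sites(seq, residues, exception = nil)
  scanner = StringScanner.new(seq)
  # a cleavage residue plus any trailing whitespace (assumption of this sketch)
  pattern = /[#{Regexp.escape(residues)}]\s*/m
  sites = [0]

  while scanner.scan_until(pattern)
    pos = scanner.pos
    break if pos >= seq.length                   # final position is added below
    next if exception && seq[pos] == exception   # e.g. no cleavage before 'P'
    sites << pos
  end

  sites << seq.length
  sites
end

sketch_cleavage_sites('AARGGR', 'KR', 'P')         # => [0, 3, 6]
sketch_cleavage_sites("AAR  \n  GGR", 'KR', 'P')   # => [0, 8, 11]
sketch_cleavage_sites('AARPGK', 'KR', 'P')         # => [0, 6]
```

The first two results match the cleavage_sites examples above; the third shows the assumed effect of the exception residue.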

#digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object

Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.



# File 'lib/ms/in_silico/digester.rb', line 169

def digest(seq, max_misses=0, offset=0, length=seq.length-offset)
  site_digest(seq, max_misses, offset, length).collect do |s, e|
    seq[s, e-s]
  end
end
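The mapping from site pairs to peptides can be checked without the library. The sketch below takes the [start_index, end_index] pairs from the site_digest example in the Overview as given and slices the sequence the same way digest does.

```ruby
# Site pairs copied from the site_digest example in the Overview
# (trypsin, one missed cleavage); they are inputs here, not computed.
seq   = 'MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG'
pairs = [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]

# Same slicing as digest: start index plus length (end - start).
peptides = pairs.map { |s, e| seq[s, e - s] }
# => ["MIVIGR",
#     "MIVIGRSIVHPYITNEYEPFAAEK",
#     "SIVHPYITNEYEPFAAEK",
#     "SIVHPYITNEYEPFAAEKQQILSIMAG",
#     "QQILSIMAG"]
```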

#site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object

Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.

Each [start_index, end_index] pair is yielded to the block, if given, and the collected results are returned.



# File 'lib/ms/in_silico/digester.rb', line 154

def site_digest(seq, max_misses=0, offset=0, length=seq.length-offset) # :yields: start_index, end_index
  frag_sites = cleavage_sites(seq, offset, length)

  overlay(frag_sites.length, max_misses, 1) do |start_index, end_index|
    start_index = frag_sites[start_index]
    end_index = frag_sites[end_index]
    
    block_given? ? yield(start_index, end_index) : [start_index, end_index]
  end  
end
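The overlay helper is not shown on this page. One way to picture what it contributes, assuming it pairs each site with the next 1 + max_misses sites, is the sketch below. The name sketch_site_digest is hypothetical, and the pairing rule is inferred from the Overview example rather than taken from the library source.

```ruby
# Hypothetical stand-in for the pairing done via overlay (assumed behavior).
# sites:      cleavage sites as returned by cleavage_sites, e.g. [0, 6, 24, 33]
# max_misses: maximum number of missed cleavages allowed per peptide
def sketch_site_digest(sites, max_misses = 0)
  pairs = []
  last = sites.length - 1

  0.upto(last - 1) do |i|
    # each fragment may extend past up to max_misses internal sites
    (i + 1).upto([i + 1 + max_misses, last].min) do |j|
      pairs << [sites[i], sites[j]]
    end
  end

  pairs
end

sketch_site_digest([0, 6, 24, 33])
# => [[0, 6], [6, 24], [24, 33]]
sketch_site_digest([0, 6, 24, 33], 1)
# => [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]
```

With max_misses set to 1 the result reproduces the order of the site_digest example in the Overview.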