Class: Ms::InSilico::Digester

Inherits:
Object
Includes:
Constants::Library
Defined in:
lib/ms/in_silico/digester.rb

Overview

Digester splits a protein sequence into peptides at sites specified during initialization; in short, a Digester models a cleavage enzyme. Digesters support missed cleavage sites and can return either the peptide strings or the cleavage sites.

Digester includes Constants::Library, allowing access to many common digesters using Digester[]:

trypsin = Digester['Trypsin']
trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG')
# => [
# 'MIVIGR',
# 'SIVHPYITNEYEPFAAEK',
# 'QQILSIMAG'
# ]

trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [
# 'MIVIGR',
# 'MIVIGRSIVHPYITNEYEPFAAEK',
# 'SIVHPYITNEYEPFAAEK',
# 'SIVHPYITNEYEPFAAEKQQILSIMAG',
# 'QQILSIMAG'
# ]

trypsin.site_digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [
# [0,6],
# [0,24],
# [6,24],
# [6,33],
# [24,33]
# ]

Enzymes

Enzymes in the library were adapted from the default Mascot enzyme list. Currently supported enzymes include:

  • Arg-C

  • Asp-N

  • Asp-N_ambic

  • Chymotrypsin

  • CNBr

  • Lys-C

  • Lys-C/P

  • PepsinA

  • Tryp-CNBr

  • TrypChymo

  • Trypsin/P

  • V8-DE

  • V8-E

  • Trypsin

  • V8-E+Trypsin

  • V8-DE+Trypsin

Several enzymes require two or more digesters, or functionality that is not provided by Digester, and so remain unsupported:

  • CNBr+Trypsin

  • Formic_acid

  • LysC+AspN

  • semiTrypsin

Constant Summary

WHITESPACE =

a multiline whitespace regexp

/\s*/m


Constructor Details

#initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ Digester

Returns a new instance of Digester.



# File 'lib/ms/in_silico/digester.rb', line 90

def initialize(name, cleave_str, cterm_exception=nil, cterm_cleavage=true)
  regexp = []
  0.upto(cleave_str.length - 1) {|i| regexp << cleave_str[i, 1] }

  @name = name
  @cleave_str = cleave_str
  @cleave_regexp = Regexp.new(regexp.join('|'))
  @cterm_exception = case 
  when cterm_exception == nil || cterm_exception.empty? then nil
  when cterm_exception.length == 1 then cterm_exception[0]
  else
    raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
  end

  @cterm_cleavage = cterm_cleavage
  @scanner = StringScanner.new('')
end
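
The constructor's two preparation steps can be tried in isolation. The following is a minimal sketch, not part of the Digester API: the helper names cleave_regexp_for and normalize_cterm_exception are hypothetical, and the sketch assumes Ruby 1.9 or later, where indexing a string returns a one-character string.

```ruby
# Hypothetical helper (not in the library): builds an alternation regexp
# from a string of residues, one alternative per residue.
def cleave_regexp_for(cleave_str)
  Regexp.new(cleave_str.chars.join('|'))
end

# Hypothetical helper (not in the library): nil or empty means no exception,
# a single residue is kept, anything longer is rejected.
def normalize_cterm_exception(cterm_exception)
  return nil if cterm_exception.nil? || cterm_exception.empty?
  unless cterm_exception.length == 1
    raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
  end
  cterm_exception[0]
end

trypsin_regexp = cleave_regexp_for('KR')   # => /K|R/

# Indexes of the residues that match the regexp in an example sequence.
seq = 'MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG'
match_positions = (0...seq.length).select { |i| seq[i] =~ trypsin_regexp }
# => [5, 23]

normalize_cterm_exception('P')   # => "P"
normalize_cterm_exception('')    # => nil
```

The matched indexes 5 and 23 are the residues themselves; with C-terminal cleavage the corresponding cut points fall one position later, at 6 and 24.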

Instance Attribute Details

#cleave_str ⇒ Object (readonly)

A string of residues at which cleavage occurs.



# File 'lib/ms/in_silico/digester.rb', line 76

def cleave_str
  @cleave_str
end

#cterm_cleavage ⇒ Object (readonly)

True if cleavage occurs at the C-terminus of a cleavage residue, false if cleavage occurs at the N-terminus.



# File 'lib/ms/in_silico/digester.rb', line 85

def cterm_cleavage
  @cterm_cleavage
end

#cterm_exception ⇒ Object (readonly)

A C-terminal restriction residue which prevents cleavage at a potential cleavage site (optional).



# File 'lib/ms/in_silico/digester.rb', line 80

def cterm_exception
  @cterm_exception
end

#name ⇒ Object (readonly)

The name of the digester.



# File 'lib/ms/in_silico/digester.rb', line 73

def name
  @name
end

Instance Method Details

#cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ Object

Returns digestion sites in sequence, as determined by the cleave_regexp boundaries. The digestion sites correspond to the positions where a peptide begins and ends, such that [sites[n], sites[n+1] - sites[n]] corresponds to the [index, length] for peptide n.

d = Digester.new('Trypsin', 'KR', 'P')
seq = "AARGGR"
sites = d.cleavage_sites(seq)                 # => [0, 3, 6]

seq[sites[0], sites[0+1] - sites[0]]          # => "AAR"
seq[sites[1], sites[1+1] - sites[1]]          # => "GGR"

Trailing whitespace is included in the fragment.

seq = "AAR  \n  GGR"
sites = d.cleavage_sites(seq)                 # => [0, 8, 11]

seq[sites[0], sites[0+1] - sites[0]]          # => "AAR  \n  "
seq[sites[1], sites[1+1] - sites[1]]          # => "GGR"

The digested section of sequence may be specified using offset and length.
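
To illustrate how offset and length restrict the digested region, here is a simplified standalone sketch. It is not the library's implementation: sketch_cleavage_sites is a hypothetical function that handles C-terminal cleavage only, ignores whitespace, and assumes the exception residue blocks cleavage when it immediately follows a cleavage residue. The results shown are those of the sketch; the library's own scanning may differ in edge cases.

```ruby
# Hypothetical, simplified stand-in for cleavage_sites.
# Assumptions: C-terminal cleavage only, no whitespace handling.
def sketch_cleavage_sites(seq, cleave_str, cterm_exception = nil,
                          offset = 0, length = seq.length - offset)
  limit = offset + length
  sites = [offset]                       # the region always starts a peptide

  (offset...limit).each do |i|
    next unless cleave_str.include?(seq[i])
    # Skip the site when the next residue is the exception residue.
    next if cterm_exception && seq[i + 1] == cterm_exception
    site = i + 1                         # cut after the cleavage residue
    sites << site unless site == limit   # the limit is appended once below
  end

  sites << limit                         # the region always ends a peptide
  sites
end

sketch_cleavage_sites('AARGGR', 'KR', 'P')        # => [0, 3, 6]
sketch_cleavage_sites('AARGGR', 'KR', 'P', 2, 4)  # => [2, 3, 6]
sketch_cleavage_sites('AARPGR', 'KR', 'P')        # => [0, 6]
```

In the second call the region starts at index 2, so the first fragment is the single residue "R". In the third call the proline after the first arginine suppresses that site.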



# File 'lib/ms/in_silico/digester.rb', line 130

def cleavage_sites(seq, offset=0, length=seq.length-offset)
  return [0, 1] if seq.size == 1  # adding exceptions is lame--algorithm should just work

  adjustment = cterm_cleavage ? 0 : 1
  limit = offset + length

  positions = [offset]
  pos = scan(seq, offset, limit) do |pos|
    positions << (pos - adjustment)
  end

  # add the final position
  if (pos < limit) || (positions.length == 1)
    positions << limit
  end
  # adding exceptions is lame.. this code probably needs to be
  # refactored (corrected).
  if !cterm_cleavage && pos == limit
    positions << limit
  end
  positions
end

#digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object

Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.



# File 'lib/ms/in_silico/digester.rb', line 175

def digest(seq, max_misses=0, offset=0, length=seq.length-offset)
  site_digest(seq, max_misses, offset, length).map do |s, e|
    seq[s, e-s]
  end
end

#site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset, &block) ⇒ Object

Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.

Each [start_index, end_index] pair is yielded to the block, if given, and the collected results are returned.



# File 'lib/ms/in_silico/digester.rb', line 160

def site_digest(seq, max_misses=0, offset=0, length=seq.length-offset, &block) # :yields: start_index, end_index
  frag_sites = cleavage_sites(seq, offset, length)

  overlay(frag_sites.length, max_misses, 1) do |start_index, end_index|
    start_index = frag_sites[start_index]
    end_index = frag_sites[end_index]
    
    block ? block.call(start_index, end_index) : [start_index, end_index]
  end  
end
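
The overlay helper used above is not shown on this page. As a rough illustration of the pairing it appears to perform, the sketch below uses a hypothetical function, sketch_site_digest, which pairs each site with up to max_misses + 1 of the sites that follow it and passes each pair to an optional block. It takes the cleavage sites directly instead of a sequence, and it reproduces the pairs listed in the Overview example.

```ruby
# Hypothetical stand-in for the pairing step of site_digest.
# Assumption: each start site is paired with the next (max_misses + 1) sites.
def sketch_site_digest(sites, max_misses = 0)
  results = []
  sites.each_index do |i|
    last = [i + 1 + max_misses, sites.length - 1].min
    ((i + 1)..last).each do |j|
      pair = [sites[i], sites[j]]
      # Collect the block's result when a block is given, else the pair.
      results << (block_given? ? yield(*pair) : pair)
    end
  end
  results
end

seq   = 'MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG'
sites = [0, 6, 24, 33]   # tryptic cleavage sites from the Overview example

pairs = sketch_site_digest(sites, 1)
# => [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]

lengths = sketch_site_digest(sites, 1) { |s, e| e - s }
# => [6, 24, 18, 27, 9]

peptides = sketch_site_digest(sites) { |s, e| seq[s, e - s] }
# => ["MIVIGR", "SIVHPYITNEYEPFAAEK", "QQILSIMAG"]
```

Passing a block is useful when only a derived value such as the peptide length is needed, since no peptide strings have to be built.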