Class: Mspire::Digester

Inherits:
Object
Defined in:
lib/mspire/digester.rb

Overview

A Digester splits a protein sequence into peptides at specified sites.

trypsin = Mspire::Digester[:trypsin]

trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG')
# => ['MIVIGR', 'SIVHPYITNEYEPFAAEK', 'QQILSIMAG']

With 1 missed cleavage:

trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => ['MIVIGR', 'MIVIGRSIVHPYITNEYEPFAAEK', 'SIVHPYITNEYEPFAAEK',
#     'SIVHPYITNEYEPFAAEKQQILSIMAG', 'QQILSIMAG']

Return the [start, end] index pairs of each peptide instead of the peptides themselves:

trypsin.site_digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [[0,6],[0,24],[6,24],[6,33],[24,33]]

Constant Summary

MULTILINE_WHITESPACE = /\s*/m

Constructor Details

#initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ Digester

Returns a new instance of Digester.



# File 'lib/mspire/digester.rb', line 41

def initialize(name, cleave_str, cterm_exception=nil, cterm_cleavage=true)
  regexp = []
  0.upto(cleave_str.length - 1) {|i| regexp << cleave_str[i, 1] }

  @name = name
  @cleave_str = cleave_str
  @cleave_regexp = Regexp.new(regexp.join('|'))
  @cterm_exception = case 
                     when cterm_exception == nil || cterm_exception.empty? then nil
                     when cterm_exception.length == 1 then cterm_exception[0]
                     else
                       raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
                     end

  @cterm_cleavage = cterm_cleavage
  @scanner = StringScanner.new('')
end
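
The constructor's two normalization steps can be tried in isolation. The following is a minimal sketch, not the library's code: `build_cleave_regexp` and `normalize_cterm_exception` are hypothetical helper names that mirror the logic shown above.

```ruby
# Hypothetical helper: one alternative per residue, e.g. 'KR' becomes /K|R/.
def build_cleave_regexp(cleave_str)
  Regexp.new(cleave_str.chars.join('|'))
end

# Hypothetical helper: nil or '' means "no exception"; otherwise the value
# must be exactly one residue.
def normalize_cterm_exception(value)
  return nil if value.nil? || value.empty?
  unless value.length == 1
    raise ArgumentError, "cterm exceptions must be a single residue: #{value}"
  end
  value[0]
end

regexp = build_cleave_regexp('KR')
puts regexp.source                           # K|R
puts 'AARGGK'.scan(regexp).inspect           # ["R", "K"]
puts normalize_cterm_exception('P').inspect  # "P"
puts normalize_cterm_exception('').inspect   # nil
```

As in the constructor, the residues are joined without escaping, so the sketch assumes `cleave_str` contains only plain residue letters.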

Instance Attribute Details

#cleave_str ⇒ Object (readonly)

A string of residues at which cleavage occurs.



# File 'lib/mspire/digester.rb', line 28

def cleave_str
  @cleave_str
end

#cterm_cleavage ⇒ Object (readonly)

True if cleavage occurs at the C-terminus of a cleavage residue, false if cleavage occurs at the N-terminus.



# File 'lib/mspire/digester.rb', line 37

def cterm_cleavage
  @cterm_cleavage
end

#cterm_exception ⇒ Object (readonly)

A C-terminal restriction residue that prevents cleavage at a potential cleavage site (optional).



# File 'lib/mspire/digester.rb', line 32

def cterm_exception
  @cterm_exception
end

#name ⇒ Object (readonly)

The name of the digester.



# File 'lib/mspire/digester.rb', line 25

def name
  @name
end

Class Method Details

.[](enzyme_name) ⇒ Object

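Takes the name of the enzyme (symbol or string, in any case) and looks it up in the ENZYMES constant. The name is lowercased and each run of non-word characters is replaced with an underscore before the lookup. Returns nil if no enzyme is found.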



# File 'lib/mspire/digester.rb', line 185

def [](enzyme_name)
  ENZYMES[ enzyme_name.to_s.downcase.gsub(/\W+/,'_').to_sym ]
end
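
To see which keys the lookup produces, the normalization can be run on its own. This is a minimal sketch under stated assumptions: `HYPOTHETICAL_ENZYMES` stands in for the real ENZYMES table (whose contents are not shown here) and holds placeholder strings rather than Digester instances; `enzyme_key` and `lookup` are hypothetical names.

```ruby
# Placeholder table for illustration only; not the library's ENZYMES constant.
HYPOTHETICAL_ENZYMES = { trypsin: 'KR', lys_c: 'K' }.freeze

# Same normalization as in .[]: stringify, lowercase, collapse each run of
# non-word characters into a single underscore, then symbolize.
def enzyme_key(enzyme_name)
  enzyme_name.to_s.downcase.gsub(/\W+/, '_').to_sym
end

def lookup(enzyme_name)
  HYPOTHETICAL_ENZYMES[enzyme_key(enzyme_name)]
end

puts enzyme_key('Trypsin').inspect    # :trypsin
puts enzyme_key('Lys-C').inspect      # :lys_c
puts lookup(:TRYPSIN).inspect         # "KR"
puts lookup('no such enzyme').inspect # nil
```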

.mascot_parse(str) ⇒ Object

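Utility method that parses a Mascot enzyme configuration string (tab-separated) into a Digester. Only the name, sense, cleavage residues, and C-terminal exception fields are used; the remaining two fields are read but ignored.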



# File 'lib/mspire/digester.rb', line 191

def mascot_parse(str) # :nodoc:
  name, sense, cleave_str, cterm_exception, independent, semi_specific = str.split(/ *\t */)
  cterm_cleavage = case sense
                   when 'C-Term' then true
                   when 'N-Term' then false
                   else raise ArgumentError, "unknown sense: #{sense}"
                   end

  new(name, cleave_str, cterm_exception, cterm_cleavage)
end
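
The field splitting can be checked without constructing a Digester. The sketch below is illustrative only: `parse_mascot_fields` is a hypothetical helper that returns a Hash instead of calling `new`, and the input lines are made-up examples of the tab-separated layout, not entries copied from a real Mascot file.

```ruby
# Hypothetical helper mirroring mascot_parse, but returning the parsed fields.
def parse_mascot_fields(str)
  # Tabs separate fields; spaces around each tab are discarded.
  name, sense, cleave_str, cterm_exception, _independent, _semi_specific =
    str.split(/ *\t */)

  cterm_cleavage =
    case sense
    when 'C-Term' then true
    when 'N-Term' then false
    else raise ArgumentError, "unknown sense: #{sense}"
    end

  { name: name, cleave_str: cleave_str,
    cterm_exception: cterm_exception, cterm_cleavage: cterm_cleavage }
end

# Made-up input lines for illustration.
p parse_mascot_fields("Trypsin\tC-Term\tKR\tP\tno\tno")
p parse_mascot_fields("Example \t N-Term \t DE\t\tno\tno")
```

In the second line the exception field is empty, so it parses as `""`, which the constructor shown earlier treats the same as nil.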

Instance Method Details

#cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ Object

Returns digestion sites in sequence, as determined by the cleave_regexp boundaries. The digestion sites are the positions where peptides begin and end, so that [sites[n], sites[n+1] - sites[n]] is the [index, length] of peptide n.

d = Digester.new('Trypsin', 'KR', 'P')
seq = "AARGGR"
sites = d.cleavage_sites(seq)                 # => [0, 3, 6]

seq[sites[0], sites[0+1] - sites[0]]          # => "AAR"
seq[sites[1], sites[1+1] - sites[1]]          # => "GGR"

Trailing whitespace is included in the fragment.

seq = "AAR  \n  GGR"
sites = d.cleavage_sites(seq)                 # => [0, 8, 11]

seq[sites[0], sites[0+1] - sites[0]]          # => "AAR  \n  "
seq[sites[1], sites[1+1] - sites[1]]          # => "GGR"

The digested section of sequence may be specified using offset and length.



# File 'lib/mspire/digester.rb', line 81

def cleavage_sites(seq, offset=0, length=seq.length-offset)
  return [0, 1] if seq.size == 1  # adding exceptions is lame--algorithm should just work

  adjustment = cterm_cleavage ? 0 : 1
  limit = offset + length

  positions = [offset]
  pos = scan(seq, offset, limit) do |pos|
    positions << (pos - adjustment)
  end

  # add the final position
  if (pos < limit) || (positions.length == 1)
    positions << limit
  end
  # adding exceptions is lame.. this code probably needs to be
  # refactored (corrected).
  if !cterm_cleavage && pos == limit
    positions << limit
  end
  positions
end
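
Because each site is both the end of one fragment and the start of the next, consecutive pairs of sites give the fragments directly. A minimal sketch, using the site arrays from the examples above as fixed inputs; `fragments_from_sites` is a hypothetical helper, not part of the class.

```ruby
# Hypothetical helper: slice seq between each pair of adjacent sites.
def fragments_from_sites(seq, sites)
  sites.each_cons(2).map { |start, stop| seq[start, stop - start] }
end

puts fragments_from_sites('AARGGR', [0, 3, 6]).inspect
# ["AAR", "GGR"]
puts fragments_from_sites("AAR  \n  GGR", [0, 8, 11]).inspect
# ["AAR  \n  ", "GGR"]
```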

#digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object

Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.



# File 'lib/mspire/digester.rb', line 126

def digest(seq, max_misses=0, offset=0, length=seq.length-offset)
  site_digest(seq, max_misses, offset, length).map do |s, e|
    seq[s, e-s]
  end
end
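
The mapping step is a plain slice per pair. As a sketch, the index pairs from the Overview example are hard-coded here in place of a call to site_digest, and `peptides_from_pairs` is a hypothetical helper name.

```ruby
# Hypothetical helper: convert [start, end] pairs into substrings, as digest does.
def peptides_from_pairs(seq, pairs)
  pairs.map { |s, e| seq[s, e - s] }
end

seq = 'MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG'
# Pairs copied from the site_digest example in the Overview (1 missed cleavage).
pairs = [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]

puts peptides_from_pairs(seq, pairs).inspect
```

The end index is exclusive, so a pair `[s, e]` covers `e - s` residues.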

#site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset, &block) ⇒ Object

Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.

Each [start_index, end_index] pair is yielded to the block, if given, and the collected results are returned.



# File 'lib/mspire/digester.rb', line 111

def site_digest(seq, max_misses=0, offset=0, length=seq.length-offset, &block) # :yields: start_index, end_index
  frag_sites = cleavage_sites(seq, offset, length)

  overlay(frag_sites.length, max_misses, 1) do |start_index, end_index|
    start_index = frag_sites[start_index]
    end_index = frag_sites[end_index]

    block ? block.call(start_index, end_index) : [start_index, end_index]
  end  
end
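
The `overlay` helper is not shown on this page, so the following is only a sketch of one way to enumerate site pairs with missed cleavages. `pairs_with_misses` is a hypothetical function, and its output order is an assumption inferred from the site_digest example in the Overview.

```ruby
# Hypothetical stand-in for the pairing logic: for each start site, pair it with
# the next site (0 misses), then the one after (1 miss), up to max_misses.
def pairs_with_misses(sites, max_misses)
  pairs = []
  (0...sites.length - 1).each do |i|
    (0..max_misses).each do |misses|
      j = i + 1 + misses
      break if j >= sites.length # ran past the last site
      pairs << [sites[i], sites[j]]
    end
  end
  pairs
end

sites = [0, 6, 24, 33] # cleavage sites for the Overview sequence
puts pairs_with_misses(sites, 0).inspect
# [[0, 6], [6, 24], [24, 33]]
puts pairs_with_misses(sites, 1).inspect
# [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]
```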