Class: Ms::InSilico::Digester

Inherits: Object

Includes: Constants::Library

Defined in: lib/ms/in_silico/digester.rb

Overview

Digester splits a protein sequence into peptides at the sites specified during initialization; in short, a Digester models a cleavage enzyme. Digesters support missed cleavage sites and can return either the peptide strings or the cleavage site indices.

Digester includes Constants::Library, allowing access to many common digesters using Digester[]:

trypsin = Digester['Trypsin']
trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG')
# => [
# 'MIVIGR',
# 'SIVHPYITNEYEPFAAEK',
# 'QQILSIMAG'
# ]

trypsin.digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [
# 'MIVIGR',
# 'MIVIGRSIVHPYITNEYEPFAAEK',
# 'SIVHPYITNEYEPFAAEK',
# 'SIVHPYITNEYEPFAAEKQQILSIMAG',
# 'QQILSIMAG'
# ]

trypsin.site_digest('MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG', 1)
# => [
# [0,6],
# [0,24],
# [6,24],
# [6,33],
# [24,33]
# ]

Enzymes

Enzymes in the library were adapted from the default Mascot enzyme list. Currently supported enzymes include:

Arg-C
Asp-N
Asp-N_ambic
Chymotrypsin
CNBr
Lys-C
Lys-C/P
PepsinA
Tryp-CNBr
TrypChymo
Trypsin/P
V8-DE
V8-E
Trypsin
V8-E+Trypsin
V8-DE+Trypsin

Several enzymes require two or more digesters, or functionality that is not provided by Digester, and so remain unsupported:

CNBr+Trypsin
Formic_acid
LysC+AspN
semiTrypsin

Constant Summary

WHITESPACE = /\s*/m

A multiline whitespace regexp.

Instance Attribute Summary

#cleave_str ⇒ Object readonly

A string of residues at which cleavage occurs.
#cterm_cleavage ⇒ Object readonly

True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.
#cterm_exception ⇒ Object readonly

A c-terminal restriction residue which prevents cleavage at a potential cleavage site (optional).
#name ⇒ Object readonly

The name of the digester.

Instance Method Summary

#cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ Object

Returns digestion sites in sequence, as determined by the cleave_regexp boundaries.
#digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ Object

Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites.
#initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ Digester constructor

A new instance of Digester.
#site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset, &block) ⇒ Object

Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages.

Constructor Details

#initialize(name, cleave_str, cterm_exception = nil, cterm_cleavage = true) ⇒ `Digester`

Returns a new instance of Digester.

# File 'lib/ms/in_silico/digester.rb', line 90

def initialize(name, cleave_str, cterm_exception=nil, cterm_cleavage=true)
  regexp = []
  0.upto(cleave_str.length - 1) {|i| regexp << cleave_str[i, 1] }

  @name = name
  @cleave_str = cleave_str
  @cleave_regexp = Regexp.new(regexp.join('|'))
  @cterm_exception = case 
  when cterm_exception == nil || cterm_exception.empty? then nil
  when cterm_exception.length == 1 then cterm_exception[0]
  else
    raise ArgumentError, "cterm exceptions must be a single residue: #{cterm_exception}"
  end

  @cterm_cleavage = cterm_cleavage
  @scanner = StringScanner.new('')
end
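
The constructor turns each residue in cleave_str into one branch of a regexp alternation. A minimal sketch of that step is below; build_cleave_regexp is a hypothetical helper name, not part of Digester, and the Regexp.escape call is an added precaution that the source listing does not use.

```ruby
# Hypothetical helper mirroring the alternation built in #initialize.
# Regexp.escape is an assumption added here; the listing above joins
# the raw residues directly.
def build_cleave_regexp(cleave_str)
  Regexp.new(cleave_str.chars.map { |residue| Regexp.escape(residue) }.join('|'))
end

trypsin_like = build_cleave_regexp('KR')
trypsin_like.source            # => "K|R"
'AARGGK'.scan(trypsin_like)    # => ["R", "K"]
```

For plain residue letters the escaped and unescaped patterns are identical, so the sketch should match the listing for the enzymes named above.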

Instance Attribute Details

#cleave_str ⇒ `Object` (readonly)

A string of residues at which cleavage occurs.


# File 'lib/ms/in_silico/digester.rb', line 76

def cleave_str
  @cleave_str
end

#cterm_cleavage ⇒ `Object` (readonly)

True if cleavage occurs at the c-terminus of a cleavage residue, false if cleavage occurs at the n-terminus.


# File 'lib/ms/in_silico/digester.rb', line 85

def cterm_cleavage
  @cterm_cleavage
end
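
To illustrate what n-terminal cleavage (cterm_cleavage = false) means for site positions, here is a small standalone sketch. nterm_sites is a hypothetical helper written for this example; it does not call Digester and ignores whitespace and c-terminal exceptions.

```ruby
# Hypothetical helper: place a site immediately BEFORE each cleavage
# residue, which is what n-terminal cleavage means. Position 0 is
# already the start of the first peptide, so it is not added twice.
def nterm_sites(seq, cleave_str)
  sites = [0]
  seq.each_char.with_index do |residue, i|
    sites << i if i > 0 && cleave_str.include?(residue)
  end
  sites << seq.length
  sites
end

seq = 'AADGGD'
sites = nterm_sites(seq, 'D')                               # => [0, 2, 5, 6]
peptides = sites.each_cons(2).map { |s, e| seq[s, e - s] }  # => ["AA", "DGG", "D"]
```

With c-terminal cleavage the same residues would instead end a peptide, so the cut would fall one position later.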

#cterm_exception ⇒ `Object` (readonly)

A c-terminal restriction residue which prevents cleavage at a potential cleavage site (optional).


# File 'lib/ms/in_silico/digester.rb', line 80

def cterm_exception
  @cterm_exception
end

#name ⇒ `Object` (readonly)

The name of the digester.


# File 'lib/ms/in_silico/digester.rb', line 73

def name
  @name
end

Instance Method Details

#cleavage_sites(seq, offset = 0, length = seq.length-offset) ⇒ `Object`

Returns digestion sites in sequence, as determined by the cleave_regexp boundaries. The digestion sites are the positions where peptides begin and end, such that [sites[n], sites[n+1] - sites[n]] gives the [index, length] of peptide n.

d = Digester.new('Trypsin', 'KR', 'P')
seq = "AARGGR"
sites = d.cleavage_sites(seq)                 # => [0, 3, 6]

seq[sites[0], sites[0+1] - sites[0]]          # => "AAR"
seq[sites[1], sites[1+1] - sites[1]]          # => "GGR"

Trailing whitespace is included in the fragment.

seq = "AAR  \n  GGR"
sites = d.cleavage_sites(seq)                 # => [0, 8, 11]

seq[sites[0], sites[0+1] - sites[0]]          # => "AAR  \n  "
seq[sites[1], sites[1+1] - sites[1]]          # => "GGR"

The digested section of sequence may be specified using offset and length.

# File 'lib/ms/in_silico/digester.rb', line 130

def cleavage_sites(seq, offset=0, length=seq.length-offset)
  return [0, 1] if seq.size == 1  # adding exceptions is lame--algorithm should just work

  adjustment = cterm_cleavage ? 0 : 1
  limit = offset + length

  positions = [offset]
  pos = scan(seq, offset, limit) do |pos|
    positions << (pos - adjustment)
  end

  # add the final position
  if (pos < limit) || (positions.length == 1)
    positions << limit
  end
  # adding exceptions is lame.. this code probably needs to be
  # refactored (corrected).
  if !cterm_cleavage && pos == limit
    positions << limit
  end
  positions
end
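
The listing above relies on a private scan method that is not shown here. As a rough standalone sketch of the same idea for c-terminal cleavage, the code below uses StringScanner from the Ruby standard library. sketch_cleavage_sites is a hypothetical name, and its handling of edge cases (offset and length, a single-residue sequence, whitespace before an exception residue) is an assumption, not the library's behavior.

```ruby
require 'strscan'

# Hypothetical sketch of c-terminal site finding; not the library's code.
def sketch_cleavage_sites(seq, cleave_str, cterm_exception = nil)
  residues = Regexp.new("[#{Regexp.escape(cleave_str)}]")
  scanner = StringScanner.new(seq)
  sites = [0]

  while scanner.scan_until(residues)
    scanner.scan(/\s*/m)   # trailing whitespace stays with this fragment
    next if scanner.eos?   # the final position is appended after the loop
    next if cterm_exception && scanner.peek(1) == cterm_exception
    sites << scanner.pos
  end

  sites << seq.length
  sites
end

sketch_cleavage_sites('AARGGR', 'KR', 'P')          # => [0, 3, 6]
sketch_cleavage_sites("AAR  \n  GGR", 'KR', 'P')    # => [0, 8, 11]
sketch_cleavage_sites('AARPGR', 'KR', 'P')          # => [0, 6]
```

The first two calls reproduce the site lists from the documented examples; the third shows the exception residue P suppressing the cut after the first R.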

#digest(seq, max_misses = 0, offset = 0, length = seq.length-offset) ⇒ `Object`

Returns an array of peptides produced by digesting sequence, allowing for missed cleavage sites. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.

# File 'lib/ms/in_silico/digester.rb', line 175

def digest(seq, max_misses=0, offset=0, length=seq.length-offset)
  site_digest(seq, max_misses, offset, length).map do |s, e|
    seq[s, e-s]
  end
end
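
The mapping from site pairs to peptide strings can be tried without the library. The sketch below applies the same seq[s, e-s] slice to the [start_index, end_index] pairs shown in the Overview; peptides_from_sites is a hypothetical helper name used only for this example.

```ruby
# Hypothetical helper: slice a sequence by [start_index, end_index] pairs,
# using the same seq[s, e - s] expression as #digest.
def peptides_from_sites(seq, pairs)
  pairs.map { |s, e| seq[s, e - s] }
end

seq = 'MIVIGRSIVHPYITNEYEPFAAEKQQILSIMAG'
pairs = [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]

peptides_from_sites(seq, pairs)
# => ["MIVIGR",
#     "MIVIGRSIVHPYITNEYEPFAAEK",
#     "SIVHPYITNEYEPFAAEK",
#     "SIVHPYITNEYEPFAAEKQQILSIMAG",
#     "QQILSIMAG"]
```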

#site_digest(seq, max_misses = 0, offset = 0, length = seq.length-offset, &block) ⇒ `Object`

Returns digestion sites of sequence as [start_index, end_index] pairs, allowing for missed cleavages. Digestion sites are determined using cleavage_sites; as in that method, the digested section of sequence may be specified using offset and length.

Each [start_index, end_index] pair is yielded to the block, if given, and the collected results are returned.

# File 'lib/ms/in_silico/digester.rb', line 160

def site_digest(seq, max_misses=0, offset=0, length=seq.length-offset, &block) # :yields: start_index, end_index
  frag_sites = cleavage_sites(seq, offset, length)

  overlay(frag_sites.length, max_misses, 1) do |start_index, end_index|
    start_index = frag_sites[start_index]
    end_index = frag_sites[end_index]
    
    block ? block.call(start_index, end_index) : [start_index, end_index]
  end  
end
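
The overlay helper used above is not shown in this listing. One way to picture the missed-cleavage expansion is the sketch below, which pairs each end site with up to max_misses + 1 preceding start sites. site_pairs is a hypothetical name, and the iteration order is chosen to match the Overview example rather than taken from the library source.

```ruby
# Hypothetical sketch of missed-cleavage expansion over a list of sites.
# A peptide with k missed cleavages spans k + 1 adjacent fragments.
def site_pairs(sites, max_misses = 0)
  pairs = []
  (1...sites.length).each do |end_i|
    first_start = [end_i - 1 - max_misses, 0].max
    (first_start...end_i).each do |start_i|
      pair = [sites[start_i], sites[end_i]]
      pairs << (block_given? ? yield(*pair) : pair)
    end
  end
  pairs
end

sites = [0, 6, 24, 33]
site_pairs(sites)        # => [[0, 6], [6, 24], [24, 33]]
site_pairs(sites, 1)     # => [[0, 6], [0, 24], [6, 24], [6, 33], [24, 33]]
site_pairs(sites, 1) { |s, e| e - s }   # => [6, 24, 18, 27, 9]
```

The last call shows the block form: whatever the block returns is collected in place of the pair, here the peptide lengths.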
