Method: Bio::PROSITE.pa2re

Defined in:
lib/bio/db/prosite.rb

.pa2re(pattern) ⇒ Object

Converts a PROSITE pattern into a Ruby regular expression (Regexp).

prosite/prosuser.txt:

The PA (PAttern) lines contain the definition of a PROSITE pattern. The patterns are described using the following conventions:

0) The standard IUPAC one-letter codes for the amino acids are used.
0) Ambiguities are indicated by listing the acceptable amino acids for a
   given position, between square parentheses `[ ]'. For example: [ALT]
   stands for Ala or Leu or Thr.
1) A period ends the pattern.
2) When a pattern is restricted to either the N- or C-terminal of a
   sequence, that pattern either starts with a `<' symbol or respectively
   ends with a `>' symbol.
3) Ambiguities are also indicated by listing between a pair of curly
   brackets `{ }' the amino acids that are not accepted at a given
   position. For example: {AM} stands for any amino acid except Ala and
   Met.
4) Repetition of an element of the pattern can be indicated by following
   that element with a numerical value or a numerical range between
   parenthesis. Examples: x(3) corresponds to x-x-x, x(2,4) corresponds to
   x-x or x-x-x or x-x-x-x.
5) The symbol `x' is used for a position where any amino acid is accepted.
6) Each element in a pattern is separated from its neighbor by a `-'.

Examples:

PA [AC]-x-V-x(4)-{ED}.

This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}

PA <A-x-[ST](2)-x(0,1)-V.

This pattern, which must be in the N-terminal of the sequence (`<'), is translated as: Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val
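
To make the conventions concrete, the following sketch traces the two example patterns, written here as `[AC]-x-V-x(4)-{ED}.` and `<A-x-[ST](2)-x(0,1)-V.`, through the rules one at a time using only core String methods. It is an illustration under stated assumptions rather than the library call itself: the variable names (`s`, `t`, `first_re`, `second_re`) and the test sequences are hypothetical, and the BioRuby gem is deliberately not loaded so that the snippet runs on its own.

```ruby
# Illustrative trace only; variable names and sequences are assumptions.

# Example 1: [AC]-x-V-x(4)-{ED}.
s = '[AC]-x-V-x(4)-{ED}.'
s = s.sub(/\.\z/, '')                      # (1) drop the terminating period
s = s.gsub(/\{(\w+)\}/) { "[^#{$1}]" }     # (3) {ED}  becomes [^ED]
s = s.gsub(/\(([\d,]+)\)/) { "{#{$1}}" }   # (4) x(4)  becomes x{4}
s = s.tr('x', '.')                         # (5) x     becomes .
s = s.delete('-')                          # (6) remove the '-' separators
first_re = Regexp.new(s, Regexp::IGNORECASE)

# Example 2: <A-x-[ST](2)-x(0,1)-V.
t = '<A-x-[ST](2)-x(0,1)-V.'
t = t.sub(/\.\z/, '')                      # (1) drop the terminating period
t = t.sub(/\A</, '^')                      # (2) N-terminal restriction becomes ^
t = t.gsub(/\(([\d,]+)\)/) { "{#{$1}}" }   # (4) (2) becomes {2}, (0,1) becomes {0,1}
t = t.tr('x', '.').delete('-')             # (5) and (6)
second_re = Regexp.new(t, Regexp::IGNORECASE)

puts first_re.source
puts second_re.source
puts first_re.match?('AGVKLMNQ')   # last residue is neither Glu nor Asp
puts second_re.match?('GAKSTV')    # Ala is not at the start of the sequence
```

Note that in Ruby `^` and `$` anchor to the start and end of a line rather than of the whole string, so a sequence containing newlines should probably be flattened before matching an anchored pattern.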



# File 'lib/bio/db/prosite.rb', line 468

def self.pa2re(pattern)
  pattern = pattern.dup
  pattern.gsub!(/\s/, '') # remove white spaces
  pattern.sub!(/\.$/, '') # (1) remove trailing '.'
  pattern.sub!(/^</, '^') # (2) restricted to the N-terminal : `<'
  pattern.sub!(/>$/, '$') # (2) restricted to the C-terminal : `>'
  pattern.gsub!(/\{(\w+)\}/) { |m|
    '[^' + $1 + ']'   # (3) not accepted at a given position : '{}'
  }
  pattern.gsub!(/\(([\d,]+)\)/) { |m|
    '{' + $1 + '}'    # (4) repetition of an element : (n), (n,m)
  }
  pattern.tr!('x', '.') # (5) any amino acid is accepted : 'x'
  pattern.tr!('-', '')  # (6) each element is separated by a '-'
  Regexp.new(pattern, Regexp::IGNORECASE)
end