Method: Bio::PROSITE.pa2re
Defined in: lib/bio/db/prosite.rb
.pa2re(pattern) ⇒ Object
Converts a PROSITE pattern string into a case-insensitive Ruby Regexp.
prosite/prosuser.txt:
The PA (PAttern) lines contain the definition of a PROSITE pattern. The patterns are described using the following conventions:
0) The standard IUPAC one-letter codes for the amino acids are used.
0) Ambiguities are indicated by listing the acceptable amino acids for a
   given position between square brackets '[ ]'. For example: [ALT]
   stands for Ala or Leu or Thr.
1) A period ends the pattern.
2) When a pattern is restricted to either the N- or C-terminal of a
   sequence, that pattern either starts with a '<' symbol or, respectively,
   ends with a '>' symbol.
3) Ambiguities are also indicated by listing between a pair of curly
   brackets '{ }' the amino acids that are not accepted at a given
   position. For example: {AM} stands for any amino acid except Ala and
   Met.
4) Repetition of an element of the pattern can be indicated by following
   that element with a numerical value or a numerical range between
   parentheses. Examples: x(3) corresponds to x-x-x, x(2,4) corresponds to
   x-x or x-x-x or x-x-x-x.
5) The symbol 'x' is used for a position where any amino acid is accepted.
6) Each element in a pattern is separated from its neighbor by a '-'.
Examples:
PA   [AC]-x-V-x(4)-{ED}.
This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
PA   <A-x-[ST](2)-x(0,1)-V.
This pattern, which must be in the N-terminal of the sequence (‘<’), is translated as: Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val
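To make the translation concrete, the sketch below replays the numbered conversion steps on the two example patterns as they appear in the PROSITE user manual, `[AC]-x-V-x(4)-{ED}.` and `<A-x-[ST](2)-x(0,1)-V.`. The helper name `prosite_to_regexp_source` is hypothetical and not part of BioRuby; it returns the regular expression source as a String so the intermediate result can be inspected without installing the gem.

```ruby
# Hypothetical standalone helper (not a BioRuby API): it repeats the
# documented conversion steps and returns the regexp source as a String.
def prosite_to_regexp_source(pattern)
  re = pattern.gsub(/\s/, '')                  # drop whitespace
  re = re.sub(/\.$/, '')                       # (1) trailing period
  re = re.sub(/^</, '^')                       # (2) N-terminal anchor
  re = re.sub(/>$/, '$')                       # (2) C-terminal anchor
  re = re.gsub(/\{(\w+)\}/) { "[^#{$1}]" }     # (3) excluded residues
  re = re.gsub(/\(([\d,]+)\)/) { "{#{$1}}" }   # (4) repetition counts
  re = re.tr('x', '.')                         # (5) any residue
  re.delete('-')                               # (6) element separators
end

first  = prosite_to_regexp_source('[AC]-x-V-x(4)-{ED}.')
second = prosite_to_regexp_source('<A-x-[ST](2)-x(0,1)-V.')

puts first   # [AC].V.{4}[^ED]
puts second  # ^A.[ST]{2}.{0,1}V

# Compile the first pattern the same way the method does (case-insensitive).
first_re = Regexp.new(first, Regexp::IGNORECASE)
p first_re.match?('AGVKLMNQ')  # true:  last residue is neither Glu nor Asp
p first_re.match?('AGVKLMNE')  # false: last residue is Glu
```

The order of steps (3) and (4) matters: if the repetition counts were rewritten to `{n}` first, the curly-bracket rule would then turn `{4}` into `[^4]`, because `\w` also matches digits.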
# File 'lib/bio/db/prosite.rb', line 468

def self.pa2re(pattern)
  pattern = pattern.dup
  pattern.gsub!(/\s/, '')   # remove white spaces
  pattern.sub!(/\.$/, '')   # (1) remove trailing '.'
  pattern.sub!(/^</, '^')   # (2) restricted to the N-terminal : `<'
  pattern.sub!(/>$/, '$')   # (2) restricted to the C-terminal : `>'
  pattern.gsub!(/\{(\w+)\}/) { |m|
    '[^' + $1 + ']'         # (3) not accepted at a given position : '{}'
  }
  pattern.gsub!(/\(([\d,]+)\)/) { |m|
    '{' + $1 + '}'          # (4) repetition of an element : (n), (n,m)
  }
  pattern.tr!('x', '.')     # (5) any amino acid is accepted : 'x'
  pattern.tr!('-', '')      # (6) each element is separated by a '-'
  Regexp.new(pattern, Regexp::IGNORECASE)
end
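One consequence of mapping '<' and '>' to `^` and `$` may be worth keeping in mind: in Ruby these anchors match at the start and end of every line, not only of the whole string. The sketch below illustrates this with plain Regexp objects, assuming the sequence is passed as a String that may still contain line breaks. The sample sequences and the `\A`-anchored variant are illustrative assumptions, not something the library provides.

```ruby
# Regexp equivalent to what the conversion produces for
# the pattern `<A-x-[ST](2)-x(0,1)-V.`
n_terminal = Regexp.new('^A.[ST]{2}.{0,1}V', Regexp::IGNORECASE)

one_line = 'MKAGSSLV'     # made-up sequence; the motif is not at the N-terminus
wrapped  = "MK\nAGSSLV"   # same residues with a FASTA-style line break

p n_terminal.match?(one_line)  # false
p n_terminal.match?(wrapped)   # true: ^ also matches right after the newline

# Illustrative stricter variant: \A anchors to the start of the string only.
strict = /\AA.[ST]{2}.{0,1}V/i
p strict.match?(wrapped)       # false
p strict.match?('agsslv')      # true: matching stays case-insensitive
```

Stripping newlines from the sequence before matching avoids the issue without changing the generated pattern.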