Module: RGFA::Sequence

Included in:
String
Defined in:
lib/rgfa/sequence.rb

Overview

Extensions of the String class to handle nucleotide sequences.

Constant Summary

WCC =

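Watson-Crick complements, including the IUPAC ambiguity codes. Spaces and newlines map to the empty string, so they are dropped from the result.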

{"a"=>"t","t"=>"a","A"=>"T","T"=>"A",
"c"=>"g","g"=>"c","C"=>"G","G"=>"C",
"b"=>"v","B"=>"V","v"=>"b","V"=>"B",
"h"=>"d","H"=>"D","d"=>"h","D"=>"H",
"R"=>"Y","Y"=>"R","r"=>"y","y"=>"r",
"K"=>"M","M"=>"K","k"=>"m","m"=>"k",
"S"=>"S","s"=>"s","w"=>"w","W"=>"W",
"n"=>"n","N"=>"N","u"=>"a","U"=>"A",
"-"=>"-","."=>".","="=>"=",
" "=>"","\n"=>""}
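Most entries of WCC are paired with their partner (a and t, b and v, R and Y, and so on), so complementing twice gives back the original character. The exceptions are u/U, which map to the DNA bases a/A, and the whitespace entries, which map to the empty string. The sketch below copies the table and checks that property; the helper `involutive_keys` is a hypothetical name used only for this illustration.

```ruby
# Copy of the WCC table documented above.
WCC = {"a"=>"t","t"=>"a","A"=>"T","T"=>"A",
       "c"=>"g","g"=>"c","C"=>"G","G"=>"C",
       "b"=>"v","B"=>"V","v"=>"b","V"=>"B",
       "h"=>"d","H"=>"D","d"=>"h","D"=>"H",
       "R"=>"Y","Y"=>"R","r"=>"y","y"=>"r",
       "K"=>"M","M"=>"K","k"=>"m","m"=>"k",
       "S"=>"S","s"=>"s","w"=>"w","W"=>"W",
       "n"=>"n","N"=>"N","u"=>"a","U"=>"A",
       "-"=>"-","."=>".","="=>"=",
       " "=>"","\n"=>""}.freeze

# Hypothetical helper: keys whose complement, complemented again,
# gives back the key itself.
def involutive_keys(table)
  table.keys.select { |c| table[table[c]] == c }
end

# Keys for which complementing twice does not round-trip.
non_involutive = WCC.keys - involutive_keys(WCC)
p non_involutive  # prints ["u", "U", " ", "\n"]
```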

Instance Method Summary

Instance Method Details

#rc(tolerant: false, rnasequence: false) ⇒ String

Computes the reverse complement of a nucleotide sequence.

Examples:

"ACTG".rc  # => "CAGT"
"acGT".rc  # => "ACgt"

An undefined sequence is represented by “*”:

"*".rc     # => "*"

Extended IUPAC Alphabet:

"ARBN".rc  # => "NVYT"

Usage with RNA sequences:

"ACUG".rc                    # => "CAGU"
"ACG".rc(rnasequence: true)  # => "CGU"
"ACUT".rc                    # raises RuntimeError (contains both U and T)
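To try the examples above without installing RGFA, the following self-contained sketch defines a hypothetical module `SequenceSketch` holding a copy of WCC and an `rc` method, then includes it into String. It is an illustration under those assumptions, not the library's own code; in particular, it looks for U and T over the whole string before complementing, so that both "ACUT" and "TACU" are rejected.

```ruby
# Hypothetical standalone sketch of String#rc (not the RGFA source).
module SequenceSketch
  # Copy of the WCC table documented above.
  WCC = {"a"=>"t","t"=>"a","A"=>"T","T"=>"A",
         "c"=>"g","g"=>"c","C"=>"G","G"=>"C",
         "b"=>"v","B"=>"V","v"=>"b","V"=>"B",
         "h"=>"d","H"=>"D","d"=>"h","D"=>"H",
         "R"=>"Y","Y"=>"R","r"=>"y","y"=>"r",
         "K"=>"M","M"=>"K","k"=>"m","m"=>"k",
         "S"=>"S","s"=>"s","w"=>"w","W"=>"W",
         "n"=>"n","N"=>"N","u"=>"a","U"=>"A",
         "-"=>"-","."=>".","="=>"=",
         " "=>"","\n"=>""}.freeze

  def rc(tolerant: false, rnasequence: false)
    return "*" if self == "*"
    # Scan the whole string first, so U/T mixing is caught in any order.
    has_u = match?(/[Uu]/)
    has_t = match?(/[Tt]/)
    raise "String contains both U/u and T/t" if has_t && (has_u || rnasequence)
    rnasequence ||= has_u
    retval = each_char.map do |c|
      # Unknown characters map to themselves if tolerant, else to nil.
      wcc = WCC.fetch(c, tolerant ? c : nil)
      raise "#{self}: no Watson-Crick complement for #{c}" if wcc.nil?
      wcc
    end.reverse.join
    # WCC complements A to T; for RNA, rewrite T as U in the result.
    retval.tr!("tT", "uU") if rnasequence
    retval
  end
end

String.include(SequenceSketch)

p "ACTG".rc                   # prints "CAGT"
p "ARBN".rc                   # prints "NVYT"
p "ACG".rc(rnasequence: true) # prints "CGU"
```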

Parameters:

  • tolerant (Boolean) (defaults to: false)

    If true, any character without a defined Watson-Crick complement is left unchanged (complemented to itself).

  • rnasequence (Boolean) (defaults to: false)

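    If true, A and a are complemented to U and u. Otherwise this happens only if the sequence contains a U or u; if it contains neither, DNA is assumed and A and a are complemented to T and t. If true and the sequence contains T or t, a RuntimeError is raised.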
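The two keyword parameters can be seen in isolation in the simplified sketch below. It assumes a reduced table, `MINI_WCC` (hypothetical, DNA bases plus U only), and a hypothetical helper `mini_rc` that omits whitespace removal, the IUPAC codes and the automatic detection of U. `tolerant` works through the default argument of Hash#fetch, and `rnasequence` rewrites T as U in the result.

```ruby
# Hypothetical reduced table: DNA bases plus U only.
MINI_WCC = { "A" => "T", "C" => "G", "G" => "C", "T" => "A", "U" => "A" }.freeze

# Hypothetical helper mirroring the two parameters of rc:
# - tolerant: unknown characters fall back to themselves via fetch's default;
# - rnasequence: every T in the output is rewritten as U.
def mini_rc(seq, tolerant: false, rnasequence: false)
  out = seq.each_char.map do |c|
    comp = MINI_WCC.fetch(c, tolerant ? c : nil)
    raise "no Watson-Crick complement for #{c}" if comp.nil?
    comp
  end.reverse.join
  rnasequence ? out.tr("T", "U") : out
end

p mini_rc("AXG", tolerant: true)     # prints "CXT"
p mini_rc("ACG", rnasequence: true)  # prints "CGU"
begin
  mini_rc("AXG")
rescue RuntimeError => e
  puts e.message                     # prints no Watson-Crick complement for X
end
```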

Returns:

  • (String)

    reverse complement, without newlines and spaces

  • (String)

    “*” if string is “*”
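Both return cases are shown by the minimal sketch below. `RC_TABLE` and `rc_returns` are hypothetical names; the table keeps only the four DNA bases and the two whitespace entries of WCC, whose empty-string complements are what remove spaces and newlines from the result.

```ruby
# Hypothetical subset of WCC: DNA bases plus the whitespace entries.
RC_TABLE = { "A" => "T", "C" => "G", "G" => "C", "T" => "A",
             " " => "", "\n" => "" }.freeze

# Hypothetical helper showing the two return cases of rc.
def rc_returns(seq)
  return "*" if seq == "*"  # the undefined-sequence placeholder is returned as is
  # fetch without a default raises KeyError for characters not in the table.
  seq.each_char.map { |c| RC_TABLE.fetch(c) }.reverse.join
end

p rc_returns("AAC GT\n")  # prints "ACGTT"
p rc_returns("*")         # prints "*"
```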

Raises:

  • (RuntimeError)

    if tolerant is false and the sequence contains a character for which no Watson-Crick complement is defined

  • (RuntimeError)

    if the sequence contains both U/u and T/t, or contains T/t while rnasequence is true
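The U/T check is only reliable if it does not depend on where the first U occurs. A minimal sketch, assuming a hypothetical helper `check_u_t!`, looks for both letters over the whole string before any complementing and returns whether the sequence should be treated as RNA:

```ruby
# Hypothetical helper: decide upfront whether a sequence mixes U and T.
# Scanning the whole string makes the result independent of the order
# in which U and T appear.
def check_u_t!(seq, rnasequence: false)
  has_u = seq.match?(/[Uu]/)
  has_t = seq.match?(/[Tt]/)
  raise "String contains both U/u and T/t" if has_t && (has_u || rnasequence)
  rnasequence || has_u  # true if the sequence is to be treated as RNA
end

p check_u_t!("ACUG")                    # prints true
p check_u_t!("ACTG")                    # prints false
p check_u_t!("ACG", rnasequence: true)  # prints true
%w[ACUT TACU].each do |s|
  begin
    check_u_t!(s)
  rescue RuntimeError => e
    puts "#{s}: #{e.message}"
  end
end
```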



# File 'lib/rgfa/sequence.rb', line 32

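def rc(tolerant: false, rnasequence: false)
  return "*" if self == "*"
  # Look for U/u and T/t over the whole string before complementing, so that
  # mixed sequences are rejected regardless of which letter comes first.
  has_u = match?(/[Uu]/)
  has_t = match?(/[Tt]/)
  if has_t && (has_u || rnasequence)
    raise "String contains both U/u and T/t"
  end
  rnasequence ||= has_u
  retval = each_char.map do |c|
    # Unknown characters map to themselves if tolerant, else to nil.
    wcc = WCC.fetch(c, tolerant ? c : nil)
    raise "#{self}: no Watson-Crick complement for #{c}" if wcc.nil?
    wcc
  end.reverse.join
  # WCC complements A to T; for RNA, rewrite every T in the result as U.
  retval.tr!("tT", "uU") if rnasequence
  retval
end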