Class: String

Inherits:
Object
Defined in:
lib/viral_seq/string.rb

Overview

Functions added to the String class for operating directly on sequences stored as String objects.

Instance Method Summary

Instance Method Details

#compare_with(seq2) ⇒ Integer

Compare two sequences as String objects and count the positions at which they differ. The two sequence strings need to be aligned first.

Examples:

compare two sequence strings, without alignment and with alignment

seq1 = 'AAGGCGTAGGAC'
seq2 = 'AAGCTTAGGACG'
seq1.compare_with(seq2) # no alignment
=> 8
aligned_seqs = ViralSeq::Muscle.align(seq1,seq2) # align using MUSCLE
aligned_seqs[0].compare_with(aligned_seqs[1])
=> 4

Parameters:

  • seq2 (String)

    the sequence string to compare with

Returns:

  • (Integer)

    the total number of positions at which the two sequences differ



# File 'lib/viral_seq/string.rb', line 108

def compare_with(seq2)
  seq1 = self
  length = seq1.size
  diff = 0
  (0..(length-1)).each do |position|
    nt1 = seq1[position]
    nt2 = seq2[position]
    diff += 1 unless nt1 == nt2
  end
  return diff
end
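The method counts mismatches over the length of the receiver: where `seq2` is shorter, `seq2[position]` returns `nil` and each such position counts as a difference, while extra characters at the end of `seq2` are ignored. A minimal sketch of the same positional count (the Hamming distance) that instead rejects sequences of unequal length; `hamming_distance` is a hypothetical helper, not part of viral_seq:

```ruby
# Hypothetical helper (not part of viral_seq): counts the positions at which
# two aligned sequences differ, i.e. their Hamming distance.
def hamming_distance(seq1, seq2)
  # Unaligned inputs of different length are rejected rather than padded.
  unless seq1.size == seq2.size
    raise ArgumentError, "sequences must be aligned to the same length"
  end

  # Pair up characters position by position and count the mismatches.
  seq1.chars.zip(seq2.chars).count { |a, b| a != b }
end
```

`hamming_distance('AAGGCGTAGGAC', 'AAGCTTAGGACG')` returns 8, the same count as the unaligned example.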

#mutation(error_rate = 0.01) ⇒ String

Randomly mutate a nucleotide sequence (String).

Examples:

mutate a sequence at an error rate of 0.05 (the output is random and differs between runs)

seq = "TGGAAGGGCTAATTCACTCCCAACGAAGACAAGATATCCTTGATCTGTGGATCTACCACACACAAGGCTACTTCCCTG"
seq.mutation(0.05)
=> "TGGAAGGGCTAATGCACTCCCAACGAAGACACGATATCCTTGATCTGTGGATCTACGACACACAAGGCTGCTTCCCTG"

Parameters:

  • error_rate (Float) (defaults to: 0.01)

    the per-base probability of mutation, defaults to `0.01`

Returns:

  • (String)

    mutated sequence as String



# File 'lib/viral_seq/string.rb', line 23

def mutation(error_rate = 0.01)
  new_string = ""
  self.split("").each do |nt|
    pool = ["A","C","T","G"]
    pool.delete(nt)
    s = error_rate * 10000
    r = rand(10000)
    if r < s
      nt = pool.sample
    end
    new_string << nt
  end
  return new_string
end
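Each base is replaced, with probability `error_rate`, by one of the other three bases; because the method calls `rand` directly, a run cannot be reproduced. A sketch, assuming reproducible runs are wanted, that takes an injectable `Random`; the `mutate_sequence` name and `rng:` keyword are hypothetical, not viral_seq API:

```ruby
# Hypothetical helper (not part of viral_seq).
BASES = %w[A C G T].freeze

# Mutates each base with probability error_rate. Passing a seeded Random
# (e.g. Random.new(42)) makes the result reproducible.
def mutate_sequence(seq, error_rate = 0.01, rng: Random.new)
  seq.each_char.map do |nt|
    if rng.rand < error_rate
      # Pick one of the other bases (any of the four if nt is not A/C/G/T).
      (BASES - [nt]).sample(random: rng)
    else
      nt
    end
  end.join
end
```

Comparing `rng.rand < error_rate` directly also avoids the 1/10000 granularity that comes from drawing `rand(10000)`.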

#nt_parser ⇒ Regexp

Parse the nucleotide sequence (a String object) and return a Regexp object that matches every sequence it can represent, with each IUPAC ambiguity code expanded into a character class.

Examples:

parse a sequence with ambiguities

"ATRWCG".nt_parser
=> /AT[A|G][A|T]CG/

Returns:

  • (Regexp)

    a pattern matching all possible sequences



# File 'lib/viral_seq/string.rb', line 45

def nt_parser
  match = ""
  self.each_char.each do |base|
    base_array = base.to_list
    if base_array.size == 1
      match += base_array[0]
    else
      pattern = "[" + base_array.join("|") + "]"
      match += pattern
    end
  end
  Regexp.new match
end
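Inside a regular-expression character class, `|` is a literal character rather than alternation, so a class such as `[A|G]` also matches a `|`. A sketch that emits `[AG]`-style classes directly from a lookup table; `IUPAC_PATTERNS` and `iupac_regexp` are hypothetical names, not viral_seq API, and unknown characters raise an `ArgumentError`:

```ruby
# Hypothetical table (not part of viral_seq): IUPAC code => regex fragment.
IUPAC_PATTERNS = {
  "A" => "A", "C" => "C", "G" => "G", "T" => "T",
  "W" => "[AT]", "S" => "[CG]", "M" => "[AC]", "K" => "[GT]",
  "R" => "[AG]", "Y" => "[CT]", "B" => "[CGT]", "D" => "[AGT]",
  "H" => "[ACT]", "V" => "[ACG]", "N" => "[ACGT]"
}.freeze

# Builds a Regexp matching every concrete sequence the input can represent.
def iupac_regexp(seq)
  pattern = seq.upcase.each_char.map do |base|
    IUPAC_PATTERNS.fetch(base) do
      raise ArgumentError, "not an IUPAC nucleotide code: #{base}"
    end
  end.join
  Regexp.new(pattern)
end
```

`iupac_regexp("ATRWCG")` has the source `AT[AG][AT]CG`; it matches `"GGATGTCGAA"` but, unlike `/AT[A|G][A|T]CG/`, not `"AT||CG"`.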

#rc ⇒ String

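Return the reverse complement of the sequence. Only uppercase A, C, G and T are complemented; any other character is reversed but left unchanged.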

Examples:

Reverse complement

"ACAGA".rc
=> "TCTGT"

Returns:

  • (String)

    reverse complement sequence



# File 'lib/viral_seq/string.rb', line 11

def rc
  self.reverse.tr("ACTG","TGAC")
end
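A sketch of a reverse complement that also handles lowercase bases and IUPAC ambiguity codes, each of which has a defined complement (R↔Y, K↔M, B↔V, D↔H, while S, W and N are their own complements). `reverse_complement` is a hypothetical helper, not part of viral_seq:

```ruby
# Hypothetical helper (not part of viral_seq).
# Each character in FROM is replaced by the character at the same index in TO.
FROM = "ACGTRYKMBVDHSWNacgtrykmbvdhswn".freeze
TO   = "TGCAYRMKVBHDSWNtgcayrmkvbhdswn".freeze

def reverse_complement(seq)
  seq.reverse.tr(FROM, TO)
end
```

`reverse_complement("ACAGA")` returns `"TCTGT"`, as in the example, and `reverse_complement("ATRWCG")` returns `"CGWYAT"`.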

#to_list ⇒ Array

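Parse a single IUPAC nucleotide code (A, C, G, T, or one of the ambiguity codes W, S, M, K, R, Y, B, D, H, V, N) into the list of bases it represents. Intended for one-character strings (String#size == 1); unrecognized characters return an empty Array.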

Examples:

parse IUPAC `R`

'R'.to_list
=> ["A", "G"]

Returns:

  • (Array)

    the nucleotide bases represented by the code



# File 'lib/viral_seq/string.rb', line 65

def to_list
  list = []
  case self.upcase
  when /[A|T|C|G]/
    list << self
  when "W"
    list = ['A','T']
  when "S"
    list = ['C','G']
  when "M"
    list = ['A','C']
  when 'K'
    list = ['G','T'] # IUPAC K (keto) = G or T
  when 'R'
    list = ['A','G']
  when 'Y'
    list = ['C','T']
  when 'B'
    list = ['C','G','T']
  when 'D'
    list = ['A','G','T']
  when 'H'
    list = ['A','C','T']
  when 'V'
    list = ['A','C','G']
  when 'N'
    list = ['A','T','C','G']
  end
  return list
end
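The same mapping can be sketched as a frozen hash lookup instead of a `case` statement; `IUPAC_BASES` and `iupac_to_list` are hypothetical names, not viral_seq API. As with the method, codes outside the table give an empty array, and K follows the IUPAC definition (G or T):

```ruby
# Hypothetical lookup table (not part of viral_seq): code => bases it stands for.
IUPAC_BASES = {
  "A" => %w[A], "C" => %w[C], "G" => %w[G], "T" => %w[T],
  "W" => %w[A T], "S" => %w[C G], "M" => %w[A C], "K" => %w[G T],
  "R" => %w[A G], "Y" => %w[C T], "B" => %w[C G T], "D" => %w[A G T],
  "H" => %w[A C T], "V" => %w[A C G], "N" => %w[A C G T]
}.freeze

# Returns a fresh Array so callers cannot modify the shared table entries.
def iupac_to_list(code)
  IUPAC_BASES.fetch(code.upcase, []).dup
end
```

`iupac_to_list("R")` returns `["A", "G"]`, matching the example.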