Class: Bio::Sequence::NA

Inherits:
String
Includes:
Common
Defined in:
lib/bio/sequence/na.rb,
lib/bio/sequence/compat.rb,
lib/bio/shell/plugin/midi.rb

Overview

TODO

- add "Ohno" style
- add an accessor to drum pattern
- add a new feature to select music style (pop, trans, ryukyu, ...)
- what is the base?


Direct Known Subclasses

RestrictionEnzyme::SingleStrand

Defined Under Namespace

Classes: MidiTrack

Class Method Summary

Instance Method Summary

Methods included from Common

#+, #<<, #composition, #concat, #normalize!, #randomize, #seq, #splice, #subseq, #to_fasta, #to_s, #total, #window_search

Methods inherited from String

#fill, #fold, #skip, #step, #to_aaseq, #to_naseq

Constructor Details

#initialize(str) ⇒ NA

Generate a nucleic acid sequence object from a string.

s = Bio::Sequence::NA.new("aagcttggaccgttgaagt")

or, if you have a nucleic acid sequence in a file,

s = Bio::Sequence::NA.new(File.read('dna.txt'))

Nucleic acid sequences are always stored in lowercase in BioRuby:

s = Bio::Sequence::NA.new("AAGcTtGG")
puts s                                  #=> "aagcttgg"

Whitespace (spaces, tabs, newlines and carriage returns) is stripped from the sequence:

s = Bio::Sequence::NA.new("atg\nggg\ttt\r  gc")
puts s                                  #=> "atggggttgc"

Arguments:

  • (required) str: String

Returns

Bio::Sequence::NA object


# File 'lib/bio/sequence/na.rb', line 77

def initialize(str)
  super
  self.downcase!
  self.tr!(" \t\n\r",'')
end
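The normalization done by the constructor can be reproduced on a plain Ruby String. The sketch below is plain Ruby (no BioRuby required); `normalize_na` is a hypothetical helper name. It uses `String#delete`, which has the same effect as the `tr!(" \t\n\r", '')` call in the source above, and shows that only those four whitespace characters are removed.

```ruby
# Minimal sketch of what Bio::Sequence::NA#initialize does to its input:
# lowercase everything, then drop spaces, tabs, newlines and carriage returns.
# normalize_na is a hypothetical helper, not part of BioRuby.
def normalize_na(str)
  # String#delete removes every occurrence of each listed character.
  str.downcase.delete(" \t\n\r")
end

puts normalize_na("AAGcTtGG")             # prints aagcttgg
puts normalize_na("atg\nggg\ttt\r  gc")   # prints atggggttgc
# Digits and gap characters are not whitespace, so they are kept:
puts normalize_na("acgt 12-")             # prints acgt12-
```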

Class Method Details

.randomize(*arg, &block) ⇒ Object

Generate a new random sequence with the given frequency of bases. The length of the sequence is the sum of the counts. (See also Bio::Sequence::Common#randomize, which creates a new randomized sequence object using the base composition of an existing sequence instance.)

counts = {'a'=>1,'c'=>2,'g'=>3,'t'=>4}
puts Bio::Sequence::NA.randomize(counts)  #=> "ggcttgttac" (for example)

You may also pass a block, which receives each base of the generated sequence in turn:

actual_counts = {'a'=>0, 'c'=>0, 'g'=>0, 't'=>0}
Bio::Sequence::NA.randomize(counts) {|x| actual_counts[x] += 1}
actual_counts                     #=> {"a"=>1, "c"=>2, "g"=>3, "t"=>4}

Arguments:

  • (optional) hash: Hash object

Returns

Bio::Sequence::NA object


# File 'lib/bio/sequence/compat.rb', line 87

def self.randomize(*arg, &block)
  self.new('').randomize(*arg, &block)
end
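One way to build a random sequence with a fixed base composition is to expand the counts into a pool of bases and shuffle it. The sketch below is plain Ruby; `random_na` is a hypothetical helper, and BioRuby's own Common#randomize (not shown here) may shuffle differently, so only the composition, not the order, should be relied on.

```ruby
# Sketch: random sequence whose base composition is fixed by counts.
# random_na is a hypothetical helper, not BioRuby's implementation.
def random_na(counts, rng = Random.new)
  # Expand {'a'=>1, 'c'=>2} into ['a', 'c', 'c'] ...
  pool = counts.flat_map { |base, n| [base] * n }
  # ... then shuffle; the length is the sum of the counts.
  shuffled = pool.shuffle(random: rng)
  # Mirroring the block form described above, hand each base to the block.
  shuffled.each { |base| yield base } if block_given?
  shuffled.join
end

counts = {'a'=>1, 'c'=>2, 'g'=>3, 't'=>4}
seq = random_na(counts, Random.new(42))   # seeded for reproducibility
puts seq.length                           # prints 10

tally = Hash.new(0)
random_na(counts) { |base| tally[base] += 1 }
# tally now holds the same counts as the input hash
```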

Instance Method Details

#at_content ⇒ Object

Calculate the ratio of AT / ATGC bases. U is regarded as T.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.at_content                       #=> 0.444444444444444

Returns

Float


# File 'lib/bio/sequence/na.rb', line 319

def at_content
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0.0 if at + gc == 0
  return at.quo(at + gc)
end
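The ratio computed by #at_content (and #gc_content) can be checked with a plain-Ruby sketch; `base_fraction` is a hypothetical helper. Two details of the source above are made visible: characters other than a, t, u, g and c are left out of both the numerator and the denominator, and on Ruby 1.9 and later `Integer#quo` returns a Rational rather than a Float for Integer arguments, so the sketch converts explicitly.

```ruby
# Sketch of the ratio computed by #at_content and #gc_content.
# base_fraction is a hypothetical helper, not part of BioRuby.
def base_fraction(seq, which)
  s = seq.downcase
  at = s.count('a') + s.count('t') + s.count('u')   # 'u' is counted as 't'
  gc = s.count('g') + s.count('c')
  # Ambiguity codes and gaps appear in neither numerator nor denominator.
  return 0.0 if at + gc == 0
  numerator = (which == :at ? at : gc)
  # Integer#quo gives a Rational for Integer arguments (Ruby 1.9+),
  # so convert explicitly to get a Float.
  numerator.quo(at + gc).to_f
end

puts base_fraction('atggcgtga', :at)   # roughly 0.444 (4/9)
puts base_fraction('atggcgtga', :gc)   # roughly 0.556 (5/9)
puts base_fraction('atgcnn', :gc)      # prints 0.5; the two n's are ignored
```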

#at_skew ⇒ Object

Calculate the ratio of (A - T) / (A + T) bases. U is regarded as T.

s = Bio::Sequence::NA.new('atgttgttgttc')
puts s.at_skew                          #=> -0.75

Returns

Float


# File 'lib/bio/sequence/na.rb', line 347

def at_skew
  count = self.composition
  a = count['a']
  t = count['t'] + count['u']
  return 0.0 if a + t == 0
  return (a - t).quo(a + t)
end
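#at_skew and #gc_skew share the same formula, (x - y) / (x + y), with 0.0 returned when neither base occurs. A plain-Ruby sketch with a hypothetical `skew` helper reproduces both documented examples:

```ruby
# Sketch of the skew formula shared by #at_skew and #gc_skew.
# skew is a hypothetical helper, not part of BioRuby.
def skew(x, y)
  return 0.0 if x + y == 0
  (x - y).quo(x + y).to_f
end

s = 'atgttgttgttc'
at_sk = skew(s.count('a'), s.count('t') + s.count('u'))   # 'u' counts as 't'
puts at_sk      # prints -0.75

s = 'atggcgtga'
gc_sk = skew(s.count('g'), s.count('c'))
puts gc_sk      # prints 0.6
```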

#codon_usage ⇒ Object

Returns counts of each codon in the sequence in a hash.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.codon_usage                #=> {"gcg"=>1, "tga"=>1, "atg"=>1}

This method does not validate codons: any three-letter group counts as a 'codon'. So,

s = Bio::Sequence::NA.new('atggNNtga')
puts s.codon_usage                #=> {"tga"=>1, "gnn"=>1, "atg"=>1}

s = Bio::Sequence::NA.new('atgg--tga')
puts s.codon_usage                #=> {"tga"=>1, "g--"=>1, "atg"=>1}

Also, there is no option to work in any frame other than the first.


Returns

Hash object


# File 'lib/bio/sequence/na.rb', line 275

def codon_usage
  hash = Hash.new(0)
  self.window_search(3, 3) do |codon|
    hash[codon] += 1
  end
  return hash
end
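Since #codon_usage only works in the first frame, codons in frame 2 or 3 can be counted by skipping one or two bases first. The sketch below is plain Ruby; `codon_usage_in_frame` is a hypothetical helper, not part of BioRuby. For the reverse frames, the same idea can be applied to the reverse complement.

```ruby
# Sketch: codon counts in any of the three forward reading frames.
# codon_usage_in_frame is a hypothetical helper, not part of BioRuby.
def codon_usage_in_frame(seq, frame = 1)
  raise ArgumentError, 'frame must be 1, 2 or 3' unless [1, 2, 3].include?(frame)
  counts = Hash.new(0)
  # Skip frame - 1 bases, then take non-overlapping 3-letter chunks;
  # /.{3}/ never matches a trailing partial codon, so it is ignored.
  (seq[(frame - 1)..-1] || '').scan(/.{3}/) { |codon| counts[codon] += 1 }
  counts
end

p codon_usage_in_frame('atggcgtga', 1).keys   # prints ["atg", "gcg", "tga"]
p codon_usage_in_frame('atggcgtga', 2).keys   # prints ["tgg", "cgt"]
```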

#cut_with_enzyme(*args) ⇒ Object Also known as: cut_with_enzymes

Example:

seq = Bio::Sequence::NA.new('gaattc')
cuts = seq.cut_with_enzyme('EcoRI')

or

seq = Bio::Sequence::NA.new('gaattc')
cuts = seq.cut_with_enzyme('g^aattc')

See Bio::RestrictionEnzyme::Analysis.cut for details.


# File 'lib/bio/sequence/na.rb', line 481

def cut_with_enzyme(*args)
  Bio::RestrictionEnzyme::Analysis.cut(self, *args)
end

#dna ⇒ Object

Returns a new sequence object with any 'u' bases changed to 't'. The original sequence is not modified.

s = Bio::Sequence::NA.new('augc')
puts s.dna                              #=> 'atgc'
puts s                                  #=> 'augc'

Returns

new Bio::Sequence::NA object


# File 'lib/bio/sequence/na.rb', line 425

def dna
  self.tr('u', 't')
end

#dna! ⇒ Object

Changes any 'u' bases in the original sequence to 't'. The original sequence is modified.

s = Bio::Sequence::NA.new('augc')
puts s.dna!                             #=> 'atgc'
puts s                                  #=> 'atgc'

Returns

current Bio::Sequence::NA object (modified), or nil if the sequence contained no 'u' bases (the return value of String#tr!)


# File 'lib/bio/sequence/na.rb', line 437

def dna!
  self.tr!('u', 't')
end
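Because #dna! (and #rna!) return the result of `String#tr!` directly, they return nil when there is nothing to replace. The plain-Ruby illustration below shows the underlying `tr!` behavior and a safer pattern that uses the receiver instead of the return value.

```ruby
# String#tr! returns nil when it changes nothing; #dna! and #rna! pass
# that return value straight through.
s = 'atgc'.dup
result = s.tr!('u', 't')    # no 'u' to replace
p result                    # prints nil
p s                         # prints "atgc" (unchanged)

r = 'augc'.dup
p r.tr!('u', 't')           # prints "atgc"

# Safer pattern: call the bang method for its side effect, then use the
# receiver rather than the return value.
s.tr!('u', 't')
puts s.upcase               # prints ATGC
```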

#forward_complement ⇒ Object

Returns a new complementary sequence object (without reversing). The original sequence object is not modified.

s = Bio::Sequence::NA.new('atgc')
puts s.forward_complement               #=> 'tacg'
puts s                                  #=> 'atgc'

Returns

new Bio::Sequence::NA object


# File 'lib/bio/sequence/na.rb', line 102

def forward_complement
  s = self.class.new(self)
  s.forward_complement!
  s
end

#forward_complement! ⇒ Object

Converts the current sequence into its complement (without reversing). The original sequence object is modified.

s = Bio::Sequence::NA.new('atgc')
puts s.forward_complement!              #=> 'tacg'
puts s                                  #=> 'tacg'

Returns

current Bio::Sequence::NA object (modified)


# File 'lib/bio/sequence/na.rb', line 116

def forward_complement!
  if self.rna?
    self.tr!('augcrymkdhvbswn', 'uacgyrkmhdbvswn')
  else
    self.tr!('atgcrymkdhvbswn', 'tacgyrkmhdbvswn')
  end
  self
end
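The complement tables in the source above also swap IUPAC ambiguity codes (r/y, m/k, d/h, v/b) and leave s, w and n unchanged. The plain-Ruby sketch below reuses those two `tr` tables. It assumes, for illustration only, that "is RNA" means "contains a 'u'"; `rna_like?` is a hypothetical stand-in for #rna?, whose source is not shown here. Reversing first and then complementing is what #reverse_complement! does.

```ruby
# Complement tables copied from #forward_complement! above.
DNA_FROM = 'atgcrymkdhvbswn'
DNA_TO   = 'tacgyrkmhdbvswn'
RNA_FROM = 'augcrymkdhvbswn'
RNA_TO   = 'uacgyrkmhdbvswn'

# Hypothetical stand-in for Bio::Sequence::NA#rna? (assumption: RNA means
# the sequence contains a 'u').
def rna_like?(seq)
  seq.include?('u')
end

def forward_complement(seq)
  rna_like?(seq) ? seq.tr(RNA_FROM, RNA_TO) : seq.tr(DNA_FROM, DNA_TO)
end

# #reverse_complement! is reverse! followed by forward_complement!.
def reverse_complement(seq)
  forward_complement(seq.reverse)
end

puts forward_complement('atgc')       # prints tacg
puts forward_complement('augc')       # prints uacg
puts forward_complement('rymk')       # prints yrkm
puts reverse_complement('atggcgtga')  # prints tcacgccat
```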

#gc_content ⇒ Object

Calculate the ratio of GC / ATGC bases. U is regarded as T.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_content                       #=> 0.555555555555556

Returns

Float


# File 'lib/bio/sequence/na.rb', line 305

def gc_content
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0.0 if at + gc == 0
  return gc.quo(at + gc)
end

#gc_percent ⇒ Object

Calculate the ratio of GC / ATGC bases as a percentage, truncated to a whole number by integer division (so 55.6% becomes 55). U is regarded as T.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_percent                       #=> 55

Returns

Fixnum


# File 'lib/bio/sequence/na.rb', line 290

def gc_percent
  count = self.composition
  at = count['a'] + count['t'] + count['u']
  gc = count['g'] + count['c']
  return 0 if at + gc == 0
  gc = 100 * gc / (at + gc)
  return gc
end
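The integer division in the source truncates rather than rounds. The plain-Ruby sketch below (both helper names are hypothetical) contrasts the two for the documented example 'atggcgtga', which has 5 G/C and 4 A/T bases (55.55...%):

```ruby
# What #gc_percent does: Integer division, which truncates.
def gc_percent_truncated(gc, at)
  return 0 if at + gc == 0
  100 * gc / (at + gc)
end

# For comparison: rounding to the nearest whole number.
def gc_percent_rounded(gc, at)
  return 0 if at + gc == 0
  (100.0 * gc / (at + gc)).round
end

puts gc_percent_truncated(5, 4)  # prints 55
puts gc_percent_rounded(5, 4)    # prints 56
```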

#gc_skew ⇒ Object

Calculate the ratio of (G - C) / (G + C) bases.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.gc_skew                          #=> 0.6

Returns

Float


# File 'lib/bio/sequence/na.rb', line 333

def gc_skew
  count = self.composition
  g = count['g']
  c = count['c']
  return 0.0 if g + c == 0
  return (g - c).quo(g + c)
end

#illegal_bases ⇒ Object

Returns an alphabetically sorted array of the distinct non-standard bases (anything other than 'atgcu').

s = Bio::Sequence::NA.new('atgStgQccR')
puts s.illegal_bases                    #=> ["q", "r", "s"]

Returns

Array object


# File 'lib/bio/sequence/na.rb', line 362

def illegal_bases
  self.scan(/[^atgcu]/).sort.uniq
end
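The same scan works on a plain String, which makes it easy to see that IUPAC ambiguity codes and gap characters are also reported, and to use the result as an input check. `illegal_bases_of` is a hypothetical helper; the extra `downcase` is only needed because a plain String, unlike Bio::Sequence::NA, is not lowercased on creation.

```ruby
# Plain-Ruby version of the #illegal_bases scan.
def illegal_bases_of(seq)
  seq.downcase.scan(/[^atgcu]/).sort.uniq
end

p illegal_bases_of('atgStgQccR')   # prints ["q", "r", "s"]
# Ambiguity codes and gaps count as illegal too:
p illegal_bases_of('atgn-n')       # prints ["-", "n"]

# A typical use: refuse input that is not strictly a/t/g/c/u.
seq = 'acgtacgt'
raise ArgumentError, 'unexpected bases' unless illegal_bases_of(seq).empty?
```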

#molecular_weight ⇒ Object

Estimate molecular weight (using the values from BioPerl's SeqStats.pm module).

s = Bio::Sequence::NA.new('atggcgtga')
puts s.molecular_weight                 #=> 2841.00708

RNA and DNA do not have the same molecular weights,

s = Bio::Sequence::NA.new('auggcguga')
puts s.molecular_weight                 #=> 2956.94708

Returns

Float object


# File 'lib/bio/sequence/na.rb', line 378

def molecular_weight
  if self.rna?
    Bio::NucleicAcid.weight(self, true)
  else
    Bio::NucleicAcid.weight(self)
  end
end

#names ⇒ Object

Generate the list of the full names of each nucleotide in the sequence, in order. The names are looked up with Bio::NucleicAcid.names.

s = Bio::Sequence::NA.new('atg')
puts s.names                    #=> ["Adenine", "Thymine", "Guanine"]

Returns

Array object


# File 'lib/bio/sequence/na.rb', line 409

def names
  array = []
  self.each_byte do |x|
    array.push(Bio::NucleicAcid.names[x.chr.upcase])
  end
  return array
end

#pikachu ⇒ Object

:nodoc:


# File 'lib/bio/sequence/compat.rb', line 91

def pikachu #:nodoc:
  self.dna.tr("atgc", "pika") # joke, of course :-)
end

#reverse_complement ⇒ Object Also known as: complement

Returns a new sequence object with the reverse complement sequence to the original. The original sequence is not modified.

s = Bio::Sequence::NA.new('atgc')
puts s.reverse_complement               #=> 'gcat'
puts s                                  #=> 'atgc'

Returns

new Bio::Sequence::NA object


# File 'lib/bio/sequence/na.rb', line 133

def reverse_complement
  s = self.class.new(self)
  s.reverse_complement!
  s
end

#reverse_complement! ⇒ Object Also known as: complement!

Converts the original sequence into its reverse complement.

The original sequence is modified.

s = Bio::Sequence::NA.new('atgc')
puts s.reverse_complement!              #=> 'gcat'
puts s                                  #=> 'gcat'

Returns

current Bio::Sequence::NA object (modified)


# File 'lib/bio/sequence/na.rb', line 147

def reverse_complement!
  self.reverse!
  self.forward_complement!
end

#rna ⇒ Object

Returns a new sequence object with any 't' bases changed to 'u'. The original sequence is not modified.

s = Bio::Sequence::NA.new('atgc')
puts s.rna                              #=> 'augc'
puts s                                  #=> 'atgc'

Returns

new Bio::Sequence::NA object


# File 'lib/bio/sequence/na.rb', line 449

def rna
  self.tr('t', 'u')
end

#rna! ⇒ Object

Changes any 't' bases in the original sequence to 'u'. The original sequence is modified.

s = Bio::Sequence::NA.new('atgc')
puts s.rna!                             #=> 'augc'
puts s                                  #=> 'augc'

Returns

current Bio::Sequence::NA object (modified), or nil if the sequence contained no 't' bases (the return value of String#tr!)


# File 'lib/bio/sequence/na.rb', line 461

def rna!
  self.tr!('t', 'u')
end

#splicing(position) ⇒ Object

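Wrapper around the Bio::Sequence::Common splice method (documented there). After splicing, the result is normalized: if it is RNA, any 't' is changed to 'u'; otherwise any 'u' is changed to 't'.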


# File 'lib/bio/sequence/na.rb', line 84

def splicing(position) #:nodoc:
  mRNA = super
  if mRNA.rna?
    mRNA.tr!('t', 'u')
  else
    mRNA.tr!('u', 't')
  end
  mRNA
end

#to_midi(style = {}, drum = true) ⇒ Object

style:

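Hash of :tempo, :scale, :tones (keys left out are taken from the default style, "Ichinose"), or a String naming a predefined style in MidiTrack::Styles (unknown names fall back to the default style)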

scale:

C  C# D  D# E  F  F# G  G# A  A#  B
0  1  2  3  4  5  6  7  8  9  10  11

tones:

Hash of :prog, :base, :range -- tone, vol? or len?, octaves

drum:

true (with rhythm part), false (without rhythm part)

# File 'lib/bio/shell/plugin/midi.rb', line 351

def to_midi(style = {}, drum = true)
  default = MidiTrack::Styles["Ichinose"]
  if style.is_a?(String)
    style = MidiTrack::Styles[style] || default
  end
  tempo = style[:tempo] || default[:tempo]
  scale = style[:scale] || default[:scale]
  tones = style[:tones] || default[:tones]

  track = []

  tones.each_with_index do |tone, i|
    ch = i
    ch += 1 if i >= 9         # skip channel 9, used by the rhythm track
    track.push MidiTrack.new(ch, tone[:prog], tone[:base], tone[:range], scale)
  end

  if drum
    rhythm = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    track.push(MidiTrack.new(9, 0, 35, 2, rhythm))
  end

  cur = 0
  window_search(4) do |s|
    track[cur % track.length].push(s)
    cur += 1
  end

  track.each do |t|
    t.push_silent(12)
  end

  ans = track[0].header(track.length, tempo)
  track.each do |t|
    ans += t.encode
  end
  return ans
end

#to_re ⇒ Object

Create a Ruby regular expression (Regexp) from the sequence.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.to_re                            #=> /atggcgtga/

Returns

Regexp object


# File 'lib/bio/sequence/na.rb', line 393

def to_re
  if self.rna?
    Bio::NucleicAcid.to_re(self.dna, true)
  else
    Bio::NucleicAcid.to_re(self)
  end
end
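Bio::NucleicAcid.to_re itself is not shown on this page. The sketch below assumes it expands IUPAC ambiguity codes into character classes and builds such a converter in plain Ruby; the `IUPAC` table follows the standard IUPAC nucleotide codes and `iupac_to_re` is a hypothetical helper, not copied from BioRuby. It handles DNA letters only; for RNA the source above first converts the sequence with #dna.

```ruby
# Hypothetical IUPAC-to-Regexp converter (standard IUPAC nucleotide codes).
IUPAC = {
  'a' => 'a', 'c' => 'c', 'g' => 'g', 't' => 't',
  'r' => '[ag]', 'y' => '[ct]', 'm' => '[ac]', 'k' => '[gt]',
  's' => '[cg]', 'w' => '[at]', 'b' => '[cgt]', 'd' => '[agt]',
  'h' => '[act]', 'v' => '[acg]', 'n' => '[acgt]'
}.freeze

def iupac_to_re(pattern)
  parts = pattern.downcase.chars.map do |c|
    # Characters outside the table are escaped so they match literally.
    IUPAC.fetch(c) { Regexp.escape(c) }
  end
  Regexp.new(parts.join)
end

re = iupac_to_re('gantc')
puts re.source                                  # prints ga[acgt]tc
puts(('ttgaatcc' =~ re) ? 'match' : 'no match') # prints match
```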

#translate(frame = 1, table = 1, unknown = 'X') ⇒ Object

Translate into an amino acid sequence.

s = Bio::Sequence::NA.new('atggcgtga')
puts s.translate                        #=> "MA*"

By default, translate starts in reading frame 1, but you can also start in frame 2 or 3:

puts s.translate(2)                     #=> "WR"
puts s.translate(3)                     #=> "GV"

You may also translate the reverse complement in one step by using frame values of -1, -2, and -3 (or 4, 5, and 6)

puts s.translate(-1)                    #=> "SRH"
puts s.translate(4)                     #=> "SRH"
puts s.reverse_complement.translate(1)  #=> "SRH"

The default codon table is table 1, the standard code. The table argument takes either a table number or a Bio::CodonTable object. The available tables (numbered as in NCBI) are:

1. "Standard (Eukaryote)"
2. "Vertebrate Mitochondrial"
3. "Yeast Mitochondrial"
4. "Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma"
5. "Invertebrate Mitochondrial"
6. "Ciliate Macronuclear and Dasycladacean"
9. "Echinoderm Mitochondrial"
10. "Euplotid Nuclear"
11. "Bacteria"
12. "Alternative Yeast Nuclear"
13. "Ascidian Mitochondrial"
14. "Flatworm Mitochondrial"
15. "Blepharisma Macronuclear"
16. "Chlorophycean Mitochondrial"
21. "Trematode Mitochondrial"
22. "Scenedesmus obliquus mitochondrial"
23. "Thraustochytrium Mitochondrial"

To use anything other than the default table, you must also specify the frame, because table is the second positional argument:

puts s.translate                #=> "MA*"  (using defaults)
puts s.translate(1,1)           #=> "MA*"  (same as above, but explicit)
puts s.translate(1,2)           #=> "MAW"  (different codon table)

You can also pass a Bio::CodonTable instance as the table argument:

mt_table = Bio::CodonTable[2]
puts s.translate(1, mt_table)           #=> "MAW"

By default, any invalid or unknown codons (as could happen if the sequence contains ambiguities) will be represented by 'X' in the translated sequence. You may change this to any character of your choice.

s = Bio::Sequence::NA.new('atgcNNtga')
puts s.translate                        #=> "MX*"
puts s.translate(1,1,'9')               #=> "M9*"

The translate method treats gap characters as unknown characters; it does not remove gaps before translating, so

s = Bio::Sequence::NA.new('atgc--tga')
puts s.translate                        #=> "MX*"

Arguments:

  • (optional) frame: one of 1,2,3,4,5,6,-1,-2,-3 (default 1); any other value is treated as frame 1

  • (optional) table: Fixnum (one of the table numbers listed above) or Bio::CodonTable object (default 1)

  • (optional) unknown: single-character String (default 'X')

Returns

Bio::Sequence::AA object


# File 'lib/bio/sequence/na.rb', line 234

def translate(frame = 1, table = 1, unknown = 'X')
  if table.is_a?(Bio::CodonTable)
    ct = table
  else
    ct = Bio::CodonTable[table]
  end
  naseq = self.dna
  case frame
  when 1, 2, 3
    from = frame - 1
  when 4, 5, 6
    from = frame - 4
    naseq.complement!
  when -1, -2, -3
    from = -1 - frame
    naseq.complement!
  else
    from = 0
  end
  nalen = naseq.length - from
  nalen -= nalen % 3
  aaseq = naseq[from, nalen].gsub(/.{3}/) {|codon| ct[codon] or unknown}
  return Bio::Sequence::AA.new(aaseq)
end
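The frame handling above (offset selection, reverse complement for frames 4 to 6 and -1 to -3, trimming to whole codons) can be exercised without BioRuby. The sketch below is hard-wired to the standard genetic code (NCBI table 1) instead of Bio::CodonTable; `translate_sketch`, `codon_to_aa` and `revcomp_dna` are hypothetical helpers. Unlike the source, it returns an empty string when fewer than three bases remain after the frame offset.

```ruby
# Standard genetic code (NCBI table 1): amino acids for all 64 codons,
# ordered t, c, a, g at each position (ttt, ttc, tta, ttg, tct, ...).
BASE_ORDER = 'tcag'
STANDARD_CODE = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

def codon_to_aa(codon, unknown)
  return unknown unless codon =~ /\A[tcag]{3}\z/
  index = codon.chars.inject(0) { |acc, b| acc * 4 + BASE_ORDER.index(b) }
  STANDARD_CODE[index]
end

def revcomp_dna(seq)
  seq.reverse.tr('acgt', 'tgca')
end

def translate_sketch(seq, frame = 1, unknown = 'X')
  naseq = seq.downcase.tr('u', 't')   # like self.dna in the source
  case frame
  when 1, 2, 3    then from = frame - 1
  when 4, 5, 6    then from = frame - 4; naseq = revcomp_dna(naseq)
  when -1, -2, -3 then from = -1 - frame; naseq = revcomp_dna(naseq)
  else                 from = 0   # unrecognized frames act like frame 1
  end
  # Trim to a whole number of codons after the frame offset.
  nalen = naseq.length - from
  nalen -= nalen % 3
  return '' if nalen <= 0
  naseq[from, nalen].gsub(/.{3}/) { |codon| codon_to_aa(codon, unknown) }
end

puts translate_sketch('atggcgtga')          # prints MA*
puts translate_sketch('atggcgtga', 2)       # prints WR
puts translate_sketch('atggcgtga', -1)      # prints SRH
puts translate_sketch('atgcNNtga', 1, '9')  # prints M9*
```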