Class: Bioroebe::RawSequence

Inherits:
Object
Defined in:
lib/bioroebe/raw_sequence/raw_sequence.rb

Overview

Bioroebe::RawSequence

Direct Known Subclasses

Sequence


Constructor Details

#initialize(commandline_arguments = ARGV) ⇒ RawSequence

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 21

def initialize(
    commandline_arguments = ARGV
  )
  reset
  if commandline_arguments and
     commandline_arguments.is_a?(Array) and
     !commandline_arguments.empty?
    set_raw_sequence(commandline_arguments)
  elsif commandline_arguments and
        commandline_arguments.is_a?(String)
    set_raw_sequence(commandline_arguments)
  end
end
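The constructor passes its argument to set_raw_sequence, and that argument handling can be sketched in plain Ruby. normalize_raw_sequence_input is a hypothetical helper for illustration only and is not part of Bioroebe. It mirrors the code shown above: a non-empty (possibly nested) Array yields its first non-nil element, a String is used as-is, and anything else leaves the empty String assigned by reset.

```ruby
# Hypothetical helper (not part of Bioroebe) mirroring how
# RawSequence#initialize and #set_raw_sequence treat their argument.
def normalize_raw_sequence_input(commandline_arguments = ARGV)
  sequence = ''.dup # what #reset assigns before anything else
  if commandline_arguments.is_a?(Array) && !commandline_arguments.empty?
    # Nested Arrays are flattened, nil entries dropped, and the first entry kept.
    sequence = commandline_arguments.flatten.compact.first
  elsif commandline_arguments.is_a?(String)
    sequence = commandline_arguments
  end
  sequence
end

p normalize_raw_sequence_input(%w(ATGC GGGG))   # => "ATGC"
p normalize_raw_sequence_input([[nil, 'TTAG']]) # => "TTAG"
p normalize_raw_sequence_input('CCCC')          # => "CCCC"
p normalize_raw_sequence_input([])              # => ""
```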

Instance Method Details

#+(i) ⇒ Object

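This method concatenates ("adds") two sequences. It returns the stored sequence joined with i.sequence? as a new String and leaves both objects unchanged. If i is neither a Bioroebe::RawSequence nor responds to .sequence?, nil is returned.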

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 113

def +(i)
  if i.is_a?(Bioroebe::RawSequence) or
     i.respond_to?(:sequence?) # This line will match for Bioroebe::Sequence
    return @sequence+
           i.sequence?
  end
end

#<<(i) ⇒ Object Also known as: add, append, concat

The method << is an "input method". It appends i to the main sequence, which is stored in @sequence and holds the DNA, RNA or aminoacid sequence. It then returns self, so calls can be chained.

If a Sequence object (Bioroebe::Sequence) is passed, this method first obtains the String it stores via the .sequence? method. If the Symbol :stop is passed, a randomly chosen stop codon is appended instead. The default stop codons are initialized first if none are set.

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 450

def <<(i)
  if i.is_a?(::Bioroebe::Sequence) or i.is_a?(::Bioroebe::Sequence)
    i = i.sequence?
  elsif i.is_a? Symbol
    case i
    # ===================================================================== #
    # === :stop
    # ===================================================================== #
    when :stop
      if Bioroebe.stop_codons.empty?
        Bioroebe.initialize_default_stop_codons
      end
      i = ::Bioroebe.stop_codons?.sample
    end
  end
  @sequence << i
  self # Returning self here since that will allow method-chaining.
end
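The appending and chaining behaviour of << can be sketched with a small stand-in class. MiniSequence is hypothetical and not part of Bioroebe. The stop codon list TAA, TAG, TGA (the standard genetic code) is an assumption; Bioroebe obtains its list from Bioroebe.stop_codons?.

```ruby
# Hypothetical stand-in (not part of Bioroebe) for the append logic of RawSequence#<<.
class MiniSequence
  # Assumption: standard-code stop codons; Bioroebe uses Bioroebe.stop_codons?.
  STOP_CODONS = %w(TAA TAG TGA).freeze

  attr_reader :sequence

  def initialize(sequence = '')
    @sequence = sequence.dup
  end

  def <<(i)
    i = i.sequence if i.is_a?(MiniSequence) # unwrap another sequence object
    i = STOP_CODONS.sample if i == :stop    # :stop appends a random stop codon
    @sequence << i
    self # returning self enables chaining
  end
end

orf = MiniSequence.new('ATG')
orf << 'GCC' << MiniSequence.new('AAA') << :stop
puts orf.sequence # "ATGGCCAAA" followed by one of TAA, TAG or TGA
```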

#[]=(start_position, end_position, new_content = '') ⇒ Object

Note that counting starts at 1 here, because the first nucleotide of a DNA/RNA strand is position 1.

This adjustment is, however, NOT applied when a negative start_position is passed. Also note that the adjusted end_position is handed to String#[]= as a length, not as an inclusive end index.

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 481

def []=(
    start_position,
    end_position,
    new_content = ''
  )
  start_position = start_position.to_i
  end_position   = end_position.to_i
  unless start_position < 0
    start_position -= 1 unless start_position < 1
    end_position   -= 1 unless end_position   < 1
  end
  @sequence[start_position, end_position] = new_content
end
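String#[]= interprets its second value as a length, so the adjusted end_position acts as a length here. The sketch below copies the method body into a hypothetical standalone function, assign_like_raw_sequence (not part of Bioroebe), that works on a copy of a plain String. This makes the effect easy to check.

```ruby
# Hypothetical standalone copy of the RawSequence#[]= body, applied to a
# duplicate of a plain String so the caller's String is left untouched.
def assign_like_raw_sequence(sequence, start_position, end_position, new_content = '')
  sequence = sequence.dup
  start_position = start_position.to_i
  end_position   = end_position.to_i
  unless start_position < 0
    start_position -= 1 unless start_position < 1
    end_position   -= 1 unless end_position   < 1
  end
  # String#[]=(index, length, value): the second value is a length.
  sequence[start_position, end_position] = new_content
  sequence
end

p assign_like_raw_sequence('ATGCATGC', 1, 3, 'NN') # => "NNGCATGC"
p assign_like_raw_sequence('ATGCATGC', 3, 4, 'X')  # => "ATXTGC"
```

So (1, 3) replaces positions 1 to 2, and (3, 4) replaces positions 3 to 5.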

#calculate_levensthein_distance(a, b = sequence?) ⇒ Object

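Computes the Levenshtein (edit) distance between a and b, where b defaults to the stored sequence. The work is delegated to ::Bioroebe.calculate_levensthein_distance, which is required on demand.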

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 498

def calculate_levensthein_distance(a, b = sequence?)
  require 'bioroebe/calculate/calculate_levensthein_distance.rb'
  ::Bioroebe.calculate_levensthein_distance(a,b)
end
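The source of Bioroebe.calculate_levensthein_distance is not shown on this page. For reference, here is a textbook dynamic-programming Levenshtein distance in which insertions, deletions and substitutions each cost 1. levenshtein_distance is a hypothetical name, and this is not necessarily how Bioroebe implements it.

```ruby
# Textbook Levenshtein distance using two rolling rows of the DP table.
def levenshtein_distance(a, b)
  previous_row = (0..b.length).to_a # distances from '' to each prefix of b
  a.each_char.with_index(1) do |char_a, i|
    current_row = [i] # distance from the first i characters of a to ''
    b.each_char.with_index(1) do |char_b, j|
      substitution_cost = char_a == char_b ? 0 : 1
      current_row << [
        previous_row[j] + 1,                    # deletion
        current_row[j - 1] + 1,                 # insertion
        previous_row[j - 1] + substitution_cost # substitution or match
      ].min
    end
    previous_row = current_row
  end
  previous_row.last
end

p levenshtein_distance('kitten', 'sitting') # => 3
p levenshtein_distance('ACGT', 'AGT')       # => 1
```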

#chars? ⇒ Array Also known as: chars

This method will return the characters of the main sequence, as an Array.

Returns:

  • (Array)

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 83

def chars?
  @sequence.chars
end

#complement(i = @sequence) ⇒ Object

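Returns the complementary DNA strand of i (by default the stored sequence) as a new String, pairing A with T and G with C. The result is not reversed. Any character other than uppercase A, C, G or T (for example lowercase letters, N or U) is silently dropped.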

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 199

def complement(
    i = @sequence
  )
  _ = ''.dup
  i.chars.each {|this_char|
    case this_char
    when 'G'
      _ << 'C'
    when 'C'
      _ << 'G'
    when 'A'
      _ << 'T'
    when 'T'
      _ << 'A'
    end
  }
  _
end
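The same mapping can be expressed with String#delete and String#tr. complement_via_tr is a hypothetical helper and not part of Bioroebe. Its delete('^ACGT') step reproduces the behaviour of dropping every character other than uppercase A, C, G and T.

```ruby
# Hypothetical helper equivalent to RawSequence#complement.
def complement_via_tr(sequence)
  # '^ACGT' deletes every character NOT in the set; tr then swaps A/T and C/G.
  sequence.delete('^ACGT').tr('ACGT', 'TGCA')
end

p complement_via_tr('ATGC')  # => "TACG"
p complement_via_tr('ATGCN') # => "TACG" (N is dropped)
p complement_via_tr('atgc')  # => "" (lowercase letters are dropped as well)
```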

#composition? ⇒ Hash Also known as: composition

This method will return a hash displaying the nucleotide or aminoacid composition of the sequence at hand.

Usage example:

seq = Bioroebe::Sequence.new("ATGC"); seq.composition # => {"A"=>1, "T"=>1, "C"=>1, "G"=>1}
seq = Bioroebe::Sequence.new("EFGGHHGG"); seq.is_a_protein_now; seq.composition # => {"E"=>1, "F"=>1, "G"=>4, "H"=>2}

Returns:

  • (Hash)

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 149

def composition?
  hash = {} # This Hash will be returned for all the three cases defined below.
  available_keys = @sequence.chars.uniq
  available_keys.each {|this_key|
    hash[this_key] = @sequence.count(this_key)
  }
  return hash
end
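On Ruby 2.7 or newer, Enumerable#tally produces an equal Hash (same keys and counts) in a single pass, instead of calling String#count once per distinct character. The snippet below uses only core Ruby and does not call Bioroebe.

```ruby
protein = 'EFGGHHGG'

# The approach used by RawSequence#composition?
via_count = {}
protein.chars.uniq.each { |char| via_count[char] = protein.count(char) }

# Equivalent single-pass version (Ruby >= 2.7).
via_tally = protein.chars.tally

p via_tally.fetch('G')   # => 4
p via_count == via_tally # => true
```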

#count(this_character) ⇒ Object

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 168

def count(this_character)
  @sequence.count(this_character)
end

#delete(i) ⇒ Object

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 104

def delete(i)
  @sequence.delete(i)
end

#delete!(i) ⇒ Object

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 321

def delete!(i)
  @sequence.delete!(i)
end

#downcase ⇒ Object Also known as: lowercase, lower

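This method always downcases the sequence in place, modifying the receiver. Because it returns the result of String#downcase!, the return value is nil when the sequence was already lowercase.

.lower() was added in September 2021 for (slight) compatibility with Biopython.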

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 244

def downcase
  @sequence.downcase! # Will always modify.
end
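downcase returns whatever String#downcase! returns. The plain-String snippet below shows that core Ruby behaviour: the receiver is changed in place, and nil is returned when nothing had to change.

```ruby
mixed = 'AtGc'.dup
p mixed.downcase! # => "atgc" (changed, so the String is returned)
p mixed           # => "atgc" (modified in place)

already_lower = 'atgc'.dup
p already_lower.downcase! # => nil (nothing to change)
p already_lower           # => "atgc"
```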

#each_char(&block) ⇒ Object

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 97

def each_char(&block)
  @sequence.each_char(&block)
end

#empty? ⇒ Boolean

Determine whether our sequence is empty or not. It is empty if it is a String of zero length, an “empty” String such as ''.

Returns:

  • (Boolean)

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 192

def empty?
  @sequence.empty?
end

#find_substring_indices(this_substring) ⇒ Object Also known as: find_this_subsequence

This method delegates to Bioroebe.find_substring_indices(), passing in the stored sequence and this_substring.

It returns an Array of all indices at which the substring was found, or nil if there were no matches.

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 530

def find_substring_indices(this_substring)
  require 'bioroebe/toplevel_methods/searching_and_finding.rb'
  return ::Bioroebe.find_substring_indices(string?, this_substring)
end
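The source of Bioroebe.find_substring_indices is not shown here. As an assumption-labelled illustration, the sketch below collects every start index of a substring in plain Ruby. all_substring_indices is a hypothetical helper. It reports 0-based, possibly overlapping matches. Whether Bioroebe counts from 0 or 1, or reports overlapping hits, is not established by this page.

```ruby
# Hypothetical helper, not Bioroebe's implementation. Returns 0-based start
# positions of every (possibly overlapping) match, or nil if there is none.
def all_substring_indices(sequence, substring)
  return nil if substring.empty?
  indices = []
  position = sequence.index(substring)
  while position
    indices << position
    position = sequence.index(substring, position + 1) # +1 allows overlaps
  end
  indices.empty? ? nil : indices
end

p all_substring_indices('ATGAAATGA', 'ATG') # => [0, 5]
p all_substring_indices('AAAA', 'AA')       # => [0, 1, 2]
p all_substring_indices('CCCC', 'G')        # => nil
```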

#first_position=(i) ⇒ Object Also known as: first_nucleotide=

Use this method to replace the first character (position 1) of the sequence with i. For DNA, this sets a new first nucleotide. If the sequence String is frozen, it is replaced by an unfrozen copy first.

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 224

def first_position=(i)
  @sequence = @sequence.dup if @sequence.frozen? # Prevent frozen String error here.
  @sequence[0,1] = i
end
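The frozen-String guard and the single-character replacement used by first_position= can be reproduced on a plain String:

```ruby
sequence = 'ATGCCC'.freeze
# Same guard as in first_position=: swap a frozen String for an unfrozen copy.
sequence = sequence.dup if sequence.frozen?
sequence[0, 1] = 'G' # replace the first character (position 1)
p sequence # => "GTGCCC"
```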

#freeze ⇒ Object

Use this method to freeze the sequence and thus disallow further in-place modifications to it. Only the String stored in @sequence is frozen, not the RawSequence object itself.

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 287

def freeze
  @sequence.freeze
end
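After freeze, in-place String operations such as the one inside << raise FrozenError (Ruby 2.5 or newer). Methods such as first_position= sidestep this by swapping in an unfrozen copy first. The plain-String snippet below illustrates the core Ruby part.

```ruby
frozen_sequence = 'ATG'.dup.freeze

begin
  frozen_sequence << 'CCC' # the operation RawSequence#<< performs internally
rescue FrozenError => error
  puts "Rejected: #{error.class}"
end

p frozen_sequence # => "ATG" (unchanged)
```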

#gsub(replace_this, with_that) ⇒ Object

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 294

def gsub(replace_this, with_that)
  @sequence.gsub(replace_this, with_that)
end

#gsub!(replace_this, with_that) ⇒ Object

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 338

def gsub!(replace_this, with_that)
  @sequence.gsub!(replace_this, with_that)
end

#include?(i) ⇒ Boolean

Check whether our sequence includes some other sequence.

Returns:

  • (Boolean)

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 126

def include?(i)
  @sequence.to_s.include? i.to_s
end

#insert_at_this_position(position, insert_this_new_content) ⇒ Object

This method inserts content into a sequence object, for example a His6-tag sequence into a DNA sequence object.

The first argument is the 0-based index (as used by String#[]=) before which the new content is inserted. The second argument is the new (DNA, RNA or aminoacid) sequence to add. It may contain '|' tokens for readability; these are removed, and because delete! is used they are also removed from the passed String itself.

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 428

def insert_at_this_position(
    position, insert_this_new_content
  )
  if insert_this_new_content.include? '|'
    insert_this_new_content.delete!('|')
  end
  @sequence[position, 0] = insert_this_new_content
end
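Here is a plain-String sketch of the same insertion. Index 3 in String#[]= is 0-based, so the tag lands directly after the ATG start codon. One deliberate difference from the code above: the sketch uses the non-mutating String#delete instead of delete!. That keeps the caller's tag String intact and also works for frozen Strings.

```ruby
sequence = 'ATGAAACCC'.dup
# Six CAC codons (CAC codes for histidine), separated by '|' for readability.
his6_tag = 'CAC|CAC|CAC|CAC|CAC|CAC'

# Non-mutating String#delete leaves his6_tag itself unchanged.
sequence[3, 0] = his6_tag.delete('|') # insert before index 3 (after ATG)
p sequence # => "ATGCACCACCACCACCACCACAAACCC"
```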

#prepend(i) ⇒ Object

If you wish to prepend something to your target sequence then this is the right method to use.

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 331

def prepend(i)
  @sequence.prepend(i)
end

#remove_n_characters_from_the_left_side(n_characters) ⇒ Object

This method removes n characters from the left side (the 5' end) of the sequence.

Because it works the same way for DNA, RNA and aminoacid sequences, the main Sequence class can use it as-is.

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 406

def remove_n_characters_from_the_left_side(n_characters)
  @sequence[0, n_characters] = ''
end

#reset ⇒ Object

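Empties the sequence by assigning a new, unfrozen empty String to @sequence. The constructor calls this method before assigning any input.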

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 38

def reset
  # ======================================================================= #
  # === @sequence
  #
  # This instance variable keeps our whole sequence. It is the most
  # important variable for objects instantiated from this class.
  # ======================================================================= #
  @sequence = ''.dup
end

#reverse ⇒ Object

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 90

def reverse
  @sequence.reverse
end

#reverse! ⇒ Object

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 161

def reverse!
  @sequence.reverse!
end

#reverse_complement(i = sequence?) ⇒ Object

Returns the reverse complement of i (by default the stored sequence). This is the complementary DNA strand, reversed so that it also reads in the 5'-to-3' direction.

Usage example:

x = Bioroebe::RawSequence.new('ATTGCCACAACTGAGACA'); x.reverse_complement # => "TGTCTCAGTTGTGGCAAT"

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 517

def reverse_complement(i = sequence?)
  require 'bioroebe/toplevel_methods/nucleotides.rb'
  return ::Bioroebe.complementary_dna_strand(i).reverse
end
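The source of Bioroebe.complementary_dna_strand is not shown on this page. Assuming it maps A to T, T to A, G to C and C to G, the reverse complement of an uppercase DNA String can be reproduced with core Ruby. reverse_complement_via_tr is a hypothetical name.

```ruby
# Hypothetical helper: complement via tr, then reverse to read 5'->3'.
def reverse_complement_via_tr(sequence)
  sequence.tr('ACGT', 'TGCA').reverse
end

p reverse_complement_via_tr('ATTGCCACAACTGAGACA') # => "TGTCTCAGTTGTGGCAAT"
```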

#scan(i) ⇒ Object

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 182

def scan(i)
  @sequence.scan(i)
end

#set_raw_sequence(i) ⇒ Object Also known as: assign

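Assigns i as the new sequence. If i is an Array (such as ARGV), it is flattened, nil entries are removed and the first remaining element is used. The value is stored directly, without making a copy.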

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 72

def set_raw_sequence(i)
  i = i.flatten.compact.first if i.is_a? Array
  @sequence = i
end

#shuffle ⇒ Object Also known as: randomize

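Randomly rearranges the characters of the sequence and stores the result as the new sequence. The composition stays the same; only the order changes.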

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 232

def shuffle
  @sequence = @sequence.chars.shuffle.join
end
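shuffle calls Array#shuffle without arguments. When reproducible output is wanted, for example in tests, core Ruby's Array#shuffle accepts a random: keyword. The snippet shows this on a plain String and checks that shuffling leaves the composition unchanged.

```ruby
sequence = 'AATTGGCCAT'

# Passing a seeded Random makes the shuffle reproducible between runs.
shuffled = sequence.chars.shuffle(random: Random.new(42)).join

p shuffled.chars.sort == sequence.chars.sort # => true (same composition)
```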

#size? ⇒ Integer Also known as: size, length, length?

Return the size of the string/sequence in question.

Returns:

  • (Integer)

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 312

def size?
  @sequence.size
end

#split(i) ⇒ Object

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 175

def split(i)
  @sequence.split(i)
end

#start_with?(i) ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 133

def start_with?(i)
  to_s.start_with?(i)
end

#strip ⇒ Object

Similar to String#strip: it returns a copy of the sequence with leading and trailing whitespace removed, without modifying the stored sequence.

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 303

def strip
  @sequence.strip
end

#subseq(start_position, end_position = :ask_the_user_for_an_end_position_number) ⇒ Object Also known as: [], subsequence, start_end

This method obtains a subsequence of the given sequence object.

Counting starts at the first nucleotide (position 1). The second argument denotes the position at which to STOP, inclusively. So (3, 8) translates to "take nucleotide 3 up to and including nucleotide 8, and return the result". If no end position is given, a reminder is printed and nil is returned.

See the following examples to understand this more easily.

Usage examples:

seq = Bioroebe::RawSequence.new("ATGCATGCAAAA"); seq.subseq(1, 3) # => "ATG"
seq = Bioroebe::RawSequence.new("ATGCATGCAAAA"); seq.subseq(3, 8) # => "GCATGC"
seq = Bioroebe::RawSequence.new("atgcatgcaaaa"); seq.subseq(3, 8) # => "gcatgc"
seq = Bioroebe::RawSequence.new("ATGCATGCAAAA"); seq.subseq(3, 833333333333) # => "GCATGCAAAA"
seq = Bioroebe::RawSequence.new("ATGCATGCAAATCCACAA"); seq.start_end(1, 10)  # => "ATGCATGCAA"
# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 383

def subseq(
    start_position,
    end_position = :ask_the_user_for_an_end_position_number
  )
  if end_position == :ask_the_user_for_an_end_position_number
    puts 'Please provide a valid end position (an Integer value).'
  else
    start_position -= 1
    end_position -= start_position
    sequence?[start_position, end_position]
  end
end
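The arithmetic in subseq converts the 1-based, inclusive pair (start_position, end_position) into the 0-based index start_position - 1 and the length end_position - start_position + 1. For start positions of 1 or more, this is the same as slicing with a 0-based Range. The hypothetical helper subseq_via_range (not part of Bioroebe) uses that approach. Ruby clips a Range end that lies beyond the String, which matches the 833333333333 example.

```ruby
# Hypothetical helper: 1-based, inclusive subsequence via a 0-based Range.
def subseq_via_range(sequence, start_position, end_position)
  sequence[(start_position - 1)..(end_position - 1)]
end

p subseq_via_range('ATGCATGCAAAA', 1, 3)            # => "ATG"
p subseq_via_range('ATGCATGCAAAA', 3, 8)            # => "GCATGC"
p subseq_via_range('ATGCATGCAAAA', 3, 833333333333) # => "GCATGCAAAA"
```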

#to_s ⇒ Object Also known as: sequence?, sequence, string?, seq, seq?, s?, main_string?, main_sequence_as_string?

Returns the sequence stored by this object as a String.

This method has several aliases, but not all of them are guaranteed to remain available for the rest of this project's lifecycle. For example, the alias s? for sequence? may be removed one day; until then, it remains available.

For your own scripts, it is therefore recommended to use the slightly longer method names .sequence? or .to_s. The alias s? exists mostly for convenience in IRB and similar interactive use, and it may or may not be retained.

If you wish to test the output of this method, try:

require 'bioroebe'; x = Bioroebe::Seq.new('AGTACACTGGT'); puts x
# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 270

def to_s
  @sequence.to_s
end

#to_str ⇒ Object

This method allows Sequence objects to be combined in a String-like manner.

In particular, it allows the '+' method to be used with them.

In Ruby, an object that implements to_str signals that it can be treated like a String for all practical purposes. Core methods that expect a String will then accept it implicitly.

This can be tested as follows:

x = Bioroebe::RawSequence.new('ATGGATCGATGC'); y = Bioroebe::RawSequence.new('TTTGATCGATGC'); z = x + y
# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 64

def to_str
  # self # ← Old code since up to May 2020.
  @sequence.to_s # ← This became the new default as of May 2020 again.
end
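The implicit conversion that to_str enables can be shown without Bioroebe. TinySequence is a hypothetical minimal class. Defining to_str is enough for core String methods such as + and include? to accept its instances where a String is expected.

```ruby
# Hypothetical minimal class: only to_str is defined.
class TinySequence
  def initialize(sequence)
    @sequence = sequence
  end

  def to_str
    @sequence.to_s
  end
end

x = TinySequence.new('TTTGATCG')
p 'ATG' + x                 # => "ATGTTTGATCG"
p 'ATGTTTGATCG'.include?(x) # => true
```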

#tr!(a, b) ⇒ Object

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 413

def tr!(a, b)
  @sequence.tr!(a, b)
end
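A common use of tr! on nucleotide sequences is replacing T with U to turn a coding-strand DNA sequence into the corresponding RNA sequence. Like other bang methods of String, tr! changes the receiver in place and returns nil when nothing was replaced. Plain-String illustration:

```ruby
coding_strand = 'ATGGCATAA'.dup
p coding_strand.tr!('T', 'U') # => "AUGGCAUAA"
p coding_strand               # => "AUGGCAUAA" (modified in place)
p coding_strand.tr!('T', 'U') # => nil (no T left to replace)
```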

#upcase! ⇒ Object Also known as: upcase, up, upper

This method upcases the sequence in place, so “atg” becomes “ATG”, and then returns the sequence.

Note that .upcase() is an alias of .upcase!(). Use whichever variant you prefer, but keep in mind that the receiver is modified in both cases.

.upper() was added in September 2021 for (slight) compatibility with Biopython.

# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 354

def upcase!
  @sequence.upcase!
  return @sequence
end