Class: Bioroebe::RawSequence

Inherits:
Object
Defined in:
lib/bioroebe/raw_sequence/raw_sequence.rb

Overview

Bioroebe::RawSequence

Direct Known Subclasses

Sequence

Constructor Details

#initialize(commandline_arguments = ARGV) ⇒ RawSequence

#

initialize

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 23

def initialize(
    commandline_arguments = ARGV
  )
  reset
  if commandline_arguments and
     commandline_arguments.is_a?(Array) and
     !commandline_arguments.empty?
    set_raw_sequence(commandline_arguments)
  elsif commandline_arguments and
        commandline_arguments.is_a?(String)
    set_raw_sequence(commandline_arguments)
  end
end
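
A minimal sketch of how the constructor normalises its argument. `ConstructorSketch` is a hypothetical stand-in (not part of Bioroebe) that copies only the `initialize`, `reset` and `set_raw_sequence` logic shown on this page:

```ruby
# Hypothetical stand-in mirroring the constructor logic shown above;
# it is not the real Bioroebe::RawSequence.
class ConstructorSketch
  def initialize(commandline_arguments = ARGV)
    @sequence = ''.dup # same effect as reset()
    case commandline_arguments
    when Array
      set_raw_sequence(commandline_arguments) unless commandline_arguments.empty?
    when String
      set_raw_sequence(commandline_arguments)
    end
  end

  # An Array is flattened and its first non-nil element is used.
  def set_raw_sequence(i)
    i = i.flatten.compact.first if i.is_a?(Array)
    @sequence = i
  end

  def to_s
    @sequence.to_s
  end
end

from_string = ConstructorSketch.new('ATGC')
from_array  = ConstructorSketch.new([['ATGC', 'GGG'], 'TTT'])
from_empty  = ConstructorSketch.new([])
```

Only the first element of an Array survives, so additional elements are silently ignored.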

Instance Method Details

#+(i) ⇒ Object

#

+

This method can “combine” (that is, add) two sequences. The result is returned as a new String; the receiver is not modified.

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 115

def +(i)
  if i.is_a?(Bioroebe::RawSequence) or
     i.respond_to?(:sequence?) # This line will match for Bioroebe::Sequence
    return @sequence+
           i.sequence?
  end
end
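
A short sketch of the behaviour of `+`, assuming a hypothetical stand-in class `PlusSketch` that keeps only the logic shown above. It illustrates that the result is a plain String and that an argument without a `sequence?` method yields nil:

```ruby
# Hypothetical stand-in that copies only the + logic shown above.
class PlusSketch
  def initialize(sequence)
    @sequence = sequence.dup
  end

  def sequence?
    @sequence.to_s
  end

  def +(other)
    return @sequence + other.sequence? if other.respond_to?(:sequence?)
    nil # mirrors the implicit nil of the original for other argument types
  end
end

x = PlusSketch.new('ATGGATCGATGC')
y = PlusSketch.new('TTTGATCGATGC')
z = x + y
not_a_sequence = x + 'TTT'
```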

#<<(i) ⇒ Object Also known as: add, append, concat

#

<<

The method called << is an “input method”, that is, it will simply append onto the main sequence (stored as @sequence).

In simpler words: @sequence stores the DNA, RNA or amino acid sequence.

If a Sequence object is passed (Bioroebe::Sequence) then this method will tap into the main sequence (the main String) that it stores, through the .sequence? method, before continuing.

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 452

def <<(i)
  if i.is_a?(::Bioroebe::Sequence)
    i = i.sequence?
  elsif i.is_a? Symbol
    case i
    # ===================================================================== #
    # === :stop
    # ===================================================================== #
    when :stop
      if Bioroebe.stop_codons.empty?
        Bioroebe.initialize_default_stop_codons
      end
      i = ::Bioroebe.stop_codons?.sample
    end
  end
  @sequence << i
  self # Returning self here since that will allow method-chaining.
end
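
Because `<<` returns self, calls can be chained. The following sketch uses a hypothetical stand-in class `AppendSketch`; it omits the `:stop` Symbol branch and checks `respond_to?(:sequence?)` instead of the class, both of which are simplifications:

```ruby
# Hypothetical stand-in that keeps only the String-appending part of <<.
class AppendSketch
  def initialize(sequence = '')
    @sequence = sequence.dup
  end

  def sequence?
    @sequence.to_s
  end

  def <<(i)
    i = i.sequence? if i.respond_to?(:sequence?) # unwrap sequence objects
    @sequence << i
    self # returning self is what enables chaining
  end
end

seq = AppendSketch.new('ATG')
seq << 'CCC' << AppendSketch.new('TAA')
```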

#[]=(start_position, end_position, new_content = '') ⇒ Object

#

[]=

Note that we start counting at 1 here, since the first nucleotide of a DNA/RNA strand is position 1.

However, we will NOT adjust the positions when a negative start position is passed to this method.

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 483

def []=(
    start_position,
    end_position,
    new_content = ''
  )
  start_position = start_position.to_i
  end_position   = end_position.to_i
  unless start_position < 0
    start_position -= 1 unless start_position < 1
    end_position   -= 1 unless end_position   < 1
  end
  @sequence[start_position, end_position] = new_content
end
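
The index arithmetic can be tried on a plain String. `assign_range` is a hypothetical helper that repeats the steps of `[]=` shown above. Note that Ruby's `String#[]=` with two Integers interprets them as (start, length), so the adjusted second value acts as a length:

```ruby
# Hypothetical helper repeating the index arithmetic of []= on a plain String.
def assign_range(sequence, start_position, end_position, new_content = '')
  start_position = start_position.to_i
  end_position   = end_position.to_i
  unless start_position < 0
    start_position -= 1 unless start_position < 1
    end_position   -= 1 unless end_position   < 1
  end
  # String#[]= with two Integers means (start, length).
  sequence[start_position, end_position] = new_content
  sequence
end

positive = assign_range('ATGCATGC'.dup, 1, 3, 'NN')  # start 0, length 2
negative = assign_range('ATGCATGC'.dup, -2, 2, 'NN') # positions left untouched
```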

#calculate_levensthein_distance(a, b = sequence?) ⇒ Object

#

calculate_levensthein_distance

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 500

def calculate_levensthein_distance(a, b = sequence?)
  require 'bioroebe/calculate/calculate_levensthein_distance.rb'
  ::Bioroebe.calculate_levensthein_distance(a,b)
end
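
For orientation, here is a sketch of the standard dynamic-programming Levenshtein distance. The implementation in `bioroebe/calculate/calculate_levensthein_distance.rb` is not shown on this page and may differ; `levenshtein_distance` is a hypothetical name used only for this illustration:

```ruby
# Textbook Levenshtein distance, keeping only two rows of the table.
# This is an illustration, not Bioroebe's own implementation.
def levenshtein_distance(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,       # deletion
        current[j - 1] + 1,    # insertion
        previous[j - 1] + cost # substitution or match
      ].min
    end
    previous = current
  end
  previous.last
end

same      = levenshtein_distance('ATGC', 'ATGC')
one_swap  = levenshtein_distance('ATGC', 'ATCC')
classical = levenshtein_distance('kitten', 'sitting')
```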

#chars? ⇒ Array Also known as: chars

#

chars?

This method will return the characters of the main sequence, as an Array.

#

Returns:

  • (Array)


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 85

def chars?
  @sequence.chars
end

#complement(i = @sequence) ⇒ Object

#

complement

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 201

def complement(
    i = @sequence
  )
  _ = ''.dup
  i.chars.each {|this_char|
    case this_char
    when 'G'
      _ << 'C'
    when 'C'
      _ << 'G'
    when 'A'
      _ << 'T'
    when 'T'
      _ << 'A'
    end
  }
  _
end
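
The case statement above has no fallback branch, so any character other than uppercase G, C, A or T is dropped from the result. A sketch with a hypothetical helper `complement_of` and a lookup table `DNA_COMPLEMENT` (both names are assumptions) shows the same mapping:

```ruby
# Hypothetical lookup table with the same mapping as complement().
DNA_COMPLEMENT = { 'G' => 'C', 'C' => 'G', 'A' => 'T', 'T' => 'A' }.freeze

# Characters outside the table are skipped, just as in the case statement.
def complement_of(sequence)
  sequence.each_char.map { |char| DNA_COMPLEMENT.fetch(char, '') }.join
end

plain      = complement_of('ATGC')
with_other = complement_of('ATNGCatg') # N and lowercase letters are dropped
```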

#composition? ⇒ Hash Also known as: composition

#

composition

This method will return a Hash describing the nucleotide or amino acid composition of the sequence at hand.

Usage example:

seq = Bioroebe::Sequence.new("ATGC"); seq.composition # => {"A"=>1, "T"=>1, "G"=>1, "C"=>1}
seq = Bioroebe::Sequence.new("EFGGHHGG"); seq.is_a_protein_now; seq.composition # => {"E"=>1, "F"=>1, "G"=>4, "H"=>2}
#

Returns:

  • (Hash)


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 151

def composition?
  hash = {} # Maps each distinct character to its number of occurrences.
  available_keys = @sequence.chars.uniq
  available_keys.each {|this_key|
    hash[this_key] = @sequence.count(this_key)
  }
  return hash
end
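
The counting approach can be reproduced on a plain String. The sketch below first repeats the steps of `composition?` and then shows `Array#tally` (Ruby 2.7 and later), which builds the same Hash in one pass; the variable names are illustrative only:

```ruby
sequence = 'EFGGHHGG'

# Same approach as composition?: count each distinct character.
by_count = {}
sequence.chars.uniq.each { |char| by_count[char] = sequence.count(char) }

# Array#tally builds the same Hash in a single pass.
by_tally = sequence.chars.tally
```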

#count(this_character) ⇒ Object

#

count

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 170

def count(this_character)
  @sequence.count(this_character)
end

#delete(i) ⇒ Object

#

delete

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 106

def delete(i)
  @sequence.delete(i)
end

#delete!(i) ⇒ Object

#

delete!

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 323

def delete!(i)
  @sequence.delete!(i)
end

#downcase ⇒ Object Also known as: lowercase, lower

#

downcase

This method will always downcase our given sequence object at hand.

.lower() was added in September 2021 for (slight) compatibility with Biopython.

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 246

def downcase
  @sequence.downcase! # Will always modify.
end
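
Since the method returns the result of `String#downcase!`, its return value follows Ruby's convention for bang methods: the modified String if something changed, nil otherwise. A short sketch on a plain String:

```ruby
sequence = 'ATGC'.dup

first_call  = sequence.downcase! # the String changed, so it is returned
second_call = sequence.downcase! # nothing left to change, so nil is returned
```

Callers that need the lowercase sequence itself may prefer to read it back through `to_s` afterwards.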

#each_char(&block) ⇒ Object

#

each_char

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 99

def each_char(&block)
  @sequence.each_char(&block)
end

#empty? ⇒ Boolean

#

empty?

Determine whether our sequence is empty or not. It is empty if it is a String of zero length, such as ''.

#

Returns:

  • (Boolean)


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 194

def empty?
  @sequence.empty?
end

#find_substring_indices(this_substring) ⇒ Object Also known as: find_this_subsequence

#

find_substring_indices

This method taps into the method called Bioroebe.find_substring_indices().

It will return an Array of all substring indices if any were found; otherwise it will return nil.

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 532

def find_substring_indices(this_substring)
  require 'bioroebe/toplevel_methods/searching_and_finding.rb'
  return ::Bioroebe.find_substring_indices(string?, this_substring)
end

#first_position=(i) ⇒ Object Also known as: first_nucleotide=

#

first_position=

Use this method to replace the first position of the sequence. For DNA, this sets a new first nucleotide.

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 226

def first_position=(i)
  @sequence = @sequence.dup if @sequence.frozen? # Prevent frozen String error here.
  @sequence[0,1] = i
end
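
The two steps of this method can be followed on a plain String: a frozen String is first replaced by an unfrozen copy, then the slice `[0, 1]` is overwritten. The variable names below are illustrative only:

```ruby
sequence = 'ATGC'.freeze
sequence = sequence.dup if sequence.frozen? # an unfrozen copy can be modified
sequence[0, 1] = 'G'

# The replacement may be longer than one character; it still replaces
# exactly one position.
longer = 'ATGC'.dup
longer[0, 1] = 'CCC'
```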

#freeze ⇒ Object

#

freeze

If you wish to freeze the sequence object and thus disallow further modifications to it, use this method.

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 289

def freeze
  @sequence.freeze
end

#gsub(replace_this, with_that) ⇒ Object

#

gsub

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 296

def gsub(replace_this, with_that)
  @sequence.gsub(replace_this, with_that)
end

#gsub!(replace_this, with_that) ⇒ Object

#

gsub!

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 340

def gsub!(replace_this, with_that)
  @sequence.gsub!(replace_this, with_that)
end

#include?(i) ⇒ Boolean

#

include?

Check whether our sequence includes some other sequence.

#

Returns:

  • (Boolean)


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 128

def include?(i)
  @sequence.to_s.include? i.to_s
end

#insert_at_this_position(position, insert_this_new_content) ⇒ Object

#

insert_at_this_position

This method can be used to insert content into a sequence object, for example a His6-tag sequence into a DNA sequence object.

The second argument is the new (DNA, RNA or amino acid) sequence that you wish to add. You can also use ‘|’ tokens there if you like; they will be removed.

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 430

def insert_at_this_position(
    position, insert_this_new_content
  )
  if insert_this_new_content.include? '|'
    insert_this_new_content.delete!('|')
  end
  @sequence[position, 0] = insert_this_new_content
end
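
A sketch of the insertion on a plain String, using a hypothetical helper `insert_at`. The position is handed to `String#[]=` unchanged, so it is a 0-based index. Unlike the method above, the sketch uses the non-mutating `String#delete`, which leaves the caller's argument untouched and also works with frozen String literals:

```ruby
# Hypothetical helper that mirrors insert_at_this_position on a plain String.
def insert_at(sequence, position, new_content)
  new_content = new_content.delete('|') # drop the optional separator tokens
  sequence[position, 0] = new_content   # a length of 0 inserts without replacing
  sequence
end

his6   = 'CAT|CAT|CAT|CAT|CAT|CAT' # six histidine codons, written with separators
result = insert_at('ATGGCC'.dup, 3, his6)
```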

#prepend(i) ⇒ Object

#

prepend

If you wish to prepend something to your target sequence then this is the right method to use.

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 333

def prepend(i)
  @sequence.prepend(i)
end

#remove_n_characters_from_the_left_side(n_characters) ⇒ Object

#

remove_n_characters_from_the_left_side

This method will remove n characters from the left side (aka 5’).

It can be applied to DNA, RNA and an aminoacid sequence, so it can be retained on the main Sequence class definition as-is.

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 408

def remove_n_characters_from_the_left_side(n_characters)
  @sequence[0, n_characters] = ''
end

#reset ⇒ Object

#

reset (reset tag)

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 40

def reset
  # ======================================================================= #
  # === @sequence
  #
  # This instance variable keeps our whole sequence. It is the most
  # important variable for objects instantiated from this class.
  # ======================================================================= #
  @sequence = ''.dup
end

#reverse ⇒ Object

#

reverse

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 92

def reverse
  @sequence.reverse
end

#reverse! ⇒ Object

#

reverse!

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 163

def reverse!
  @sequence.reverse!
end

#reverse_complement(i = sequence?) ⇒ Object

#

reverse_complement

This method returns the other strand of the given sequence, the so-called “reverse complement”.

The complement thus refers to the “complementary DNA strand” of a 5’-NUCLEOTIDE-3’ sequence, reported in reverse order.

Usage example:

x = Bioroebe::Sequence.new('ATTGCCACAACTGAGACA'); x.reverse_complement # => "TGTCTCAGTTGTGGCAAT"
#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 519

def reverse_complement(i = sequence?)
  require 'bioroebe/toplevel_methods/nucleotides.rb'
  return ::Bioroebe.complementary_dna_strand(i).reverse
end
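
The result of the usage example can be reproduced with plain String methods. `Bioroebe.complementary_dna_strand` is not shown on this page, so the sketch below swaps in `String#tr` for the complement step; `reverse_complement_of` is a hypothetical name:

```ruby
# Illustration only: String#tr stands in for Bioroebe.complementary_dna_strand.
def reverse_complement_of(sequence)
  sequence.tr('ATGC', 'TACG').reverse
end

result = reverse_complement_of('ATTGCCACAACTGAGACA')
```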

#scan(i) ⇒ Object

#

scan

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 184

def scan(i)
  @sequence.scan(i)
end

#set_raw_sequence(i) ⇒ Object Also known as: assign

#

set_raw_sequence

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 74

def set_raw_sequence(i)
  i = i.flatten.compact.first if i.is_a? Array
  @sequence = i
end

#shuffle ⇒ Object Also known as: randomize

#

shuffle

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 234

def shuffle
  @sequence = @sequence.chars.shuffle.join
end
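
Shuffling reorders the characters but keeps the composition. The sketch below works on a plain String and passes a seeded `Random` to `Array#shuffle` so that repeated runs give the same order; the method above uses Ruby's default generator instead:

```ruby
sequence = 'ATGCATGCAAAA'

# A seeded generator makes this sketch repeatable.
shuffled = sequence.chars.shuffle(random: Random.new(42)).join
```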

#size? ⇒ Integer Also known as: size, length, length?

#

size?

Return the size of the string/sequence in question.

#

Returns:

  • (Integer)


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 314

def size?
  @sequence.size
end

#split(i) ⇒ Object

#

split

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 177

def split(i)
  @sequence.split(i)
end

#start_with?(i) ⇒ Boolean

#

start_with?

#

Returns:

  • (Boolean)


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 135

def start_with?(i)
  to_s.start_with?(i)
end

#strip ⇒ Object

#

strip

Similar to the method .strip() on class String.

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 305

def strip
  @sequence.strip
end

#subseq(start_position, end_position = :ask_the_user_for_an_end_position_number) ⇒ Object Also known as: [], subsequence, start_end

#

subseq

This method will obtain a subsequence of the given sequence object at hand.

We start to count at the first nucleotide. The second argument given to this method will denote the nucleotide position at where we will STOP. So (3,8) will translate to “take nucleotide 3, up to and including nucleotide 8, and then return this result”.

See the following examples to understand this more easily.

Usage examples:

seq = Bioroebe::RawSequence.new("ATGCATGCAAAA"); seq.subseq(1, 3) # => "ATG"
seq = Bioroebe::RawSequence.new("ATGCATGCAAAA"); seq.subseq(3, 8) # => "GCATGC"
seq = Bioroebe::RawSequence.new("atgcatgcaaaa"); seq.subseq(3, 8) # => "gcatgc"
seq = Bioroebe::RawSequence.new("ATGCATGCAAAA"); seq.subseq(3, 833333333333) # => "GCATGCAAAA"
seq = Bioroebe::RawSequence.new("ATGCATGCAAATCCACAA"); seq.start_end(1, 10)  # => "ATGCATGCAA"
#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 385

def subseq(
    start_position,
    end_position = :ask_the_user_for_an_end_position_number
  )
  if end_position == :ask_the_user_for_an_end_position_number
    puts 'Please provide a valid end position (an Integer value).'
  else
    start_position -= 1
    end_position -= start_position
    sequence?[start_position, end_position]
  end
end
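
The conversion from a 1-based, inclusive range to Ruby's (start, length) slice can be checked on a plain String. `subseq_of` is a hypothetical helper that repeats the arithmetic of `subseq` with more descriptive variable names:

```ruby
# Hypothetical helper repeating the arithmetic of subseq on a plain String.
def subseq_of(sequence, start_position, end_position)
  start_index = start_position - 1         # 1-based position to 0-based index
  length      = end_position - start_index # inclusive end becomes a length
  sequence[start_index, length]
end

first_three = subseq_of('ATGCATGCAAAA', 1, 3)
middle      = subseq_of('ATGCATGCAAAA', 3, 8)
past_end    = subseq_of('ATGCATGCAAAA', 3, 833_333_333_333)
```

An end position beyond the sequence is harmless because `String#[]` stops at the end of the String.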

#to_s ⇒ Object Also known as: sequence?, sequence, string?, seq, seq?, s?, main_string?, main_sequence_as_string?

#

to_s

Query method that returns the sequence stored by this class, as a String.

This method has several aliases, but it cannot be guaranteed that all aliases will continue to work for the remainder of this project’s lifecycle. For example, the method s? as alias for sequence? may be removed one day, but until then it will remain available.

Still, it is recommended to use the slightly longer method name .sequence? or .to_s; the alias s? exists mostly so that we can be lazy in IRB and elsewhere. So perhaps it will be retained, but there is no guarantee. For your own scripts you should use either .to_s or .sequence?.

If you wish to test the output of this method, try:

require 'bioroebe'; x = Bioroebe::Seq.new('AGTACACTGGT'); puts x
#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 272

def to_s
  @sequence.to_s
end

#to_str ⇒ Object

#

to_str

We need this method so that Sequence objects can be chained together in a String-like fashion.

Specifically this allows us to make use of the ‘+’ method call.

Objects in Ruby implement the to_str method so that they can be treated like a String, for all practical purposes.

This can be tested like this:

x = Bioroebe::RawSequence.new('ATGGATCGATGC'); y = Bioroebe::RawSequence.new('TTTGATCGATGC'); z = x + y
#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 66

def to_str
  # self # ← Old code since up to May 2020.
  @sequence.to_s # ← This became the new default as of May 2020 again.
end
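
Ruby calls `to_str` whenever a core method expects a String but receives another object. The sketch below uses a hypothetical stand-in class `ToStrSketch` to show this implicit conversion with `String#+`:

```ruby
# Hypothetical stand-in that only defines to_str, as in the method above.
class ToStrSketch
  def initialize(sequence)
    @sequence = sequence
  end

  def to_str
    @sequence.to_s
  end
end

# String#+ accepts the object because it responds to to_str.
joined = 'ATG' + ToStrSketch.new('CCC')
```

Without `to_str`, the same expression would raise a TypeError.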

#tr!(a, b) ⇒ Object

#

tr!

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 415

def tr!(a, b)
  @sequence.tr!(a, b)
end

#upcase! ⇒ Object Also known as: upcase, up, upper

#

upcase!

This method will upcase the given sequence, so “atg” becomes “ATG”.

Note that .upcase() is an alias to .upcase!() - use whichever variant you want to, but keep in mind that the receiver will be modified in both variants.

.upper() was added in September 2021 for (slight) compatibility with Biopython.

#


# File 'lib/bioroebe/raw_sequence/raw_sequence.rb', line 356

def upcase!
  @sequence.upcase!
  return @sequence
end