Class: BioDSL::Seq
- Inherits: Object
  - Object
    - BioDSL::Seq
- Defined in: lib/BioDSL/seq.rb
Overview
Class for manipulating sequences.
Defined Under Namespace
Classes: Orf
Constant Summary
- DNA = %w(a t c g)
  Residue alphabets
- RNA = %w(a u c g)
- PROTEIN = %w(f l s y c w p h q r i m t n k v a d e g)
- INDELS = %w(. - _ ~)
- SCORE_BASE = 33
  Quality score bases
- SCORE_MIN = 0
- SCORE_MAX = 40
Constants included from BackTrack
BackTrack::MAX_DEL, BackTrack::MAX_INS, BackTrack::MAX_MIS, BackTrack::OK_PATTERN
Constants included from Translate
Translate::TRANS_TAB11, Translate::TRANS_TAB11_START
Instance Attribute Summary
- #qual ⇒ Object
  Returns the value of attribute qual.
- #seq ⇒ Object
  Returns the value of attribute seq.
- #seq_name ⇒ Object
  Returns the value of attribute seq_name.
- #type ⇒ Object
  Returns the value of attribute type.
Class Method Summary
- .check_name_pair(entry1, entry2) ⇒ Object
- .generate_oligos(length, type) ⇒ Object
  Class method that generates all possible oligos of a specified length and type.
- .new_bp(record) ⇒ Object
  Class method to instantiate a new Sequence object given a Biopiece record.
Instance Method Summary
- #+(other) ⇒ Object
  Method to add two Seq objects.
- #<<(entry) ⇒ Object
  Method to concatenate sequence entries.
- #[](*args) ⇒ Object
  Index method for Seq objects.
- #[]=(*args, entry) ⇒ Object
  Index assignment method for Seq objects.
- #complement ⇒ Object
  Method that complements sequence including ambiguity codes, returning a new Seq object.
- #complement! ⇒ Object
  Method that complements sequence including ambiguity codes in place.
- #composition ⇒ Object
  Method that returns the residue composition of a sequence as a hash where the key is the residue and the value is the residue count.
- #dna? ⇒ Boolean
  Method that returns true if the sequence type is DNA.
- #each_orf(options = {}) ⇒ Object
  Method to find open reading frames (ORFs).
- #edit_distance(entry) ⇒ Object
  Method to determine the Edit Distance between two Sequence objects (case insensitive).
- #generate(length, type) ⇒ Object
  Method that generates a random sequence of a given length and type.
- #hamming_distance(entry, options = {}) ⇒ Object
  Method to determine the Hamming Distance between two Sequence objects (case insensitive).
- #hard_mask ⇒ Object
  Method that returns the percentage of hard masked residues or N’s in a sequence.
- #indels ⇒ Object
  Returns the number of indels in a sequence.
- #indels_remove ⇒ Object
  Method to remove indels from seq, and from qual if qual is present.
- #initialize(options = {}) ⇒ Seq (constructor)
  Initializes a sequence object from an options hash (:seq_name, :seq, :type, :qual).
- #length ⇒ Object (also: #len)
  Returns the length of a sequence.
- #mask_seq_hard!(cutoff) ⇒ Object
  Hard masks sequence residues where the corresponding quality score is below a given cutoff.
- #mask_seq_soft!(cutoff) ⇒ Object
  Soft masks sequence residues where the corresponding quality score is below a given cutoff.
- #protein? ⇒ Boolean
  Method that returns true if the sequence type is protein.
- #qual_base33? ⇒ Boolean
  Method that determines if a quality score string can be unambiguously identified as base 33.
- #qual_base64? ⇒ Boolean
  Method that determines if a quality score string may be base 64.
- #qual_coerce!(encoding) ⇒ Object
  Method to coerce quality scores to be within the 0-40 range.
- #qual_convert!(from, to) ⇒ Object
  Method to convert quality scores.
- #qual_valid?(encoding) ⇒ Boolean
  Method to determine if a quality score is valid accepting only 0-40 range.
- #reverse ⇒ Object
  Method to reverse the sequence, returning a new Seq object.
- #reverse! ⇒ Object
  Method to reverse the sequence in place.
- #rna? ⇒ Boolean
  Method that returns true if the sequence type is RNA.
- #scores_max ⇒ Object
  Method to calculate and return the max quality score.
- #scores_mean ⇒ Object
  Method to calculate and return the mean quality score.
- #scores_mean_local(window_size) ⇒ Object
  Method to run a sliding window of a specified size across a Phred type scores string, calculate the mean score for each window, and return the minimum mean score.
- #scores_min ⇒ Object
  Method to calculate and return the min quality score.
- #shuffle ⇒ Object
  Method to return a new Seq object with shuffled sequence.
- #shuffle! ⇒ Object
  Method to shuffle a sequence randomly inline.
- #soft_mask ⇒ Object
  Method that returns the percentage of soft masked residues or lower cased residues in a sequence.
- #to_bp ⇒ Object
  Method that given a Seq entry returns a BioDSL record (a hash).
- #to_dna ⇒ Object
  Method to reverse-transcribe RNA to DNA.
- #to_fasta(wrap = nil) ⇒ Object
  Method that given a Seq entry returns a FASTA entry (a string).
- #to_fastq ⇒ Object
  Method that given a Seq entry returns a FASTQ entry (a string).
- #to_key ⇒ Object
  Method that generates a unique key for a DNA sequence and returns this key as an Integer.
- #to_rna ⇒ Object
  Method to transcribe DNA to RNA.
- #type_guess ⇒ Object
  Method that guesses and returns the sequence type by inspecting the first 100 residues.
- #type_guess! ⇒ Object
  Method that guesses and sets the sequence type by inspecting the first 100 residues.
Methods included from BackTrack
Methods included from Ambiguity
Methods included from Kmer
Methods included from Trim
#quality_trim, #quality_trim!, #quality_trim_left, #quality_trim_left!, #quality_trim_right, #quality_trim_right!
Methods included from Translate
Methods included from Homopolymer
Methods included from Digest
Constructor Details
#initialize(options = {}) ⇒ Seq
Initialize a sequence object with the following options:
- :seq_name Name of the sequence.
- :seq The sequence.
- :type The sequence type - DNA, RNA, or protein
- :qual An Illumina type quality scores string.
# File 'lib/BioDSL/seq.rb', line 134
def initialize(options = {})
  @seq_name = options[:seq_name]
  @seq = options[:seq]
  @type = options[:type]
  @qual = options[:qual]

  return unless @seq && @qual
  return if @seq.length == @qual.length

  fail SeqError, 'Sequence length and score length mismatch: ' \
                 "#{@seq.length} != #{@qual.length}"
end
Instance Attribute Details
#qual ⇒ Object
Returns the value of attribute qual.
# File 'lib/BioDSL/seq.rb', line 68
def qual
  @qual
end
#seq ⇒ Object
Returns the value of attribute seq.
# File 'lib/BioDSL/seq.rb', line 68
def seq
  @seq
end
#seq_name ⇒ Object
Returns the value of attribute seq_name.
# File 'lib/BioDSL/seq.rb', line 68
def seq_name
  @seq_name
end
#type ⇒ Object
Returns the value of attribute type.
# File 'lib/BioDSL/seq.rb', line 68
def type
  @type
end
Class Method Details
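.check_name_pair(entry1, entry2) ⇒ Object
Class method that checks that two entries share the same base sequence name, ignoring the trailing read number, and fails with a SeqError otherwise.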
# File 'lib/BioDSL/seq.rb', line 109
def self.check_name_pair(entry1, entry2)
  if entry1.seq_name =~ /^([^ ]+) \d:/
    name1 = Regexp.last_match[1]
  elsif entry1.seq_name =~ %r{^(.+)\/\d$}
    name1 = Regexp.last_match[1]
  else
    fail SeqError, "Could not match sequence name: #{entry1.seq_name}"
  end

  if entry2.seq_name =~ /^([^ ]+) \d:/
    name2 = Regexp.last_match[1]
  elsif entry2.seq_name =~ %r{^(.+)\/\d$}
    name2 = Regexp.last_match[1]
  else
    fail SeqError, "Could not match sequence name: #{entry2.seq_name}"
  end

  fail SeqError, "Name mismatch: #{name1} != #{name2}" if name1 != name2
end
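The two name patterns matched above can be tried in isolation. This is a minimal sketch, not BioDSL code: `base_read_name` is a hypothetical helper, `ArgumentError` stands in for `SeqError`, and the sample read names are made up for illustration.

```ruby
# Hypothetical standalone helper mirroring the two patterns used by
# check_name_pair; it is not part of BioDSL.
def base_read_name(seq_name)
  case seq_name
  when /^([^ ]+) \d:/ then Regexp.last_match[1] # "name 1:N:0:1" style
  when %r{^(.+)/\d$}  then Regexp.last_match[1] # "name/1" style
  else raise ArgumentError, "Could not match sequence name: #{seq_name}"
  end
end

# Two reads form a pair when their base names agree.
def same_pair?(name1, name2)
  base_read_name(name1) == base_read_name(name2)
end
```

For example, `same_pair?('read42/1', 'read42/2')` is true, while a name matching neither pattern raises an error.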
.generate_oligos(length, type) ⇒ Object
Class method that generates all possible oligos of a specified length and type.
# File 'lib/BioDSL/seq.rb', line 83
def self.generate_oligos(length, type)
  fail SeqError, "Bad length: #{length}" if length <= 0

  case type.downcase
  when :dna then alph = DNA
  when :rna then alph = RNA
  when :protein then alph = PROTEIN
  else
    fail SeqError, "Unknown sequence type: #{type}"
  end

  oligos = ['']

  (1..length).each do
    list = []

    oligos.each do |oligo|
      alph.each { |char| list << oligo + char }
    end

    oligos = list
  end

  oligos
end
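The loop above builds every string of the given length over the alphabet, so the result has alphabet-size ** length entries. As a rough illustration only, the same enumeration can be written with Ruby's `Array#repeated_permutation`; `all_oligos` is a hypothetical name and is not part of BioDSL.

```ruby
# Hypothetical standalone equivalent of the oligo enumeration above.
# The alphabet is passed in explicitly instead of being looked up by type.
def all_oligos(length, alphabet)
  raise ArgumentError, "Bad length: #{length}" if length <= 0

  # repeated_permutation yields every ordered selection with repetition.
  alphabet.repeated_permutation(length).map(&:join)
end
```

With the DNA alphabet `%w(a t c g)` and length 2 this gives 16 oligos, starting with `aa`, `at`, `ac`, `ag`.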
.new_bp(record) ⇒ Object
Class method to instantiate a new Sequence object given a Biopiece record.
# File 'lib/BioDSL/seq.rb', line 72
def self.new_bp(record)
  seq_name = record[:SEQ_NAME]
  seq = record[:SEQ]
  type = record[:SEQ_TYPE].to_sym if record[:SEQ_TYPE]
  qual = record[:SCORES]

  new(seq_name: seq_name, seq: seq, type: type, qual: qual)
end
Instance Method Details
#+(other) ⇒ Object
Method to add two Seq objects.
# File 'lib/BioDSL/seq.rb', line 401
def +(other)
  new_entry = Seq.new
  new_entry.seq = @seq + other.seq
  new_entry.type = @type if @type == other.type
  new_entry.qual = @qual + other.qual if @qual && other.qual
  new_entry
end
#<<(entry) ⇒ Object
Method to concatenate sequence entries.
# File 'lib/BioDSL/seq.rb', line 410
def <<(entry)
  fail SeqError, 'sequences of different types' unless @type == entry.type
  fail SeqError, 'qual is missing in one entry' unless @qual.class ==
                                                       entry.qual.class

  @seq << entry.seq
  @qual << entry.qual unless entry.qual.nil?

  self
end
#[](*args) ⇒ Object
Index method for Seq objects.
# File 'lib/BioDSL/seq.rb', line 422
def [](*args)
  entry = Seq.new
  entry.seq_name = @seq_name.dup unless @seq_name.nil?
  entry.seq = @seq[*args] || ''
  entry.type = @type
  entry.qual = @qual[*args] || '' unless @qual.nil?

  entry
end
#[]=(*args, entry) ⇒ Object
Index assignment method for Seq objects.
# File 'lib/BioDSL/seq.rb', line 433
def []=(*args, entry)
  @seq[*args] = entry.seq[*args]
  @qual[*args] = entry.qual[*args] unless @qual.nil?

  self
end
#complement ⇒ Object
Method that complements sequence including ambiguity codes, returning a new Seq object.
# File 'lib/BioDSL/seq.rb', line 314
def complement
  fail SeqError, 'Cannot complement 0 length sequence' if length == 0

  entry = Seq.new(seq_name: @seq_name, type: @type, qual: @qual)

  if dna?
    entry.seq = @seq.tr('AGCUTRYWSMKHDVBNagcutrywsmkhdvbn',
                        'TCGAAYRWSKMDHBVNtcgaayrwskmdhbvn')
  elsif rna?
    entry.seq = @seq.tr('AGCUTRYWSMKHDVBNagcutrywsmkhdvbn',
                        'UCGAAYRWSKMDHBVNucgaayrwskmdhbvn')
  else
    fail SeqError, "Cannot complement sequence type: #{@type}"
  end

  entry
end
#complement! ⇒ Object
Method that complements sequence including ambiguity codes in place.
# File 'lib/BioDSL/seq.rb', line 333
def complement!
  fail SeqError, 'Cannot complement 0 length sequence' if length == 0

  if dna?
    @seq.tr!('AGCUTRYWSMKHDVBNagcutrywsmkhdvbn',
             'TCGAAYRWSKMDHBVNtcgaayrwskmdhbvn')
  elsif rna?
    @seq.tr!('AGCUTRYWSMKHDVBNagcutrywsmkhdvbn',
             'UCGAAYRWSKMDHBVNucgaayrwskmdhbvn')
  else
    fail SeqError, "Cannot complement sequence type: #{@type}"
  end

  self
end
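The complement is a character-for-character translation, so the DNA table above can be exercised on plain strings. The sketch below is illustrative only: `complement_dna` and `reverse_complement_dna` are hypothetical helpers operating on strings rather than Seq objects.

```ruby
# Hypothetical string-level helpers reusing the DNA translation table
# shown above (IUPAC ambiguity codes included, case preserved).
def complement_dna(seq)
  seq.tr('AGCUTRYWSMKHDVBNagcutrywsmkhdvbn',
         'TCGAAYRWSKMDHBVNtcgaayrwskmdhbvn')
end

# Reverse complement: complement each residue, then reverse the order.
def reverse_complement_dna(seq)
  complement_dna(seq).reverse
end
```

Ambiguity codes map to the code for the complementary base set, for example R (A or G) becomes Y (T or C), while W, S and N map to themselves.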
#composition ⇒ Object
Method that returns the residue compositions of a sequence in a hash where the key is the residue and the value is the residue count.
# File 'lib/BioDSL/seq.rb', line 443
def composition
  comp = Hash.new(0)

  @seq.upcase.each_char do |char|
    comp[char] += 1
  end

  comp
end
#dna? ⇒ Boolean
Method that returns true if the sequence type is DNA.
# File 'lib/BioDSL/seq.rb', line 202
def dna?
  @type == :dna
end
#each_orf(options = {}) ⇒ Object
Method to find open reading frames (ORFs).
# File 'lib/BioDSL/seq.rb', line 608
def each_orf(options = {})
  size_min = options[:size_min] || 0
  size_max = options[:size_max] || length
  start_codons = options[:start_codons] || 'ATG,GTG,AUG,GUG'
  stop_codons = options[:stop_codons] || 'TAA,TGA,TAG,UAA,UGA,UAG'
  pick_longest = options[:pick_longest]

  orfs = []
  pos_beg = 0

  regex_start = Regexp.new(start_codons.split(',').join('|'), true)
  regex_stop = Regexp.new(stop_codons.split(',').join('|'), true)

  while pos_beg && pos_beg < length - size_min
    pos_beg = @seq.index(regex_start, pos_beg)
    next unless pos_beg
    pos_end = @seq.index(regex_stop, pos_beg)
    next unless pos_end

    orf_length = (pos_end - pos_beg) + 3

    if (orf_length % 3) == 0
      if size_min <= orf_length && orf_length <= size_max
        subseq = self[pos_beg...pos_beg + orf_length]

        orfs << Orf.new(subseq, pos_beg, pos_end + 2)
      end
    end

    pos_beg += 1
  end

  if pick_longest
    orf_hash = {}

    orfs.each { |orf| orf_hash[orf.stop] = orf unless orf_hash[orf.stop] }

    orfs = orf_hash.values
  end

  if block_given?
    orfs.each { |orf| yield orf }
  else
    return orfs
  end
end
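The scan above pairs each start codon with the first stop codon found at or after it and keeps the pair only when the distance is a multiple of three. The following is a simplified standalone sketch of that idea on plain strings. It is an assumption-laden illustration, not the library method: `find_orfs` is a hypothetical name, the codon sets are fixed to the defaults, the `pick_longest` option is left out, and the loop stops as soon as no further stop codon exists.

```ruby
# Hypothetical simplified ORF scan on a plain string.
# Returns [start_index, stop_index, subsequence] triples.
def find_orfs(seq, size_min: 0, size_max: seq.length)
  start = /ATG|GTG|AUG|GUG/i
  stop = /TAA|TGA|TAG|UAA|UGA|UAG/i
  orfs = []
  pos_beg = 0

  while (pos_beg = seq.index(start, pos_beg))
    # First stop codon at or after the start, in any reading frame.
    pos_end = seq.index(stop, pos_beg)
    break unless pos_end # no stop codon left, so no later ORF either

    len = pos_end - pos_beg + 3
    if (len % 3).zero? && len.between?(size_min, size_max)
      orfs << [pos_beg, pos_end + 2, seq[pos_beg, len]]
    end

    pos_beg += 1
  end

  orfs
end
```

Note that the stop search is not frame-aware: in `ATGAAATAA` the first stop match is the out-of-frame `TGA` at index 1, so no ORF is reported for that start codon under this scheme.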
#edit_distance(entry) ⇒ Object
Method to determine the Edit Distance between two Sequence objects (case insensitive).
# File 'lib/BioDSL/seq.rb', line 361
def edit_distance(entry)
  Levenshtein.distance(@seq, entry.seq)
end
#generate(length, type) ⇒ Object
Method that generates a random sequence of a given length and type.
# File 'lib/BioDSL/seq.rb', line 366
def generate(length, type)
  fail SeqError, "Cannot generate seq length < 1: #{length}" if length <= 0

  case type
  when :dna then alph = DNA
  when :rna then alph = RNA
  when :protein then alph = PROTEIN
  else
    fail SeqError, "Unknown sequence type: #{type}"
  end

  seq_new = Array.new(length) { alph[rand(alph.size)] }.join('')
  @seq = seq_new
  @type = type
  seq_new
end
#hamming_distance(entry, options = {}) ⇒ Object
Method to determine the Hamming Distance between two Sequence objects (case insensitive).
# File 'lib/BioDSL/seq.rb', line 351
def hamming_distance(entry, options = {})
  if options[:ambiguity]
    BioDSL::Hamming.distance(@seq, entry.seq, options)
  else
    BioDSL::Hamming.distance(@seq.upcase, entry.seq.upcase, options)
  end
end
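For readers unfamiliar with the metric, the Hamming distance is the number of positions at which two equal-length strings differ. The sketch below is a plain-Ruby illustration of the case-insensitive branch only. It does not reproduce `BioDSL::Hamming.distance` or its ambiguity handling, and the equal-length check is an assumption of this sketch.

```ruby
# Hypothetical plain-Ruby Hamming distance, case insensitive.
def simple_hamming_distance(a, b)
  raise ArgumentError, 'length mismatch' unless a.length == b.length

  # Pair up residues position by position and count the mismatches.
  a.upcase.chars.zip(b.upcase.chars).count { |x, y| x != y }
end
```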
#hard_mask ⇒ Object
Method that returns the percentage of hard masked residues or N’s in a sequence.
# File 'lib/BioDSL/seq.rb', line 455
def hard_mask
  ((@seq.upcase.scan('N').size.to_f / (length - indels).to_f) * 100).
    round(2)
end
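The percentage is computed against the number of non-indel positions, not the raw string length. A small string-level sketch of the same arithmetic follows; `hard_mask_percent` is a hypothetical helper, and the indel characters are the four listed in INDELS.

```ruby
# Hypothetical string-level version of the hard mask percentage.
def hard_mask_percent(seq)
  # Positions that are real residues, i.e. not one of the indel characters.
  residues = seq.length - seq.scan(/[.\-_~]/).size

  (seq.upcase.count('N').to_f / residues * 100).round(2)
end
```

For `ANNT--` there are 4 residues and 2 of them are N, so the result is 50.0.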
#indels ⇒ Object
Return the number indels in a sequence.
174 175 176 177 |
# File 'lib/BioDSL/seq.rb', line 174 def indels regex = Regexp.new(/[#{Regexp.escape(INDELS.join(""))}]/) @seq.scan(regex).size end |
#indels_remove ⇒ Object
Method to remove indels from seq and qual if qual.
180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 |
# File 'lib/BioDSL/seq.rb', line 180 def indels_remove if @qual.nil? @seq.delete!(Regexp.escape(INDELS.join(''))) else na_seq = NArray.to_na(@seq, 'byte') na_qual = NArray.to_na(@qual, 'byte') mask = NArray.byte(length) INDELS.each do |c| mask += na_seq.eq(c.ord) end mask = mask.eq(0) @seq = na_seq[mask].to_s @qual = na_qual[mask].to_s end self end |
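When quality scores are present, the same positions must be dropped from both strings so that they stay aligned. The sketch below shows that idea without NArray; `strip_indels` is a hypothetical helper working on plain strings and returning a pair.

```ruby
# Hypothetical NArray-free sketch: drop indel positions from seq and,
# when given, from the matching positions in qual.
def strip_indels(seq, qual = nil)
  indels = %w(. - _ ~)

  # Indices of positions that hold a real residue.
  kept = (0...seq.length).reject { |i| indels.include?(seq[i]) }

  new_seq = kept.map { |i| seq[i] }.join
  new_qual = qual && kept.map { |i| qual[i] }.join

  [new_seq, new_qual]
end
```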
#length ⇒ Object Also known as: len
Returns the length of a sequence.
# File 'lib/BioDSL/seq.rb', line 167
def length
  @seq.nil? ? 0 : @seq.length
end
#mask_seq_hard!(cutoff) ⇒ Object
Hard masks sequence residues where the corresponding quality score is below a given cutoff.
# File 'lib/BioDSL/seq.rb', line 468
def mask_seq_hard!(cutoff)
  fail SeqError, 'seq is nil' if @seq.nil?
  fail SeqError, 'qual is nil' if @qual.nil?
  fail SeqError, "cutoff value: #{cutoff} out of range: " \
                 "#{SCORE_MIN}..#{SCORE_MAX}" unless (SCORE_MIN..SCORE_MAX).
    include? cutoff

  na_seq = NArray.to_na(@seq.upcase, 'byte')
  na_qual = NArray.to_na(@qual, 'byte')
  mask = (na_qual - SCORE_BASE) < cutoff
  mask *= na_seq.ne('-'.ord)

  na_seq[mask] = 'N'.ord

  @seq = na_seq.to_s

  self
end
#mask_seq_soft!(cutoff) ⇒ Object
Soft masks sequence residues where the corresponding quality score is below a given cutoff. Masked sequence will be lowercased and remaining will be uppercased.
# File 'lib/BioDSL/seq.rb', line 490
def mask_seq_soft!(cutoff)
  fail SeqError, 'seq is nil' if @seq.nil?
  fail SeqError, 'qual is nil' if @qual.nil?
  fail SeqError, "cutoff value: #{cutoff} out of range: " \
                 "#{SCORE_MIN} .. #{SCORE_MAX}" unless (SCORE_MIN..SCORE_MAX).
    include? cutoff

  na_seq = NArray.to_na(@seq.upcase, 'byte')
  na_qual = NArray.to_na(@qual, 'byte')
  mask = (na_qual - SCORE_BASE) < cutoff
  mask *= na_seq.ne('-'.ord)

  na_seq[mask] ^= ' '.ord

  @seq = na_seq.to_s

  self
end
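The `^= ' '.ord` step relies on ASCII upper and lower case letters differing only in bit 0x20, so XOR with 32 turns an uppercase residue into lowercase. A plain-Ruby sketch of the same masking, without NArray, is given below. `soft_mask_by_quality` is a hypothetical helper, and the Phred+33 default is taken from SCORE_BASE.

```ruby
# Hypothetical plain-Ruby soft masking on strings.
def soft_mask_by_quality(seq, qual, cutoff, score_base = 33)
  raise ArgumentError, 'length mismatch' unless seq.length == qual.length

  seq.upcase.bytes.zip(qual.bytes).map do |residue, score|
    # Mask when the decoded score is below the cutoff; '-' is left alone.
    low = (score - score_base) < cutoff && residue != '-'.ord

    # XOR with 0x20 flips the ASCII case bit: 'A' (65) becomes 'a' (97).
    (low ? residue ^ 0x20 : residue).chr
  end.join
end
```

With Phred+33 scores, `I` decodes to 40 and `!` to 0, so `soft_mask_by_quality('ATCG', 'II!!', 20)` lowercases only the last two residues.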
#protein? ⇒ Boolean
Method that returns true if the sequence type is protein.
# File 'lib/BioDSL/seq.rb', line 212
def protein?
  @type == :protein
end
#qual_base33? ⇒ Boolean
Method that determines if a quality score string can be absolutely identified as base 33.
# File 'lib/BioDSL/seq.rb', line 511
def qual_base33?
  @qual.match(/[!-:]/) ? true : false
end
#qual_base64? ⇒ Boolean
Method that determines if a quality score string may be base 64.
# File 'lib/BioDSL/seq.rb', line 516
def qual_base64?
  @qual.match(/[K-h]/) ? true : false
end
#qual_coerce!(encoding) ⇒ Object
Method to coerce quality scores to be within the 0-40 range.
# File 'lib/BioDSL/seq.rb', line 534
def qual_coerce!(encoding)
  fail SeqError, 'Missing qual' if @qual.nil?

  case encoding
  when :base_33 then qual_coerce_C(@qual, @qual.length, 33, 73) # !-I
  when :base_64 then qual_coerce_C(@qual, @qual.length, 64, 104) # @-h
  else
    fail SeqError, "unknown quality score encoding: #{encoding}"
  end

  self
end
#qual_convert!(from, to) ⇒ Object
Method to convert quality scores.
# File 'lib/BioDSL/seq.rb', line 548
def qual_convert!(from, to)
  unless from == :base_33 || from == :base_64
    fail SeqError, "unknown quality score encoding: #{from}"
  end

  unless to == :base_33 || to == :base_64
    fail SeqError, "unknown quality score encoding: #{to}"
  end

  if from == :base_33 && to == :base_64
    qual_convert_C(@qual, @qual.length, 31) # += 64 - 33
  elsif from == :base_64 && to == :base_33
    # Handle negative Solexa values from -5 to -1 (set these to 0).
    qual_coerce_C(@qual, @qual.length, 64, 104)
    qual_convert_C(@qual, @qual.length, -31) # -= 64 - 33
  end

  self
end
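Converting between the two encodings is a fixed shift of 31 (64 - 33) on each character code, with base 64 input first clamped to the `@`..`h` range. The sketch below mirrors that arithmetic in plain Ruby. The helper names are hypothetical, and the clamping is assumed to behave like the `qual_coerce_C` call above, whose C implementation is not shown on this page.

```ruby
# Hypothetical plain-Ruby versions of the two conversions.

# Base 33 to base 64: add 64 - 33 = 31 to every character code.
def qual_33_to_64(qual)
  qual.bytes.map { |b| (b + 31).chr }.join
end

# Base 64 to base 33: clamp to 64..104 first (negative Solexa values
# become score 0), then subtract 31.
def qual_64_to_33(qual)
  qual.bytes.map { |b| ([[b, 64].max, 104].min - 31).chr }.join
end

# Decode a quality string to integer scores for a given base.
def decode_scores(qual, base = 33)
  qual.bytes.map { |b| b - base }
end
```

For instance, base 33 `I` (score 40) becomes base 64 `h`, and a base 64 `;` (Solexa score -5) is clamped and converted to `!` (score 0).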
#qual_valid?(encoding) ⇒ Boolean
Method to determine if a quality score is valid accepting only 0-40 range.
# File 'lib/BioDSL/seq.rb', line 521
def qual_valid?(encoding)
  fail SeqError, 'Missing qual' if @qual.nil?

  case encoding
  when :base_33 then return true if @qual.match(/^[!-I]*$/)
  when :base_64 then return true if @qual.match(/^[@-h]*$/)
  else fail SeqError, "unknown quality score encoding: #{encoding}"
  end

  false
end
#reverse ⇒ Object
Method to reverse the sequence, returning a new Seq object.
# File 'lib/BioDSL/seq.rb', line 295
def reverse
  entry = Seq.new(
    seq_name: @seq_name,
    seq: @seq.reverse,
    type: @type,
    qual: (@qual ? @qual.reverse : @qual)
  )

  entry
end
#reverse! ⇒ Object
Method to reverse the sequence in place.
# File 'lib/BioDSL/seq.rb', line 307
def reverse!
  @seq.reverse!
  @qual.reverse! if @qual
  self
end
#rna? ⇒ Boolean
Method that returns true if the sequence type is RNA.
# File 'lib/BioDSL/seq.rb', line 207
def rna?
  @type == :rna
end
#scores_max ⇒ Object
Method to calculate and return the max quality score.
# File 'lib/BioDSL/seq.rb', line 589
def scores_max
  fail SeqError, 'Missing qual in entry' if @qual.nil?

  na_qual = NArray.to_na(@qual, 'byte')
  na_qual -= SCORE_BASE

  na_qual.max
end
#scores_mean ⇒ Object
Method to calculate and return the mean quality score.
# File 'lib/BioDSL/seq.rb', line 569
def scores_mean
  fail SeqError, 'Missing qual in entry' if @qual.nil?

  na_qual = NArray.to_na(@qual, 'byte')
  na_qual -= SCORE_BASE

  na_qual.mean
end
#scores_mean_local(window_size) ⇒ Object
Method to run a sliding window of a specified size across a Phred type scores string, calculate the mean score for each window, and return the minimum mean score.
# File 'lib/BioDSL/seq.rb', line 601
def scores_mean_local(window_size)
  fail SeqError, 'Missing qual in entry' if @qual.nil?

  scores_mean_local_C(@qual, @qual.length, SCORE_BASE, window_size)
end
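The computation is delegated to a C helper that is not shown here, so the following plain-Ruby sketch only illustrates the described behaviour: decode the scores, take the mean of every window, and return the smallest mean. `min_window_mean` is a hypothetical name, and the Float return value and the error on oversized windows are assumptions of this sketch.

```ruby
# Hypothetical plain-Ruby sliding window minimum mean.
def min_window_mean(qual, window_size, score_base = 33)
  scores = qual.bytes.map { |b| b - score_base }
  raise ArgumentError, 'window larger than scores' if window_size > scores.size

  # each_cons yields every run of window_size consecutive scores.
  scores.each_cons(window_size).map { |w| w.sum.to_f / window_size }.min
end
```

A single low-quality stretch therefore pulls the result down even when the overall mean is high.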
#scores_min ⇒ Object
Method to calculate and return the min quality score.
# File 'lib/BioDSL/seq.rb', line 579
def scores_min
  fail SeqError, 'Missing qual in entry' if @qual.nil?

  na_qual = NArray.to_na(@qual, 'byte')
  na_qual -= SCORE_BASE

  na_qual.min
end
#shuffle ⇒ Object
Method to return a new Seq object with shuffled sequence.
# File 'lib/BioDSL/seq.rb', line 385
def shuffle
  Seq.new(
    seq_name: @seq_name,
    seq: @seq.split('').shuffle!.join,
    type: @type,
    qual: @qual
  )
end
#shuffle! ⇒ Object
Method to shuffle a sequence randomly inline.
# File 'lib/BioDSL/seq.rb', line 395
def shuffle!
  @seq = @seq.split('').shuffle!.join
  self
end
#soft_mask ⇒ Object
Method that returns the percentage of soft masked residues or lower cased residues in a sequence.
# File 'lib/BioDSL/seq.rb', line 462
def soft_mask
  ((@seq.scan(/[a-z]/).size.to_f / (length - indels).to_f) * 100).round(2)
end
#to_bp ⇒ Object
Method that given a Seq entry returns a BioDSL record (a hash).
# File 'lib/BioDSL/seq.rb', line 233
def to_bp
  record = {}
  record[:SEQ_NAME] = @seq_name if @seq_name
  record[:SEQ] = @seq if @seq
  record[:SEQ_LEN] = length if @seq
  record[:SCORES] = @qual if @qual
  record
end
#to_dna ⇒ Object
Method to reverse-transcribe RNA to DNA.
# File 'lib/BioDSL/seq.rb', line 225
def to_dna
  fail SeqError, 'Cant reverse-transcribe 0 length sequence' if length == 0
  fail SeqError, "Cant reverse-transcribe seq type: #{@type}" unless rna?
  @type = :dna
  @seq.tr!('Uu', 'Tt')
end
#to_fasta(wrap = nil) ⇒ Object
Method that given a Seq entry returns a FASTA entry (a string).
# File 'lib/BioDSL/seq.rb', line 243
def to_fasta(wrap = nil)
  fail SeqError, 'Missing seq_name' if @seq_name.nil? || @seq_name == ''
  fail SeqError, 'Missing seq' if @seq.nil? || @seq.empty?

  seq_name = @seq_name.to_s
  seq = @seq.to_s

  unless wrap.nil?
    seq.gsub!(/(.{#{wrap}})/) do |match|
      match << $INPUT_RECORD_SEPARATOR
    end

    seq.chomp!
  end

  ">#{seq_name}#{$INPUT_RECORD_SEPARATOR}#{seq}#{$INPUT_RECORD_SEPARATOR}"
end
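The effect of the `wrap` argument is easiest to see on a short string. The sketch below is a hypothetical standalone formatter, not the library method: it hardcodes "\n" where the original uses `$INPUT_RECORD_SEPARATOR`, raises `ArgumentError` instead of `SeqError`, and builds a new string rather than modifying the sequence.

```ruby
# Hypothetical standalone FASTA formatter with optional line wrapping.
def fasta_entry(seq_name, seq, wrap = nil)
  raise ArgumentError, 'Missing seq_name' if seq_name.nil? || seq_name.empty?
  raise ArgumentError, 'Missing seq' if seq.nil? || seq.empty?

  # Split into chunks of at most `wrap` residues when wrapping is requested.
  body = wrap ? seq.scan(/.{1,#{wrap}}/).join("\n") : seq

  ">#{seq_name}\n#{body}\n"
end
```

Calling `fasta_entry('test', 'ATCGATCGAT', 4)` yields a header line followed by the lines `ATCG`, `ATCG` and `AT`.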
#to_fastq ⇒ Object
Method that given a Seq entry returns a FASTQ entry (a string).
# File 'lib/BioDSL/seq.rb', line 262
def to_fastq
  fail SeqError, 'Missing seq_name' if @seq_name.nil?
  fail SeqError, 'Missing seq' if @seq.nil?
  fail SeqError, 'Missing qual' if @qual.nil?

  seq_name = @seq_name.to_s
  seq = @seq.to_s
  qual = @qual.to_s

  "@#{seq_name}#{$RS}#{seq}#{$RS}+#{$RS}#{qual}#{$RS}"
end
#to_key ⇒ Object
Method that generates a unique key for a DNA sequence and returns this key as an Integer.
# File 'lib/BioDSL/seq.rb', line 276
def to_key
  key = 0

  @seq.upcase.each_char do |char|
    key <<= 2

    case char
    when 'A' then key |= 0
    when 'C' then key |= 1
    when 'G' then key |= 2
    when 'T' then key |= 3
    else fail SeqError, "Bad residue: #{char}"
    end
  end

  key
end
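The key packs each base into two bits (A=0, C=1, G=2, T=3), so `ACGT` becomes binary 00 01 10 11, which is 27. The sketch below repeats that encoding on plain strings and adds a decoder to make the bit layout visible. Both helper names are hypothetical, and the decoder is not part of the documented class.

```ruby
# Hypothetical string-level 2-bit encoder matching the scheme above.
def dna_to_key(seq)
  codes = { 'A' => 0, 'C' => 1, 'G' => 2, 'T' => 3 }

  seq.upcase.each_char.reduce(0) do |key, char|
    code = codes.fetch(char) { raise ArgumentError, "Bad residue: #{char}" }
    (key << 2) | code # make room for two bits, then set them
  end
end

# Hypothetical inverse: the length must be supplied because leading
# A residues (code 0) leave no trace in the integer.
def key_to_dna(key, length)
  Array.new(length) { |i| 'ACGT'[(key >> (2 * (length - 1 - i))) & 3] }.join
end
```

Because A encodes as 0, `A` and `AA` both map to key 0, so keys are only unique among sequences of the same length.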
#to_rna ⇒ Object
Method to transcribe DNA to RNA.
# File 'lib/BioDSL/seq.rb', line 217
def to_rna
  fail SeqError, 'Cannot transcribe 0 length sequence' if length == 0
  fail SeqError, "Cannot transcribe sequence type: #{@type}" unless dna?
  @type = :rna
  @seq.tr!('Tt', 'Uu')
end
#type_guess ⇒ Object
Method that guesses and returns the sequence type by inspecting the first 100 residues.
# File 'lib/BioDSL/seq.rb', line 149
def type_guess
  fail SeqError, 'Guess failed: sequence is nil' if @seq.nil?

  case @seq[0...100].downcase
  when /[flpqie]/ then return :protein
  when /[u]/ then return :rna
  else return :dna
  end
end
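The heuristic looks for residues that cannot occur in nucleotide sequences (F, L, P, Q, I, E), then for U, and otherwise falls back to DNA. A string-level sketch of the same checks, with a hypothetical helper name, shows how it behaves on short inputs:

```ruby
# Hypothetical string-level version of the guess shown above.
def guess_type(seq)
  case seq[0...100].downcase
  when /[flpqie]/ then :protein # letters never used for nucleotides
  when /u/        then :rna
  else :dna
  end
end
```

Since only six protein letters are tested, a short peptide without any of them, such as `MKTAYG`, is guessed as DNA.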
#type_guess! ⇒ Object
Method that guesses and sets the sequence type by inspecting the first 100 residues.
# File 'lib/BioDSL/seq.rb', line 161
def type_guess!
  @type = type_guess
  self
end