Class: Bio::GFF::GFF3::Record::Gap

Inherits:
Object
Defined in:
lib/bio/db/gff.rb

Overview

Bio::GFF::GFF3::Record::Gap is a class that stores the data of a GFF3 “Gap” attribute.

Defined Under Namespace

Classes: Code

Constructor Details

#initialize(str = nil) ⇒ Gap

Creates a new Gap object.


Arguments:

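  • str: a string in Gap attribute format (space-separated tokens, each an uppercase letter followed by a number, such as “M8 D3 M6”), or nil to create an empty Gap.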



# File 'lib/bio/db/gff.rb', line 1275

def initialize(str = nil)
  if str then
    @data = str.split(/ +/).collect do |x|
      if /\A([A-Z])([0-9]+)\z/ =~ x.strip then
        Code.new($1.intern, $2.to_i)
      else
        warn "ignored unknown token: #{x.inspect}" if $VERBOSE
        nil
      end
    end
    @data.compact!
  else
    @data = []
  end
end
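
The tokenizing step in the constructor can be tried in isolation. The following is a minimal standalone sketch, not the library's code: `GapCode` and `parse_gap_tokens` are hypothetical names introduced here, standing in for the nested Code class and for the body of `initialize`.

```ruby
# Hypothetical stand-in for the nested Code class (assumption, not BioRuby's API).
GapCode = Struct.new(:code, :len) do
  def to_s
    "#{code}#{len}"
  end
end

# Splits a Gap attribute string on spaces and keeps only tokens made of
# one uppercase letter followed by digits; anything else is dropped.
def parse_gap_tokens(str)
  return [] if str.nil?

  tokens = str.split(/ +/).map do |token|
    m = /\A([A-Z])([0-9]+)\z/.match(token.strip)
    m ? GapCode.new(m[1].to_sym, m[2].to_i) : nil
  end
  tokens.compact
end

codes = parse_gap_tokens("M8 D3 M6 I1 M6")
puts codes.map(&:to_s).join(" ")
```

Unrecognized tokens are silently skipped in this sketch; the constructor above additionally warns about them when `$VERBOSE` is set.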

Class Method Details

.new_from_sequences_na(reference, target, gap_regexp = /[^a-zA-Z]/) ⇒ Object

Creates a new Gap object from a given sequence alignment.

Note that sites where both the reference and the target are gaps are silently removed.


Arguments:

  • reference: reference sequence (nucleotide sequence)

  • target: target sequence (nucleotide sequence)

  • gap_regexp: regexp that identifies gap characters



# File 'lib/bio/db/gff.rb', line 1391

def self.new_from_sequences_na(reference, target,
                               gap_regexp = /[^a-zA-Z]/)
  gap = self.new
  gap.instance_eval { 
    __initialize_from_sequences_na(reference, target,
                                   gap_regexp)
  }
  gap
end
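
To illustrate what deriving a Gap value from an aligned nucleotide pair involves, here is a minimal standalone sketch. It is not the library's `__initialize_from_sequences_na`; the helper name `gap_string_from_alignment` is hypothetical, and it assumes the GFF3 meaning of the codes (M for a match column, I for a gap in the reference, D for a gap in the target) and equal-length input strings.

```ruby
# Hypothetical helper (assumption): builds a Gap attribute string from two
# aligned nucleotide strings of equal length.
def gap_string_from_alignment(reference, target, gap_regexp = /[^a-zA-Z]/)
  unless reference.length == target.length
    raise ArgumentError, "aligned sequences must have equal length"
  end

  runs = []
  reference.chars.zip(target.chars).each do |r, t|
    r_gap = gap_regexp.match?(r)
    t_gap = gap_regexp.match?(t)
    next if r_gap && t_gap # columns that are gaps on both sides are dropped

    code = if r_gap
             :I # gap in the reference
           elsif t_gap
             :D # gap in the target
           else
             :M # both sides have a residue
           end

    # Extend the current run or start a new one.
    if runs.last && runs.last[0] == code
      runs.last[1] += 1
    else
      runs << [code, 1]
    end
  end
  runs.map { |code, len| "#{code}#{len}" }.join(" ")
end

puts gap_string_from_alignment("CAAGACCTAAACTGGAT-TCCAAT",
                               "CAAGACCT---CTGGATATCCAAT")
```

With the alignment shown, the sketch yields `M8 D3 M6 I1 M6`. Columns that are gaps in both sequences are skipped before runs are merged, which mirrors the note above about silently removed sites.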

.new_from_sequences_na_aa(reference, target, gap_regexp = /[^a-zA-Z]/, space_regexp = /\s/, forward_frameshift_regexp = /\>/, reverse_frameshift_regexp = /\</) ⇒ Object

Creates a new Gap object from a given sequence alignment.

Note that sites where both the reference and the target are gaps are silently removed.

For incorrect alignments that break the 3:1 rule, gap positions will be moved inside codons, unwanted gaps will be removed, and forward or reverse frameshifts will be inserted.

For example,

atgg-taagac-att
M  V  K  -  I

is treated as:

atggt<aagacatt
M  V  K  >>I

An incorrect combination of a frameshift with another frameshift or with a gap may cause undefined behavior.

Forward frameshifts are recommended to be indicated in the target sequence. Reverse frameshifts can be indicated in either the reference sequence or the target sequence.

Priority of regular expressions:

space > forward/reverse frameshift > gap

Arguments:

  • reference: reference sequence (nucleotide sequence)

  • target: target sequence (amino acid sequence)

  • gap_regexp: regexp that identifies gap characters

  • space_regexp: regexp that identifies space characters, which are completely ignored

  • forward_frameshift_regexp: regexp that identifies forward frameshift characters

  • reverse_frameshift_regexp: regexp that identifies reverse frameshift characters



# File 'lib/bio/db/gff.rb', line 1587

def self.new_from_sequences_na_aa(reference, target,
                                  gap_regexp = /[^a-zA-Z]/,
                                  space_regexp = /\s/,
                                  forward_frameshift_regexp = /\>/,
                                  reverse_frameshift_regexp = /\</)
  gap = self.new
  gap.instance_eval { 
    __initialize_from_sequences_na_aa(reference, target,
                                      gap_regexp,
                                      space_regexp,
                                      forward_frameshift_regexp,
                                      reverse_frameshift_regexp)
  }
  gap
end
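
The priority order of the regular expressions described for this method (space, then frameshifts, then gap) can be sketched as a simple classifier. This is a standalone illustration, not the library's internal logic: `classify_alignment_char` is a hypothetical name, and the relative order of the forward and reverse frameshift checks is an arbitrary choice, since the default patterns for the two do not overlap.

```ruby
# Hypothetical helper (assumption): classifies a single alignment character,
# testing the patterns in the documented priority order.
def classify_alignment_char(ch,
                            gap_regexp = /[^a-zA-Z]/,
                            space_regexp = /\s/,
                            forward_frameshift_regexp = /\>/,
                            reverse_frameshift_regexp = /\</)
  if space_regexp.match?(ch)
    :space
  elsif forward_frameshift_regexp.match?(ch)
    :forward_frameshift
  elsif reverse_frameshift_regexp.match?(ch)
    :reverse_frameshift
  elsif gap_regexp.match?(ch)
    :gap
  else
    :residue
  end
end

"M >-<".each_char do |ch|
  puts "#{ch.inspect} => #{classify_alignment_char(ch)}"
end
```

The order matters because the default `gap_regexp`, `/[^a-zA-Z]/`, also matches a space, `>`, and `<`. Testing it last keeps those characters from being read as ordinary gaps.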

.parse(str) ⇒ Object

Same as new(str).



# File 'lib/bio/db/gff.rb', line 1292

def self.parse(str)
  self.new(str)
end

Instance Method Details

#==(other) ⇒ Object

Returns true if other is of the same class as self and holds the same data; otherwise, returns false.



# File 'lib/bio/db/gff.rb', line 1615

def ==(other)
  if other.class == self.class and
      @data == other.data then
    true
  else
    false
  end
end

#process_sequences_na(reference, target, gap_char = '-') ⇒ Object

Processes nucleotide sequences and returns gapped sequences as an array of sequences.

Note on frameshifts: a forward frameshift is simply treated as a gap inserted into the target sequence, and a reverse frameshift as a gap inserted into the reference sequence.


Arguments:

  • reference: reference sequence (nucleotide sequence)

  • target: target sequence (nucleotide sequence)

  • gap_char: gap character



# File 'lib/bio/db/gff.rb', line 1715

def process_sequences_na(reference, target, gap_char = '-')
  s_ref, s_tgt = dup_seqs(reference, target)

  s_ref, s_tgt = __process_sequences(s_ref, s_tgt,
                                     gap_char, gap_char,
                                     1, 1,
                                     gap_char, gap_char)

  if $VERBOSE and s_ref.length != s_tgt.length then
    warn "returned sequences not equal length"
  end
  return s_ref, s_tgt
end
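
The idea of turning ungapped sequences into gapped ones from a Gap value can be sketched without the library. The helper below, `apply_gap_codes`, is a hypothetical standalone function, not `__process_sequences`. It assumes the GFF3 meaning of M, I, and D, ignores frameshift codes and any other unrecognized token, and does not append sequence left over after the last code.

```ruby
# Hypothetical helper (assumption): inserts gap characters into ungapped
# reference and target strings according to a Gap attribute string.
def apply_gap_codes(reference, target, gap_str, gap_char = "-")
  ref_out = String.new
  tgt_out = String.new
  ref_pos = 0
  tgt_pos = 0

  gap_str.split(/ +/).each do |token|
    m = /\A([A-Z])([0-9]+)\z/.match(token)
    next unless m

    n = m[2].to_i
    case m[1]
    when "M" # consume n residues from both sequences
      ref_out << reference[ref_pos, n].to_s
      tgt_out << target[tgt_pos, n].to_s
      ref_pos += n
      tgt_pos += n
    when "I" # gap in the reference, residues from the target
      ref_out << gap_char * n
      tgt_out << target[tgt_pos, n].to_s
      tgt_pos += n
    when "D" # residues from the reference, gap in the target
      ref_out << reference[ref_pos, n].to_s
      tgt_out << gap_char * n
      ref_pos += n
    end
  end
  [ref_out, tgt_out]
end

ref, tgt = apply_gap_codes("CAAGACCTAAACTGGATTCCAAT",
                           "CAAGACCTCTGGATATCCAAT",
                           "M8 D3 M6 I1 M6")
puts ref
puts tgt
```

As with the method above, the result is a pair of strings of equal length when the Gap value is consistent with the input sequences.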

#process_sequences_na_aa(reference, target, gap_char = '-', space_char = ' ', forward_frameshift = '>', reverse_frameshift = '<') ⇒ Object

Processes sequences and returns gapped sequences as an array of sequences. reference must be a nucleotide sequence, and target must be an amino acid sequence.

Note on reverse frameshifts: reverse frameshift characters are inserted into the reference sequence. For example, the alignment for “Gap=M3 R1 M2” is:

atgaagat<aatgtc
M  K  I  N  V

Alignment of “Gap=M3 R3 M3” is:

atgaag<<<attaatgtc
M  K  I  I  N  V

Arguments:

  • reference: reference sequence (nucleotide sequence)

  • target: target sequence (amino acid sequence)

  • gap_char: gap character

  • space_char: space character inserted into the amino acid sequence so that it lines up with the nucleotide sequence

  • forward_frameshift: forward frameshift character

  • reverse_frameshift: reverse frameshift character



# File 'lib/bio/db/gff.rb', line 1752

def process_sequences_na_aa(reference, target,
                            gap_char = '-',
                            space_char = ' ',
                            forward_frameshift = '>',
                            reverse_frameshift = '<')
  s_ref, s_tgt = dup_seqs(reference, target)
  s_tgt = s_tgt.gsub(/./, "\\0#{space_char}#{space_char}")
  ref_increment = 3
  tgt_increment = 1 + space_char.length * 2
  ref_gap = gap_char * 3
  tgt_gap = "#{gap_char}#{space_char}#{space_char}"
  return __process_sequences(s_ref, s_tgt,
                             ref_gap, tgt_gap,
                             ref_increment, tgt_increment,
                             forward_frameshift,
                             reverse_frameshift)
end
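
The padding step at the start of this method is what lets one amino acid line up with three nucleotides. The sketch below isolates that one `gsub` call, copied from the source above, in a hypothetical helper named `pad_amino_acids`; everything else about the method is left out.

```ruby
# Hypothetical helper (assumption): pads every amino acid with two space
# characters so that each residue occupies three columns, like a codon.
def pad_amino_acids(target, space_char = " ")
  target.gsub(/./, "\\0#{space_char}#{space_char}")
end

reference = "atgaagattaatgtc"
padded = pad_amino_acids("MKINV")

puts reference
puts padded
puts padded.length == reference.length
```

With the default single-character `space_char`, each residue grows to three columns, which matches `tgt_increment = 1 + space_char.length * 2` in the method body.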

#to_s ⇒ Object

Returns the string representation: the stored codes joined by single spaces.



# File 'lib/bio/db/gff.rb', line 1604

def to_s
  @data.collect { |x| x.to_s }.join(" ")
end