Class: Bio::GFF::GFF3

Inherits:
Bio::GFF
Includes:
Escape
Defined in:
lib/bio/db/gff.rb

Overview

Represents version 3 of the GFF specification (GFF3). For more information on GFF3, see song.sourceforge.net/gff3.shtml (obsolete URL: flybase.bio.indiana.edu/annot/gff3.html).

Defined Under Namespace

Modules: Escape
Classes: Record, RecordBoundary, SequenceRegion

Constant Summary

VERSION = 3

MetaData = GFF2::MetaData
Stores GFF3 metadata.

Constants included from Escape

Escape::UNSAFE, Escape::UNSAFE_ATTRIBUTE, Escape::UNSAFE_SEQID

Instance Attribute Summary

Attributes inherited from Bio::GFF

#records

Instance Method Summary

Constructor Details

#initialize(str = nil) ⇒ GFF3

Creates a Bio::GFF::GFF3 object by building a collection of Bio::GFF::GFF3::Record (and metadata) objects.


Arguments:

  • str: string in GFF format

Returns

Bio::GFF::GFF3 object



# File 'lib/bio/db/gff.rb', line 875

def initialize(str = nil)
  @gff_version = nil
  @records = []
  @sequence_regions = []
  @metadata = []
  @sequences = []
  @in_fasta = false
  parse(str) if str
end

Instance Attribute Details

#gff_version ⇒ Object (readonly)

GFF3 version string (String or nil). nil means “3”.



# File 'lib/bio/db/gff.rb', line 886

def gff_version
  @gff_version
end

#metadata ⇒ Object

Metadata (except “##sequence-region”, “##gff-version”, “###”). Must be an array of Bio::GFF::GFF3::MetaData objects.



# File 'lib/bio/db/gff.rb', line 894

def metadata
  @metadata
end

#sequence_regions ⇒ Object

Metadata of “##sequence-region”. Must be an array of Bio::GFF::GFF3::SequenceRegion objects.



# File 'lib/bio/db/gff.rb', line 890

def sequence_regions
  @sequence_regions
end

#sequences ⇒ Object

Sequences bundled within GFF3. Must be an array of Bio::Sequence objects.



# File 'lib/bio/db/gff.rb', line 898

def sequences
  @sequences
end

Instance Method Details

#parse(str) ⇒ Object

Parses GFF3 entries and appends the parsed data to this object.

Note that once a “##FASTA” line has been seen, only FASTA-formatted text is accepted.


Arguments:

  • str: string in GFF format

Returns

self
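
The note about “##FASTA” describes a sticky state: once the parser has switched to FASTA mode, later calls stay in that mode. The following is only a minimal sketch of that behaviour in plain Ruby. MiniGff and its attributes are hypothetical names invented for illustration; they are not part of Bio::GFF::GFF3, and the sketch does no real GFF3 or FASTA parsing.

```ruby
# Hypothetical stand-in for the real parser: it only records which mode
# each line was handled in, to illustrate the sticky FASTA state.
class MiniGff
  attr_reader :gff_lines, :fasta_lines

  def initialize
    @gff_lines = []
    @fasta_lines = []
    @in_fasta = false
  end

  # Accepts one chunk of text and returns self, so calls can be chained.
  def parse(str)
    str.each_line do |line|
      if @in_fasta
        @fasta_lines << line.chomp
      elsif line.start_with?('##FASTA')
        @in_fasta = true # every later line is treated as FASTA
      else
        @gff_lines << line.chomp
      end
    end
    self
  end
end

gff = MiniGff.new
gff.parse("chr1\t.\tgene\t1\t100\t.\t+\t.\tID=g1\n##FASTA\n")
   .parse(">chr1\nACGT\n")
   .parse("chr1\t.\tgene\t200\t300\t.\t+\t.\tID=g2\n")
```

In this sketch the third chunk looks like a feature line, but it ends up in fasta_lines because the “##FASTA” line was already seen. This suggests feeding all feature lines before any sequence data when parsing incrementally.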



# File 'lib/bio/db/gff.rb', line 909

def parse(str)
  # if already past the ##FASTA line, parse as FASTA format and return
  if @in_fasta then
    parse_fasta(str)
    return self
  end

  if str.respond_to?(:gets) then
    # str is an IO-like object
    fst = nil
  else
    # str is a String
    gff, sep, fst = str.split(/^(\>|##FASTA.*)/n, 2)
    fst = sep + fst if sep == '>' and fst
    str = gff
  end

  # parses GFF lines
  str.each_line do |line|
    if /^\#\#([^\s]+)/ =~ line then
      parse_metadata($1, line)
      parse_fasta(str) if @in_fasta
    elsif /^\>/ =~ line then
      @in_fasta = true
      parse_fasta(str, line)
    else
      @records << GFF3::Record.new(line)
    end
  end

  # parses fasta format when str is a String and fasta data exists
  if fst then
    @in_fasta = true
    parse_fasta(fst)
  end

  self
end
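
To make the String branch above easier to follow, here is a small standalone sketch of the split step. It reuses the same regular expression and the same two lines of logic, wrapped in a helper named split_gff_and_fasta, which is a hypothetical name used only here. It relies on String#split returning the captured separator when the pattern contains a group.

```ruby
# Hypothetical helper that repeats the String branch of #parse.
def split_gff_and_fasta(str)
  gff, sep, fst = str.split(/^(\>|##FASTA.*)/n, 2)
  # A bare ">" separator is the start of the first FASTA header, so put it back.
  fst = sep + fst if sep == '>' and fst
  [gff, fst]
end

# Case 1: an explicit "##FASTA" directive separates features from sequences.
with_directive = "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=g1\n##FASTA\n>chr1\nACGT\n"
gff_part, fasta_part = split_gff_and_fasta(with_directive)

# Case 2: no directive, the FASTA part starts directly with a ">" header.
bare_header = "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=g1\n>chr1\nACGT\n"
gff_part2, fasta_part2 = split_gff_and_fasta(bare_header)

# Case 3: no FASTA data at all, so the second element is nil.
only_gff = split_gff_and_fasta("chr1\t.\tgene\t1\t100\t.\t+\t.\tID=g1\n")
```

In the first case the “##FASTA” line itself is consumed as the separator, so the FASTA part begins with the newline that followed it. In the second case the “>” is restored so the header line stays intact.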

#to_s ⇒ Object

String representation of the whole entry.



# File 'lib/bio/db/gff.rb', line 964

def to_s
  ver = @gff_version || VERSION.to_s
  if @sequences.size > 0 then
    seqs = "##FASTA\n" +
      @sequences.collect { |s| s.to_fasta(s.entry_id, 70) }.join('')
  else
    seqs = ''
  end

  ([ "##gff-version #{escape(ver)}\n" ] +
   @metadata.collect { |m| m.to_s } +
   @sequence_regions.collect { |m| m.to_s } +
   @records.collect{ |r| r.to_s }).join('') + seqs
end
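
The order in which #to_s assembles its output can be sketched with plain strings. This is an illustration only: assemble_gff3 is a hypothetical helper, plain strings stand in for the metadata, sequence-region, record and sequence objects, and the escaping of the version string done by the real method is left out.

```ruby
# Hypothetical helper: arrays of ready-made strings replace the real objects.
def assemble_gff3(gff_version, metadata, sequence_regions, records, fasta)
  ver = gff_version || '3' # nil falls back to the default version
  seqs = fasta.empty? ? '' : "##FASTA\n" + fasta.join('')
  (["##gff-version #{ver}\n"] + metadata + sequence_regions + records).join('') + seqs
end

text = assemble_gff3(
  nil,
  ["##species example\n"],
  ["##sequence-region chr1 1 1000\n"],
  ["chr1\t.\tgene\t1\t100\t.\t+\t.\tID=g1\n"],
  [">chr1\nACGT\n"]
)
puts text
```

The version header always comes first, followed by the other metadata, the sequence regions and the records. The “##FASTA” section is appended only when there is at least one sequence.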