Class: Bio::GFF::GFF3

Inherits:
Bio::GFF
Includes:
Escape
Defined in:
lib/bio/db/gff.rb

Overview

DESCRIPTION

Represents version 3 of the GFF specification. For more information on GFF3, see song.sourceforge.net/gff3.shtml (obsolete URL: flybase.bio.indiana.edu/annot/gff3.html).

Defined Under Namespace

Modules: Escape
Classes: Record, RecordBoundary, SequenceRegion

Constant Summary

VERSION =
3
MetaData =

Stores GFF3 metadata.

GFF2::MetaData

Constants included from Escape

Escape::UNSAFE, Escape::UNSAFE_ATTRIBUTE, Escape::UNSAFE_SEQID

Instance Attribute Summary

Attributes inherited from Bio::GFF

#records

Instance Method Summary

Constructor Details

#initialize(str = nil) ⇒ GFF3

Creates a Bio::GFF::GFF3 object by building a collection of Bio::GFF::GFF3::Record (and metadata) objects.


Arguments:

  • str: string in GFF format

Returns

Bio::GFF::GFF3 object



# File 'lib/bio/db/gff.rb', line 876

def initialize(str = nil)
  @gff_version = nil
  @records = []
  @sequence_regions = []
  @metadata = []
  @sequences = []
  @in_fasta = false
  parse(str) if str
end

Instance Attribute Details

#gff_version ⇒ Object (readonly)

GFF3 version string (String or nil). nil means “3”.



# File 'lib/bio/db/gff.rb', line 887

def gff_version
  @gff_version
end

#metadata ⇒ Object

Metadata (except “##sequence-region”, “##gff-version”, “###”). Must be an array of Bio::GFF::GFF3::MetaData objects.



# File 'lib/bio/db/gff.rb', line 895

def metadata
  @metadata
end

#sequence_regions ⇒ Object

Metadata of “##sequence-region”. Must be an array of Bio::GFF::GFF3::SequenceRegion objects.



# File 'lib/bio/db/gff.rb', line 891

def sequence_regions
  @sequence_regions
end

#sequences ⇒ Object

Sequences bundled within GFF3. Must be an array of Bio::Sequence objects.



# File 'lib/bio/db/gff.rb', line 899

def sequences
  @sequences
end

Instance Method Details

#parse(str) ⇒ Object

Parses GFF3 entries and appends the parsed data to this object.

Note that once a “##FASTA” line has been given, only FASTA-formatted text is accepted.


Arguments:

  • str: string in GFF format

Returns

self
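
As a rough illustration of the “##FASTA” note, the sketch below shows how a String input can be divided into a GFF part and a FASTA part. It reuses the split pattern from the source listing for this method, without the /n encoding flag. The name split_gff_and_fasta is a hypothetical helper written for this example and is not part of BioRuby.

```ruby
# Hypothetical helper for illustration only; not part of BioRuby.
# Mirrors the String branch of #parse: everything before the first
# line starting with ">" or "##FASTA" is treated as GFF text.
def split_gff_and_fasta(text)
  gff, sep, fst = text.split(/^(>|##FASTA.*)/, 2)
  # A bare ">" separator is the start of the first FASTA header,
  # so it is put back in front of the remainder.
  fst = sep + fst if sep == '>' && fst
  [gff, fst]
end

record = "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=gene1\n"

gff, fasta = split_gff_and_fasta("##gff-version 3\n" + record + "##FASTA\n>chr1\nACGT\n")
puts gff.lines.size   # number of GFF lines before the FASTA section
puts fasta.inspect    # text that follows the ##FASTA line
```

When the input contains neither a “##FASTA” line nor a line starting with “>”, the second element is nil and the whole string is treated as GFF text.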



# File 'lib/bio/db/gff.rb', line 910

def parse(str)
  # if already after the ##FASTA line, parse as fasta format and return
  if @in_fasta then
    parse_fasta(str)
    return self
  end

  if str.respond_to?(:gets) then
    # str is an IO-like object
    fst = nil
  else
    # str is a String
    gff, sep, fst = str.split(/^(\>|##FASTA.*)/n, 2)
    fst = sep + fst if sep == '>' and fst
    str = gff
  end

  # parses GFF lines
  str.each_line do |line|
    if /^\#\#([^\s]+)/ =~ line then
      parse_metadata($1, line)
      parse_fasta(str) if @in_fasta
    elsif /^\>/ =~ line then
      @in_fasta = true
      parse_fasta(str, line)
    else
      @records << GFF3::Record.new(line)
    end
  end

  # parses fasta format when str is a String and fasta data exists
  if fst then
    @in_fasta = true
    parse_fasta(fst)
  end

  self
end
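
The line-by-line dispatch in the listing above can be sketched in isolation. The example applies the same two regular-expression tests as the each_line loop and reports which branch a line would take. classify_gff3_line is a hypothetical helper for this illustration only; it does not build records or metadata objects the way the real method does.

```ruby
# Hypothetical helper for illustration only; not part of BioRuby.
# Applies the same two regular-expression tests as the each_line loop.
def classify_gff3_line(line)
  if /^\#\#([^\s]+)/ =~ line
    [:directive, $1]      # "##name ..." line; $1 is the directive name
  elsif /^>/ =~ line
    [:fasta_header, nil]  # start of inline FASTA data
  else
    [:record, nil]        # anything else is handed to the record class
  end
end

[
  "##sequence-region chr1 1 1000\n",
  "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=gene1\n",
  ">chr1\n"
].each do |line|
  kind, name = classify_gff3_line(line)
  puts "#{kind} #{name}".strip
end
```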

#to_s ⇒ Object

Returns the string representation of the whole entry.



# File 'lib/bio/db/gff.rb', line 965

def to_s
  ver = @gff_version || VERSION.to_s
  if @sequences.size > 0 then
    seqs = "##FASTA\n" +
      @sequences.collect { |s| s.to_fasta(s.entry_id, 70) }.join('')
  else
    seqs = ''
  end

  ([ "##gff-version #{escape(ver)}\n" ] +
   @metadata.collect { |m| m.to_s } +
   @sequence_regions.collect { |m| m.to_s } +
   @records.collect{ |r| r.to_s }).join('') + seqs
end
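
The output order used by the listing above (version line, metadata, sequence regions, records, then an optional “##FASTA” section) can be sketched with plain strings. assemble_gff3 is a hypothetical helper for this illustration, not part of BioRuby; it assumes every element is already a formatted line and skips the escaping that the real method applies to the version string.

```ruby
# Hypothetical helper for illustration only; not part of BioRuby.
# All arguments except version are arrays of already-formatted text.
def assemble_gff3(version, metadata, sequence_regions, records, fasta_entries)
  ver = version || '3'  # a nil version falls back to "3"
  # The ##FASTA section is emitted only when sequences are present.
  seqs = fasta_entries.empty? ? '' : "##FASTA\n" + fasta_entries.join('')
  (["##gff-version #{ver}\n"] + metadata + sequence_regions + records).join('') + seqs
end

puts assemble_gff3(
  nil,
  [],
  ["##sequence-region chr1 1 1000\n"],
  ["chr1\t.\tgene\t1\t100\t.\t+\t.\tID=gene1\n"],
  [">chr1\nACGT\n"]
)
```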