bio-gff3

GFF3 plugin for BioRuby, aimed at parsing big data

Features:

1. Take GFF (genome browser) information and assemble mRNA and CDS sequences
2. Options for low memory use and caching of records
3. Support for external FASTA files
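
The idea behind assembling CDS sequences from GFF3 records can be sketched in a few lines of plain Ruby. This is a hypothetical illustration, not the plugin's code. Records are plain hashes with assumed keys (:seqid, :type, :start, :stop, :strand, :parent), and CDS phase is ignored. An mRNA would be assembled the same way from its exon records.

```ruby
# Hypothetical sketch: assemble CDS sequences from GFF3-style records.
# Records are plain hashes with assumed keys; this is not the bio-gff3 API.

COMPLEMENT = { 'A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A', 'N' => 'N' }.freeze

def reverse_complement(seq)
  # Unknown bases map to 'N'
  seq.upcase.reverse.chars.map { |base| COMPLEMENT.fetch(base, 'N') }.join
end

# Group CDS records by their Parent (the mRNA ID), sort each group by start
# coordinate and splice the pieces out of the reference sequence.
def assemble_cds(records, sequences)
  records.select { |r| r[:type] == 'CDS' }
         .group_by { |r| r[:parent] }
         .transform_values do |parts|
           sorted = parts.sort_by { |r| r[:start] }
           seq = sorted.map do |r|
             # GFF3 coordinates are 1-based and inclusive
             sequences.fetch(r[:seqid])[(r[:start] - 1)..(r[:stop] - 1)]
           end.join
           # Features on the minus strand are read as the reverse complement
           sorted.first[:strand] == '-' ? reverse_complement(seq) : seq
         end
end
```

Sorting by start before joining matters because GFF3 does not require a transcript's CDS lines to appear in genomic order.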

You can use this plugin in two ways: as a standalone program, or as a plugin library for BioRuby.

For example, fetch mRNA and CDS information from GFF3 files and output to FASTA:

  ./bin/gff3-fetch mrna test/data/gff/test.gff3
  ./bin/gff3-fetch cds test/data/gff/test.gff3

Alternatively, clone this repository, add the ‘lib’ directory to the Ruby load path, and require the library:

  require 'bio/db/gff/gffdb'

To run the RSpec tests, use something like:

  rspec -I ../bioruby/lib/ spec/*.rb

This implementation builds on BioRuby’s basic GFF3 parser, with the possible advantages that the plugin is faster and does not exhaust available memory. The GFF3 specs are based on output from the WormBase genome browser.
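
For readers new to the format: each GFF3 feature line has nine tab-separated columns, and the ninth holds tag=value attributes such as ID and Parent, which link CDS and exon records to their mRNA. The sketch below parses one such line in plain Ruby. It is an illustration only; it neither uses nor reproduces BioRuby's parser, and the method names are hypothetical.

```ruby
# Hypothetical sketch: parse one GFF3 feature line in plain Ruby.
# For illustration only; this is not BioRuby's GFF3 parser.
GFF3_COLUMNS = %i[seqid source type start stop score strand phase attributes].freeze

# Decode %XX escapes, which GFF3 uses for reserved characters such as ';' and '='
def unescape_gff3(text)
  text.gsub(/%([0-9A-Fa-f]{2})/) { Regexp.last_match(1).hex.chr }
end

def parse_gff3_line(line)
  fields = line.chomp.split("\t", -1)
  raise ArgumentError, "expected 9 columns, got #{fields.size}" unless fields.size == 9

  record = GFF3_COLUMNS.zip(fields).to_h
  record[:start] = record[:start].to_i # 1-based, inclusive
  record[:stop] = record[:stop].to_i
  # Column 9 holds tag=value pairs separated by ';'; a tag may carry several
  # comma-separated values (e.g. a CDS shared by two mRNAs)
  raw = record[:attributes]
  record[:attributes] = {}
  raw.split(';').each do |pair|
    next if pair.empty? || pair == '.'

    tag, value = pair.split('=', 2)
    record[:attributes][tag] = value.to_s.split(',').map { |v| unescape_gff3(v) }
  end
  record
end
```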

For a write-up, see thebird.nl/bioruby/BioRuby_GFF3.html


Fetch and assemble mRNAs or CDSs, and print them in FASTA format.

  gff3-fetch [--no-cache] mRNA|CDS [filename.fa] filename.gff

Where:

  --no-cache      : do not load everything into memory (slower)
  mRNA            : assemble mRNA
  CDS             : assemble CDS 
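
The trade-off behind --no-cache can be pictured as follows. This is a hypothetical sketch, not the plugin's internals, and the class names are made up. One class records only the byte offsets of GFF3 lines per Parent ID and re-reads them from the file on demand (less memory, more I/O); the other keeps every line in a hash (more memory, faster lookups). Both accept any IO, such as a File or a StringIO.

```ruby
require 'stringio'

# Hypothetical sketch of the trade-off behind --no-cache; the plugin's real
# data structures may differ. Both classes map a Parent ID to its GFF3 lines.
PARENT_ATTR = /Parent=([^;\t\n]+)/

# Low-memory variant: store only byte offsets, re-read lines on demand.
class OffsetIndex
  def initialize(io)
    @io = io
    @offsets = Hash.new { |hash, key| hash[key] = [] }
    until io.eof?
      pos = io.pos
      line = io.gets
      next if line.start_with?('#') # skip directives and comments

      parents = line[PARENT_ATTR, 1]
      parents&.split(',')&.each { |parent| @offsets[parent] << pos }
    end
  end

  def lines_for(parent)
    @offsets.fetch(parent, []).map do |pos|
      @io.seek(pos) # jump back to the recorded line start
      @io.gets.chomp
    end
  end
end

# Cached variant: keep every line in memory, so lookups need no extra I/O.
class CachedIndex
  def initialize(io)
    @lines = Hash.new { |hash, key| hash[key] = [] }
    io.each_line do |line|
      next if line.start_with?('#')

      parents = line[PARENT_ATTR, 1]
      parents&.split(',')&.each { |parent| @lines[parent] << line.chomp }
    end
  end

  def lines_for(parent)
    @lines.fetch(parent, [])
  end
end

gff = "chr1\t.\tCDS\t1\t3\t.\t+\t0\tID=cds1;Parent=mrna1\n"
p OffsetIndex.new(StringIO.new(gff)).lines_for('mrna1')
```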

Multiple GFF3 files can be given. When external FASTA files are used, each
GFF3 file takes its sequences from the last FASTA file listed before it.
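
The pairing rule for external FASTA files can be illustrated with a short sketch. It is hypothetical: it takes only the file-name arguments (after options and the mRNA/CDS command), and recognising FASTA files by extension is an assumption, not necessarily how gff3-fetch decides.

```ruby
# Hypothetical sketch of the file-pairing rule: every GFF3 file is paired with
# the most recent FASTA file listed before it, or nil when there is none (the
# GFF3 file must then carry its own sequences). Detecting FASTA files by
# extension is an assumption made for this illustration.
FASTA_EXT = /\.(fa|fasta|fna)\z/i

def pair_inputs(filenames)
  last_fasta = nil
  filenames.each_with_object([]) do |name, pairs|
    if name.match?(FASTA_EXT)
      last_fasta = name # remember it for the GFF3 files that follow
    else
      pairs << [name, last_fasta]
    end
  end
end

p pair_inputs(%w[test/data/gff/MhA1_Contig1133.fa test/data/gff/MhA1_Contig1133.gff3])
```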

Examples:

  Find mRNA and CDS information from test.gff3 (which includes sequence information)

    gff3-fetch mRNA test/data/gff/test.gff3
    gff3-fetch CDS test/data/gff/test.gff3

  Find CDS using an external FASTA file

    gff3-fetch CDS test/data/gff/MhA1_Contig1133.fa test/data/gff/MhA1_Contig1133.gff3

  Find mRNA using an external FASTA file, without loading everything into RAM

    gff3-fetch --no-cache mRNA test/data/gff/test-ext-fasta.fa test/data/gff/test-ext-fasta.gff3   
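
Each example above prints its results as FASTA records: a header line starting with '>' followed by sequence lines. A minimal formatter might look like the sketch below. The header contents and line width gff3-fetch actually uses are not stated here, so both (the ID only, 60 columns) are assumptions.

```ruby
# Hypothetical sketch: write one assembled sequence as a FASTA record,
# wrapping the sequence at a fixed line width (60 is a common choice).
def to_fasta(id, seq, width: 60)
  lines = seq.scan(/.{1,#{width}}/) # split into chunks of at most `width` bases
  ([">#{id}"] + lines).join("\n") + "\n"
end

print to_fasta('mrna1', 'ACGT' * 20)
```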

If you use this software, please cite http://dx.doi.org/10.1093/bioinformatics/btq475

Copyright © 2010, 2011 Pjotr Prins