Library and tools for using a triple store with biological data, including tools for storing parsed data in a triple store. The name includes RDF, a common data model for triples (often serialized as XML), but that is really too narrow a view of the purpose of this biogem. The alternative names (bio-semweb and bio-triplestore) looked even worse.
Every data type has a Parser module, which controls the parsing flow. The actual parsing is handled by lower-level routines, which may even reside in other libraries, such as BioRuby. The basic flow is:
input -> parse -> output
The input can be anything, from directories and files to web-based resources.
The output of the parser should be in some form of triple format, though simple tab-delimited tables can also be supported (depending on the parser).
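The input -> parse -> output flow can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ExampleParser` module, its `Record` struct, the `id<TAB>key=value` input layout and the `http://example.org/` base URI are hypothetical and are not part of the bio-rdf API.

```ruby
require 'stringio'

# Hypothetical parser module; none of these names come from bio-rdf.
module ExampleParser
  Record = Struct.new(:id, :fields)

  # Parse step: read lines of the form "id<TAB>key=value<TAB>key=value".
  def self.parse(io)
    io.each_line.map(&:chomp).reject(&:empty?).map do |line|
      id, *pairs = line.split("\t")
      Record.new(id, pairs.map { |pair| pair.split("=", 2) }.to_h)
    end
  end

  # Output as triples: one N-Triples-style line per field.
  # Escaping of literal values is omitted to keep the sketch short.
  def self.to_triples(records, base: "http://example.org/")
    records.flat_map do |rec|
      rec.fields.map do |key, value|
        %(<#{base}#{rec.id}> <#{base}#{key}> "#{value}" .)
      end
    end
  end

  # Output as a table: one tab-delimited row per record.
  def self.to_table(records, columns)
    records.map do |rec|
      [rec.id, *columns.map { |col| rec.fields[col] }].join("\t")
    end
  end
end

# Input can be any IO-like object: a file, a socket, or a string buffer.
input   = StringIO.new("gene1\tname=BRCA1\tchr=17\n")
records = ExampleParser.parse(input)
triples = ExampleParser.to_triples(records)
table   = ExampleParser.to_table(records, %w[name chr])

puts triples
puts table
```

Keeping the parse step separate from the two output steps means the same records can be written either as triples or as a table, which matches the choice of output described above.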
The first functionality includes parsing the results of gene set enrichment analysis (GSEA) into triples (more below).
This project is linked with next generation sequencing, genome browsing, visualisation and QTL mapping. E.g.
Note: this software is under active development! See also the design doc.
Gene set enrichment analysis (GSEA)
GSEA is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states. The GSEA tool produces two result files for each pair of biological states. We wrote a parser for the summary files, which outputs a single table of results (based on a cut-off value). This table can be converted into a triple-store.
To create a tab-delimited file from a GSEA result, keeping records with FDR <= 0.25:
bio-rdf gsea --tabulate --exec "rec.fdr <= 0.25" ./gsea/output/ > results.txt
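The condition passed to `--exec` reads like a Ruby expression evaluated against each record. The sketch below shows the same filter in plain Ruby. It is illustrative only: the `GseaRecord` struct, its field names other than `fdr`, and the sample values are assumptions, and the code does not reflect how bio-rdf implements the option.

```ruby
# Hypothetical stand-in for one row of a GSEA summary file.
GseaRecord = Struct.new(:name, :nes, :fdr)

# Made-up sample values, chosen to include the boundary case fdr == 0.25.
gsea_records = [
  GseaRecord.new("SET_A", 2.1, 0.01),
  GseaRecord.new("SET_B", 1.4, 0.25),
  GseaRecord.new("SET_C", 0.9, 0.60)
]

# Same condition as the --exec string: keep records at or below the cut-off.
selected = gsea_records.select { |rec| rec.fdr <= 0.25 }

# Emit one tab-delimited row per selected record.
rows = selected.map { |rec| [rec.name, rec.nes, rec.fdr].join("\t") }
puts rows
```

Because the comparison is `<=`, a record with an FDR of exactly 0.25 (`SET_B` here) is kept.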
gem install bio-rdf
The API doc is online. For more code examples see the test files in the source tree.
Project home page
For information on the source tree, documentation, examples, issues and how to contribute, see
If you use this software, please cite one of
- BioRuby: bioinformatics software for the Ruby programming language
- Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics
This Biogem is published at #bio-rdf
Copyright (c) 2012 Pjotr Prins. See LICENSE.txt for further details.