Module: Ensembl

Defined in:
lib/ensembl.rb,
lib/ensembl/core/slice.rb,
lib/ensembl/core/project.rb,
lib/ensembl/db_connection.rb,
lib/ensembl/core/transform.rb,
lib/ensembl/core/collection.rb,
lib/ensembl/core/transcript.rb,
lib/ensembl/core/activerecord.rb,
lib/ensembl/variation/variation.rb,
lib/ensembl/variation/activerecord.rb

Overview

What is it?

The Ensembl module provides an API to the Ensembl databases stored at ensembldb.ensembl.org. This is the same information that is available from www.ensembl.org.

The Ensembl::Core module mainly covers sequences and annotations. The Ensembl::Variation module covers variations (e.g. SNPs). The Ensembl::Compara module covers comparative mappings between species.

ActiveRecord

The Ensembl API provides a Ruby interface to the Ensembl MySQL databases at ensembldb.ensembl.org. Most of the API relies on ActiveRecord to get data from those databases. In general, each table is described by a class with the corresponding CamelCase name: the coord_system table is covered by the CoordSystem class, the seq_region table is covered by the SeqRegion class, etc. As a result, accessors are available for all columns in each table. For example, the seq_region table has the following columns: seq_region_id, name, coord_system_id and length. Through ActiveRecord, these column names become available as attributes of SeqRegion objects:

puts my_seq_region.seq_region_id
puts my_seq_region.name
puts my_seq_region.coord_system_id
puts my_seq_region.length.to_s
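
The table-to-class naming convention described above can be sketched in plain Ruby. The helper class_name_for is hypothetical and only illustrates the convention; it is not part of the Ensembl API and needs no database connection:

```ruby
# Hypothetical helper (not part of the Ensembl API): turns a snake_case
# table name into the CamelCase class name that describes it.
def class_name_for(table_name)
  table_name.split('_').map(&:capitalize).join
end

# The two tables named in the text above.
%w[coord_system seq_region].each do |table|
  puts "#{table} -> #{class_name_for(table)}"
end
```

Running it prints one mapping per table, e.g. the coord_system table paired with CoordSystem.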

ActiveRecord makes it easy to extract data from those tables using the collection of #find methods. There are three types of #find methods (e.g. for the CoordSystem class):

  1. find based on primary key in table:

my_coord_system = CoordSystem.find(5)

  2. find_by_sql (returns an array of matching records):

my_coord_system = CoordSystem.find_by_sql("SELECT * FROM coord_system WHERE name = 'chromosome'")

  3. find_by_<insert_your_column_name_here>:

my_coord_system1 = CoordSystem.find_by_name('chromosome')
my_coord_system2 = CoordSystem.find_by_rank(3)

To find out which find_by_<column> methods are available, you can list the column names using the column_names class method:

puts Ensembl::Core::CoordSystem.column_names.join("\t")
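
To make the link between column names and find_by_<column> methods concrete, here is a minimal plain-Ruby sketch of the idea. ToyTable, its Row struct and its rows are made up for illustration; this is a toy stand-in, not how ActiveRecord is actually implemented, and it only covers a subset of columns:

```ruby
# Toy stand-in for an ActiveRecord model (NOT the real implementation).
class ToyTable
  Row = Struct.new(:coord_system_id, :name, :rank)

  # Made-up rows, for illustration only.
  ROWS = [
    Row.new(1, 'chromosome', 1),
    Row.new(2, 'supercontig', 2),
    Row.new(3, 'contig', 3)
  ].freeze

  # Column names are derived from the row definition.
  def self.column_names
    Row.members.map(&:to_s)
  end

  # Answer find_by_<column>(value) for every known column;
  # anything else falls through to the usual NoMethodError.
  def self.method_missing(name, *args)
    column = name.to_s[/\Afind_by_(\w+)\z/, 1]
    return super unless column && column_names.include?(column)

    ROWS.find { |row| row[column] == args.first }
  end

  def self.respond_to_missing?(name, include_private = false)
    column = name.to_s[/\Afind_by_(\w+)\z/, 1]
    (!column.nil? && column_names.include?(column)) || super
  end
end

puts ToyTable.column_names.join("\t")
puts ToyTable.find_by_name('contig').rank
puts ToyTable.find_by_rank(1).name
```

Every entry returned by column_names has a matching finder, which is why listing the columns is enough to know which finders exist.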

For more information on the find methods, see ar.rubyonrails.org/classes/ActiveRecord/Base.html#M000344

The relationships between different tables are accessible through the classes as well. For example, to loop over all seq_regions belonging to a coord_system (a coord_system “has many” seq_regions):

chr_coord_system = CoordSystem.find_by_name('chromosome')
chr_coord_system.seq_regions.each do |seq_region|
  puts seq_region.name
end

Of course, you can go the other way as well (a seq_region “belongs to” a coord_system):

chr4 = SeqRegion.find_by_name('4')
puts chr4.coord_system.name  #--> 'chromosome'
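
Both directions rest on the coord_system_id column of seq_region acting as a foreign key. A minimal plain-Ruby sketch of that lookup follows; the Toy structs, the two helper methods and all row values are hypothetical, and ActiveRecord performs the equivalent lookups with SQL rather than in memory:

```ruby
# Toy records (made-up values); the real classes are backed by database tables.
ToyCoordSystem = Struct.new(:coord_system_id, :name)
ToySeqRegion   = Struct.new(:seq_region_id, :name, :coord_system_id, :length)

COORD_SYSTEMS = [
  ToyCoordSystem.new(1, 'chromosome'),
  ToyCoordSystem.new(2, 'contig')
].freeze

SEQ_REGIONS = [
  ToySeqRegion.new(10, '4', 1, 1000),
  ToySeqRegion.new(11, 'X', 1, 2000),
  ToySeqRegion.new(12, 'some_contig', 2, 300)
].freeze

# "has many": every seq_region whose foreign key points at this coord_system.
def seq_regions_of(coord_system)
  SEQ_REGIONS.select { |sr| sr.coord_system_id == coord_system.coord_system_id }
end

# "belongs to": the single coord_system that the foreign key points at.
def coord_system_of(seq_region)
  COORD_SYSTEMS.find { |cs| cs.coord_system_id == seq_region.coord_system_id }
end

chromosome = COORD_SYSTEMS.first
seq_regions_of(chromosome).each { |sr| puts sr.name }
puts coord_system_of(SEQ_REGIONS.first).name
```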

To find out what relationships exist for a given class, you can use the #reflect_on_all_associations class method:

puts SeqRegion.reflect_on_all_associations(:has_many).collect{|a| a.name.to_s}.join("\n")
puts SeqRegion.reflect_on_all_associations(:has_one).collect{|a| a.name.to_s}.join("\n")
puts SeqRegion.reflect_on_all_associations(:belongs_to).collect{|a| a.name.to_s}.join("\n")

Defined Under Namespace

Modules: Core, DBRegistry, Variation

Classes: DummyDBConnection, Session

Constant Summary

ENSEMBL_RELEASE = 60
SESSION = Ensembl::Session.new
DB_ADAPTER = 'mysql'
DB_HOST = 'ensembldb.ensembl.org'
DB_USERNAME = 'anonymous'
DB_PASSWORD = ''
EG_HOST = 'mysql.ebi.ac.uk'
EG_PORT = 4157
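
As a rough illustration of how these constants fit together, the sketch below assembles them into a settings hash using the key names ActiveRecord accepts for a database connection (adapter, host, username, password, database, port). The constant values are copied from the summary above so the snippet runs without the library; the helper connection_settings and the database name are assumptions made for the example, and how the library itself consumes the constants is not shown here:

```ruby
# Values copied from the constant summary so the sketch is self-contained.
DB_ADAPTER  = 'mysql'
DB_HOST     = 'ensembldb.ensembl.org'
DB_USERNAME = 'anonymous'
DB_PASSWORD = ''
EG_HOST     = 'mysql.ebi.ac.uk'
EG_PORT     = 4157

# Hypothetical helper: builds a settings hash for one database.
# The port is only included when one is given explicitly.
def connection_settings(database, host: DB_HOST, port: nil)
  settings = {
    adapter: DB_ADAPTER,
    host: host,
    username: DB_USERNAME,
    password: DB_PASSWORD,
    database: database
  }
  settings[:port] = port if port
  settings
end

# 'example_core_db' is a placeholder, not a real database name.
p connection_settings('example_core_db')
p connection_settings('example_core_db', host: EG_HOST, port: EG_PORT)
```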