sharded_database

Overview

This ActiveRecord plugin is the end result of a real world problem - First, how to aggregate table data from across multiple databases for the purpose of pagination / searching. Second, how to transparently have one production instance have read/write access to a set of aggregated records pulled from a global view.

The first step was solved by creating a SQL view to pool all production records together for a given databse. This SQL view resides on a central database accessible by all instances of our application. From there, we query against this view to return a set of Aggregate* records. The second step, which this plugin handles, is proxying to the original database/record for the ‘aggregate’ instance and transparently interacting with the returned aggregate record as if it were the original object.

Uses

Currently, sharded_database works by reflecting on a set of returned records, and taking the result of #sharded_connection_klass method to delegate the connection to - for each instance. The path we chose had us creating a view that would return aggregate records, with an added ‘source’ column specifying the originating database. We then rigged our #sharded_connection_klass to return an abstract connection model depending on the value. Other ideas for implementation include an after_create callback that writes the source database / ID to a centralized table.

Example and Usage

Setup

Setup your Aggregate model, inheriting from a class that establishes a connection to your global datastore.

class AggregateFoo < ActiveRecord::Base
  include ShardedDatabase::Aggregate
  source_class 'Foo'

  def sharded_connection_klass
    "Connection::#{source.classify}".constantize
  end
end

The constant returned by #sharded_connection_klass is expected to be an subclass of ActiveRecord::Base and respond to #connection. It is suggested that you use the preserve_attributes class method in the aggregate class and use that attribute to augment and/or find the constant somehow.

As of 0.2.0, the class method #source_class is required - sharded_database no longer guesses for the correct source class!

Loading Aggregate Records

Assuming that AggregateFoo is a model that is bound to your aggregate view/table, you can now #find against this model as you would any other ActiveRecord class.

AggregateFoo.all   # => [#<Foo(Connection::One) ..>, #<Foo(Connection::Two) ..>, #<Foo(Connection::One) ..>]

Updating an attribute on the first record in the above array would update the corresponding record located in the database that Connection::One connects to.

Associations

Associations are also taken into account, and any associations that are defined on a Foo model will be sourced correctly. I have only tested this one level deep, though.

Accessing Non-Proxyable AggregateFoo Objects

Easily done. Simply add a :aggregate_proxy option set to false to your finders.

AggreateFoo.all(:aggregate_proxy => false)

Working without an AggregateFoo table

In the event that you need to load a Foo object directly, and you may or may not know the appropriate connection at runtime, sharded_database is able to apply itself to an ActiveRecord model directly.

This is done by accepting a :connection key in the options hash for a #find method. The value of :connection can either be an abstract ActiveRecord class, a Proc, or a Symbol, which will attempt to call a method of the same name on the class doing the finding. The latter two options are also supplied the #find arguments, which is useful when wanting to dynamically determine a connection.

Setup

Any class that includes ShardedDatabase::Aggregate and defines a source_class will have that class be given this functionality. To add manually:

class Bar < ActiveRecord::Base
  include ShardedDatabase::ModelWithConnection
end

Usage

As mentioned above, there are three acceptable values for the :connection key. Borrowing heavily from ActionController’s callback setup, acceptable values can either be an ActiveRecord class, or a Proc or class method name (represented as a symbol) that will ultimately return an ActiveRecord class. Both the Proc and method will be passed a splat of the attributes passed to #find.

The method implementation will be demonstrated in the Example, but the other implementations are as follows:

Supplying a class

Bar.find(:first, :order => 'created_at asc', :connection => BarConnection)

Supplying a Proc

Bar.find(123, :connection => lambda { |*args| (args.first % 2) == 1 ? Connection::One : Connection::Two })

…Which would use Connection::One, as % 2 of 123 is 1.

Example

A real world situation would be something akin to Flickr. Data is sharded over N number of databases, and the criteria sent to your finder method would be used to connect to the right datastore.

Imagine you have a photo sharing application, and due to the amount of photos managed, you’re forced to shard the ‘photos’ database across 3 database servers. To do this with sharded_database, you’d do something like:

class Photo < ActiveRecord::Base

  def self.determine_database(*attrs)   # [:all, { :conditions => { :name => 'mountain' } }]
    case first_letter = attrs.last[:conditions][:name].first[0,1]  # get the first letter
    when a..f   : Connection::AToF
    when g..p   : Connection::GToP
    when q..z   : Connection::QtoZ
  end

end

Photo.all(:conditions => { :name => 'mountain' }, :connection => :determine_database)

Which would return any photo named ‘mountain’ against the database Connection::GToP is connected to.

TODO

  • Apply associations to objects found using the :connection key

  • Fix association application to regenerate a proper reflection/association

Copyright &copy; 2008, Brennan Dunn, Decisiv Inc. Released under the MIT license.