ocean-dynamo
OceanDynamo is a massively scalable, near drop-in replacement for ActiveRecord, backed by Amazon DynamoDB.
OceanDynamo requires Ruby 2.0 and Ruby on Rails 4.0.0 or later.
<img src="https://badge.fury.io/rb/ocean-dynamo.png" alt="Gem Version" />
Features
As one important use case for OceanDynamo is to facilitate the conversion of SQL databases to no-SQL DynamoDB databases, it is important that the syntax and semantics of OceanDynamo are as close as possible to those of ActiveRecord. This includes callbacks, exceptions and method chaining semantics. OceanDynamo follows this pattern closely and is of course based on ActiveModel.
The attribute and persistence layer of OceanDynamo is modeled on that of ActiveRecord: there’s save, save!, create, update, update!, update_attributes, find_each, destroy_all, delete_all and all the other methods you’re used to. The design goal is always to implement as much of the ActiveRecord interface as possible, without compromising scalability. This makes the task of switching from SQL to no-SQL much easier.
Thanks to its structural similarity to ActiveRecord, OceanDynamo works with FactoryGirl.
OceanDynamo uses only primary indices to retrieve related table items and collections, which means it will scale without limits.
Example
Basic syntax
The following example shows the basic syntax for declaring a DynamoDB-based schema.
class AsyncJob < OceanDynamo::Table
  dynamo_schema(:uuid) do
    attribute :credentials,          :string
    attribute :token,                :string,     default: Proc.new { SecureRandom.uuid }
    attribute :steps,                :serialized, default: []
    attribute :max_seconds_in_queue, :integer,    default: 1.day
    attribute :default_poison_limit, :integer,    default: 5
    attribute :default_step_time,    :integer,    default: 30
    attribute :started_at,           :datetime
    attribute :last_completed_step,  :integer
    attribute :finished_at,          :datetime
    attribute :destroy_at,           :datetime
    attribute :created_by
    attribute :updated_by
    attribute :succeeded,            :boolean,    default: false
    attribute :failed,               :boolean,    default: false
    attribute :poison,               :boolean,    default: false
  end
end
Attributes
Each attribute has a name and a type (:string, :integer, :float, :datetime, :boolean, or :serialized), where :string is the default. Each attribute also optionally has a default value, which can be a Proc. The hash key attribute is by default :id (overridden as :uuid in the example above) and is a :string.
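As a rough illustration of why a Proc default is useful, the sketch below shows one way a declared default could be resolved when a new instance is set up. The helper resolve_default is hypothetical and not part of OceanDynamo's API. It only assumes that a Proc default is called for each new instance while any other value is used as-is.

```ruby
require "securerandom"

# Hypothetical helper (not part of OceanDynamo): resolves a declared default.
# A Proc is called each time, so every instance gets a fresh value;
# any other object is returned unchanged.
def resolve_default(default)
  default.is_a?(Proc) ? default.call : default
end

token_default  = Proc.new { SecureRandom.uuid }
poison_default = 5

first  = resolve_default(token_default)
second = resolve_default(token_default)

puts first != second                  # each call yields a new UUID
puts resolve_default(poison_default)  # a plain value is returned as-is
```

A static default such as 5 is shared by every instance, whereas the Proc form is what makes a per-instance value like a UUID possible.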
The :string, :integer, :float and :datetime types can also store sets of their type. Sets are represented as arrays, may not contain duplicates and may not be empty.
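The constraints on set values can be expressed as a small check. This is only a sketch: valid_set? is a hypothetical helper, not an OceanDynamo method, and it tests nothing beyond the rules stated above.

```ruby
# Hypothetical validator (not part of OceanDynamo's API): checks that a value
# is an array, is not empty, and contains no duplicates.
def valid_set?(values)
  values.is_a?(Array) && !values.empty? && values.uniq.length == values.length
end

puts valid_set?(["red", "green"])  # distinct, non-empty
puts valid_set?([])                # empty sets are not allowed
puts valid_set?([1, 1, 2])         # duplicates are not allowed
```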
All attributes except those of the :string type can take the value nil. Storing nil in a :string attribute will return the empty string, "", when the value is read back.
Schema args and options
dynamo_schema takes two positional arguments and a number of options. Here’s the full syntax:
dynamo_schema(
  table_hash_key = :id,                    # The name of the hash key attribute
  table_range_key = nil,                   # The name of the range key attribute (or nil)
  table_name: compute_table_name,          # The basename of the DynamoDB table
  table_name_prefix: nil,                  # A basename prefix string or nil
  table_name_suffix: nil,                  # A basename suffix string or nil
  read_capacity_units: 10,                 # Used only when creating a table
  write_capacity_units: 5,                 # Used only when creating a table
  connect: :late,                          # true, :late, nil/false
  create: false,                           # If true, create the table if nonexistent
  locking: :lock_version,                  # The name of the lock attribute or nil/false
  timestamps: [:created_at, :updated_at]   # A two-element array of timestamp columns, or nil/false
) do
  # Attribute definitions
  ...
  ...
end
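To make the three table name options concrete, here is a minimal sketch of how a full table name might be assembled from them. It assumes plain concatenation of prefix, basename and suffix, which the option descriptions suggest but do not spell out. The helper full_table_name and the sample names are hypothetical.

```ruby
# Hypothetical helper: joins prefix, basename and suffix, skipping nil parts.
# Plain concatenation is an assumption; OceanDynamo's real logic may differ.
def full_table_name(basename, prefix: nil, suffix: nil)
  [prefix, basename, suffix].compact.join
end

puts full_table_name("async_jobs")
puts full_table_name("async_jobs", prefix: "myapp_", suffix: "_test")
```

A prefix or suffix of this kind is a common way to keep tables from different applications or environments apart in one AWS account.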
has_many and belongs_to
Example
The following example shows how to set up has_many / belongs_to relations:
class Forum < OceanDynamo::Table
  dynamo_schema do
    attribute :name
    attribute :description
  end
  has_many :topics, dependent: :destroy
end
class Topic < OceanDynamo::Table
  dynamo_schema(:uuid) do
    attribute :title
  end
  belongs_to :forum
  has_many :posts, dependent: :destroy
end
class Post < OceanDynamo::Table
  dynamo_schema(:uuid) do
    attribute :body
  end
  belongs_to :topic, composite_key: true
end
The only non-standard aspect of the above is composite_key: true, which is required as the Topic class itself has a belongs_to relation and thus has a composite key. This must be declared in the child class as it needs to know how to retrieve its parent.
Restrictions
Restrictions for belongs_to tables:
- The hash key must be specified and must not be :id.
- The range key must not be specified at all.
- belongs_to can be specified only once in each class.
- belongs_to must be placed after the dynamo_schema attribute block.
Restrictions for has_many tables:
- has_many must be placed after the dynamo_schema attribute block.
These restrictions allow OceanDynamo to implement the has_many / belongs_to relation in a very efficient and massively scalable way.
Implementation
belongs_to claims the range key and uses it to store its own UUID, which normally would be stored in the hash key attribute. Instead, the hash key attribute holds the UUID of the parent. We have thus reversed the roles of these two fields. As a result, all children have their parent UUID as their hash key, and their own UUID in their range key.
This type of relation is even more efficient than its ActiveRecord counterpart as it uses only primary indices in both directions of the has_many / belongs_to association. No scans.
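The key layout described above can be modelled in a few lines of plain Ruby. The sketch below is an in-memory stand-in, not OceanDynamo or DynamoDB code: FakeTable and its methods are hypothetical, and a nested Hash plays the role of the primary index.

```ruby
require "securerandom"

# In-memory stand-in for a DynamoDB table with a composite primary key.
# Items are grouped by hash key, then indexed by range key.
class FakeTable
  def initialize
    @items = Hash.new { |hash, key| hash[key] = {} }
  end

  # Store an item under its (hash key, range key) pair.
  def put(hash_key, range_key, attributes)
    @items[hash_key][range_key] = attributes
  end

  # Fetch one item using both parts of the primary key.
  def get(hash_key, range_key)
    @items.fetch(hash_key, {})[range_key]
  end

  # Fetch all items sharing a hash key: a primary-index lookup, not a scan.
  def query(hash_key)
    @items.fetch(hash_key, {}).values
  end
end

topics   = FakeTable.new
forum_id = SecureRandom.uuid   # the parent's UUID

# Each child uses the parent's UUID as hash key and its own UUID as range key.
first_id  = SecureRandom.uuid
second_id = SecureRandom.uuid
topics.put(forum_id, first_id,  { title: "Welcome" })
topics.put(forum_id, second_id, { title: "Rules" })
topics.put(SecureRandom.uuid, SecureRandom.uuid, { title: "Elsewhere" })

# has_many direction: all children of the forum via the hash key alone.
puts topics.query(forum_id).map { |topic| topic[:title] }.inspect

# belongs_to direction: one child via its full composite key.
puts topics.get(forum_id, second_id)[:title]
```

Both lookups go through the primary key only, which is the property the text relies on: finding the children of a parent never requires examining unrelated items.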
Furthermore, since DynamoDB has powerful primary index searches involving substrings and matching, the fact that the range key is a string can be used to implement wildcard matching of additional attributes. This gives, amongst other things, the equivalent of an SQL GROUP BY request, again without requiring any secondary indices.
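A minimal sketch of the idea, using plain Ruby strings in place of real range keys. The "<category>:<id>" key format is an assumption invented for this example, and begins_with is a hypothetical helper that imitates a begins-with condition on the range key. Nothing here is OceanDynamo API.

```ruby
# Range keys of items that share one hash key. The "<category>:<id>" format
# is an assumption made for this example only.
range_keys = [
  "bug:0001",
  "bug:0002",
  "feature:0003",
  "question:0004",
  "feature:0005"
]

# Imitates a begins-with condition on the range key.
def begins_with(keys, prefix)
  keys.select { |key| key.start_with?(prefix) }
end

# All items in one category, selected by key prefix alone.
puts begins_with(range_keys, "feature:").inspect

# Counting items per embedded category, comparable to GROUP BY in SQL.
groups = range_keys.group_by { |key| key.split(":").first }
puts groups.transform_values(&:length).inspect
```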
It’s our goal to use a similar technique to implement has_and_belongs_to_many relations, which means that secondary indices won’t be necessary for the vast majority of DynamoDB tables. This ultimately means reduced operational costs, as well as reduced complexity.
Current State
OceanDynamo is fully usable as an ActiveModel and can be used by Rails controllers. OceanDynamo implements much of the infrastructure of ActiveRecord; for instance, read_attribute, write_attribute, and much of the control logic and internal organisation.
- belongs_to :thingy now defines .build_thingy and .create_thingy.
- Work begun on collection proxies, etc.
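For readers unfamiliar with how a class macro can generate such methods, here is a hypothetical mini-implementation in plain Ruby. MiniAssociations, Thingy and Widget are invented for this sketch, and it assumes ActiveRecord-like semantics (build instantiates without saving, create instantiates and saves). OceanDynamo's actual implementation may differ.

```ruby
# Hypothetical mini-implementation of a belongs_to class macro. It shows how
# build_<name> and create_<name> could be generated with define_method.
module MiniAssociations
  def belongs_to(name)
    # :thingy -> Thingy (the target class must already be defined)
    target = Object.const_get(name.to_s.split("_").map(&:capitalize).join)

    # build_<name>: instantiate the parent without saving it.
    define_method("build_#{name}") do |attrs = {}|
      target.new(attrs)
    end

    # create_<name>: instantiate the parent, save it, and return it.
    define_method("create_#{name}") do |attrs = {}|
      target.new(attrs).tap(&:save)
    end
  end
end

# Stand-in parent class that only records whether save was called.
class Thingy
  attr_reader :attrs, :saved

  def initialize(attrs = {})
    @attrs = attrs
    @saved = false
  end

  def save
    @saved = true
  end
end

class Widget
  extend MiniAssociations
  belongs_to :thingy
end

widget = Widget.new
puts widget.build_thingy({ name: "a" }).saved   # built but not saved
puts widget.create_thingy({ name: "b" }).saved  # built and saved
```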
Future milestones
- Association proxies, to implement ActiveRecord-style method chaining, e.g.: blog_entry.comments.build(body: "Cool!").save!
- The has_and_belongs_to_many association.
- A generator to install the config/aws.yml file.
Current use
OceanDynamo is currently used in the Ocean framework (wiki.oceanframework.net) e.g. to implement highly scalable job queues. It will be used increasingly as features are added to OceanDynamo and will eventually replace all ActiveRecord tables in Ocean.
Installation
gem install ocean-dynamo
Then, locate the gem’s directory and copy
spec/dummy/config/initializers/aws.rb
to your project’s
config/initializers/aws.rb
Also copy
spec/dummy/config/aws.yml.example
to both the following locations in your project:
config/aws.yml.example
config/aws.yml
Enter your AWS credentials in the latter file. Eventually, there will be a generator to copy these files for you, but for now you need to do it manually.
You also need fake_dynamo to run DynamoDB locally: see below for installation instructions. NB: You do not need an Amazon AWS account to run OceanDynamo locally.
Documentation
- Ocean-dynamo gem on Rubygems: rubygems.org/gems/ocean-dynamo
- Ocean-dynamo gem API: rubydoc.info/gems/ocean-dynamo/frames
- Ocean-dynamo source and wiki: github.org/OceanDev/ocean-dynamo
See also Ocean, a Rails framework for creating highly scalable SOAs in the cloud, in which OceanDynamo is used as a central component:
Contributing
Contributions are welcome. Fork in the usual way. OceanDynamo is developed using TDD: the specs are extensive and test coverage is very near to 100 percent. Pull requests will not be considered unless all tests pass and coverage is equally high or higher. All contributed code must therefore also be exhaustively tested.
Running the specs
To run the specs for the OceanDynamo gem, you must first install the bundle. It will download a gem called fake_dynamo, which runs a local, in-memory functional clone of Amazon DynamoDB. We use fake_dynamo during development and testing.
First of all, copy the AWS configuration file from the template:
cp spec/dummy/config/aws.yml.example spec/dummy/config/aws.yml
NB: aws.yml is excluded from source control. This allows you to enter your AWS credentials safely. Note that aws.yml.example is under source control: don’t edit it.
Make sure you have version 0.1.3 of the fake_dynamo gem. It implements the 2011-12-05 version of the DynamoDB API. We’re not using the 2012-08-10 version, as the aws-sdk Ruby gem doesn’t fully support it.
Next, start fake_dynamo:
fake_dynamo --port 4567
If this returns errors, make sure that /usr/local/var/fake_dynamo exists and is writable:
sudo mkdir -p /usr/local/var/fake_dynamo
sudo chown $(whoami):staff /usr/local/var/fake_dynamo
Once fake_dynamo is running normally, open another window and issue the following command:
curl -X DELETE http://localhost:4567
This will reset the fake_dynamo database. It’s not a required operation when starting fake_dynamo; we’re just using it here as a test that the installation works. It will be issued automatically as part of the test suite, so don’t expect test data to survive between runs.
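The same reset can be issued from Ruby, for instance from a script or a spec helper. The sketch below uses only Ruby's standard Net::HTTP library. The helper names build_reset_request and reset_fake_dynamo are hypothetical, and this is not necessarily how the OceanDynamo test suite performs the reset.

```ruby
require "net/http"
require "uri"

# Builds the DELETE request that resets fake_dynamo, without sending it.
# The default port matches the one passed to fake_dynamo above.
def build_reset_request(port = 4567)
  uri = URI("http://localhost:#{port}/")
  [uri, Net::HTTP::Delete.new(uri)]
end

# Sends the request; call this only while fake_dynamo is running.
def reset_fake_dynamo(port = 4567)
  uri, request = build_reset_request(port)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

uri, request = build_reset_request
puts "#{request.method} #{uri}"
```

Calling reset_fake_dynamo has the same effect as the curl command and, like it, discards all data held by the local server.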
With fake_dynamo running, you should now be able to do
rspec
All tests should pass.
Rails console
The Rails console is available from the built-in dummy application:
cd spec/dummy
rails console
This will, amongst other things, also create the CloudModel table if it doesn’t already exist. On Amazon, this will take a little while. With fake_dynamo, it’s practically instant.
When you leave the console, you must navigate back to the top directory (cd ../..) in order to be able to run RSpec again.