ocean-dynamo

OceanDynamo is a massively scalable, near drop-in replacement for ActiveRecord, backed by Amazon DynamoDB.

OceanDynamo requires Ruby 2.0 and Ruby on Rails 4.0.0 or later.


Features

One important use case for OceanDynamo is converting SQL databases to NoSQL DynamoDB databases, so the syntax and semantics of OceanDynamo must be as close as possible to those of ActiveRecord. This includes callbacks, exceptions and method chaining semantics. OceanDynamo follows this pattern closely and is, of course, based on ActiveModel.

The attribute and persistence layer of OceanDynamo is modeled on that of ActiveRecord: there's save, save!, create, update, update!, update_attributes, find_each, destroy_all, delete_all and all the other methods you're used to. The design goal is to implement as much of the ActiveRecord interface as possible without compromising scalability, which makes switching from SQL to NoSQL much easier.

Thanks to its structural similarity to ActiveRecord, OceanDynamo works with FactoryGirl.

OceanDynamo retrieves related table items and collections using only primary-index lookups and queries, never table scans, which means it will scale without limits.

Example

Basic syntax

The following example shows the basic syntax for declaring a DynamoDB-based schema.

class AsyncJob < OceanDynamo::Table

  dynamo_schema(:guid) do
    attribute :credentials,          :string
    attribute :token,                :string,     default: proc { SecureRandom.uuid }
    attribute :steps,                :serialized, default: []
    attribute :max_seconds_in_queue, :integer,    default: 1.day
    attribute :default_poison_limit, :integer,    default: 5
    attribute :default_step_time,    :integer,    default: 30
    attribute :started_at,           :datetime
    attribute :last_completed_step,  :integer
    attribute :finished_at,          :datetime
    attribute :destroy_at,           :datetime
    attribute :created_by
    attribute :updated_by
    attribute :succeeded,            :boolean,    default: false
    attribute :failed,               :boolean,    default: false
    attribute :poison,               :boolean,    default: false
  end

end

Attributes

Each attribute has a name and a type, one of :string, :integer, :float, :datetime, :boolean or :serialized; :string is the default. An attribute may also have a default value, which can be a Proc. The hash key attribute is :id by default (overridden as :guid in the example above) and is a :string.
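A default can be a plain value or a Proc. Below is a minimal plain-Ruby sketch of how such a default could be resolved when a new item is built. It assumes that a Proc is called once per new item and that any other value is used as given. The `resolve_default` helper is hypothetical and is not part of OceanDynamo.

```ruby
require "securerandom"

# Hypothetical helper, not part of OceanDynamo: resolves an attribute default.
# A Proc is called afresh for each new item; any other value is used as given.
def resolve_default(default)
  default.is_a?(Proc) ? default.call : default
end

token_default = proc { SecureRandom.uuid }

first  = resolve_default(token_default)
second = resolve_default(token_default)
puts first == second          # prints false: each item gets its own UUID
puts resolve_default(5)       # prints 5
```

The Proc matters for values that must differ per item. Writing `default: SecureRandom.uuid` without a Proc would evaluate the call only once, when the class body is loaded, so every item would share that single value.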

The :string, :integer, :float and :datetime types can also store sets of their type. Sets are represented as arrays, may not contain duplicates and may not be empty.

All attributes except those of type :string can take the value nil. Storing nil in a :string attribute will return the empty string, "", when the attribute is read back.
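The two value rules above are sets (non-empty, duplicate-free arrays) and nil strings (read back as ""). Both can be written as plain-Ruby checks. This is a hypothetical sketch: `valid_set?` and `read_string` are illustrative names, not OceanDynamo methods. Note that the ban on empty sets also matches DynamoDB itself, which does not accept empty set values.

```ruby
# Hypothetical checks (not OceanDynamo's API) mirroring the rules above.

# A set value: a non-empty Array without duplicates.
def valid_set?(value)
  value.is_a?(Array) && !value.empty? && value.uniq.size == value.size
end

# A :string attribute never reads back as nil: nil becomes "".
def read_string(stored)
  stored.nil? ? "" : stored
end

p valid_set?(%w[red green])   # prints true
p valid_set?(%w[red red])     # prints false (duplicate)
p valid_set?([])              # prints false (empty)
p read_string(nil)            # prints ""
```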

Schema args and options

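dynamo_schema takes two optional positional arguments (the hash key and range key names), followed by many keyword options. Here's the full syntax, showing the default values: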

dynamo_schema(
  table_hash_key = :id,                   # Positional: the name of the hash key attribute
  table_range_key = nil,                  # Positional: the name of the range key attribute (or nil)
  table_name: compute_table_name,         # The basename of the DynamoDB table
  table_name_prefix: nil,                 # A basename prefix string or nil
  table_name_suffix: nil,                 # A basename suffix string or nil
  read_capacity_units: 10,                # Used only when creating a table
  write_capacity_units: 5,                # Used only when creating a table
  connect: :late,                         # true, :late, nil/false
  create: false,                          # If true, create the table if nonexistent
  locking: :lock_version,                 # The name of the lock attribute or nil/false
  timestamps: [:created_at, :updated_at]  # A two-element array of timestamp columns, or nil/false
) do
  # Attribute definitions, e.g.
  # attribute :name, :string
end
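The three table_name* options combine into the name of the DynamoDB table. Below is a minimal sketch, assuming the parts are simply concatenated as prefix + basename + suffix. That rule is an assumption, and `full_table_name` is a hypothetical helper, not OceanDynamo code. The suffix shape in the second call is taken from the test-table pattern used in the spec cleaner further down.

```ruby
# Hypothetical sketch: assumes the DynamoDB table name is the plain
# concatenation of table_name_prefix, table_name and table_name_suffix.
def full_table_name(basename, prefix: nil, suffix: nil)
  "#{prefix}#{basename}#{suffix}"   # nil interpolates as ""
end

puts full_table_name("async_jobs")
puts full_table_name("async_jobs", suffix: "_master_127-0-0-1_test")
```

A per-environment suffix lets tables for different environments live side by side in one AWS account.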

has_many and belongs_to

Example

The following example shows how to set up has_many / belongs_to relations:

class Forum < OceanDynamo::Table
  dynamo_schema do
    attribute :name
    attribute :description
  end
  has_many :topics, dependent: :destroy
end

class Topic < OceanDynamo::Table
  dynamo_schema(:guid) do
    attribute :title
  end
  belongs_to :forum
  has_many :posts, dependent: :destroy
end

class Post < OceanDynamo::Table
  dynamo_schema(:guid) do
    attribute :body
  end
  belongs_to :topic, composite_key: true
end

The only non-standard aspect of the above is composite_key: true. It is required because the Topic class itself has a belongs_to relation and thus has a composite key. It must be declared in the child class because the child needs to know how to retrieve its parent.
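composite_key: true is needed because a Topic is addressed by two keys, so a Post has to carry both of them to reach its parent. The in-memory sketch below illustrates this, with Ruby hashes standing in for DynamoDB tables keyed by [hash_key, range_key]. The ":"-joined encoding of the parent's composite key is a hypothetical choice made for illustration and is not necessarily OceanDynamo's storage format.

```ruby
# In-memory stand-ins for DynamoDB tables, keyed by [hash_key, range_key].
# Hypothetical sketch: the ":"-joined composite encoding is an assumption
# (and assumes ids contain no ":"), not necessarily OceanDynamo's format.
TOPICS = {}
POSTS  = {}

TOPICS[["f1", "t1"]] = { title: "Welcome" }   # Topic: [forum id, topic id]

# The Topic is addressed by two keys, so each Post stores the Topic's full
# composite key in its own hash key and its own id in the range key.
parent_key = "f1:t1"
POSTS[[parent_key, "p1"]] = { body: "Hello" }
POSTS[[parent_key, "p2"]] = { body: "Hi there" }

# post.topic: split the stored composite key and do one primary-key get.
def topic_for(post_hash_key)
  TOPICS[post_hash_key.split(":", 2)]
end

# topic.posts: every item whose hash key is the Topic's composite key.
# (DynamoDB would answer this with a Query; Hash#select only simulates it.)
def posts_for(forum_id, topic_id)
  key = "#{forum_id}:#{topic_id}"
  POSTS.select { |(hk, _rk), _item| hk == key }.values
end

p topic_for(parent_key)[:title]              # prints "Welcome"
p posts_for("f1", "t1").map { |h| h[:body] } # prints ["Hello", "Hi there"]
```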

Restrictions

Restrictions for belongs_to tables:

  • The hash key must be specified and must not be :id.

  • The range key must not be specified at all.

  • belongs_to can be specified only once in each class.

  • belongs_to must be placed after the dynamo_schema attribute block.

Restrictions for has_many tables:

  • has_many must be placed after the dynamo_schema attribute block.

These restrictions allow OceanDynamo to implement the has_many / belongs_to relation in a very efficient and massively scalable way.
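The first three belongs_to rules are simple conditions on the schema. The hypothetical sketch below writes them as explicit checks; `check_belongs_to_schema!` is an illustrative name, not OceanDynamo's code. The placement rules (belongs_to and has_many after the dynamo_schema block) concern declaration order and are not modeled here.

```ruby
# Hypothetical check, not OceanDynamo's code: the first three belongs_to
# restrictions expressed as validations on a schema description.
def check_belongs_to_schema!(hash_key:, range_key:, belongs_to_count:)
  if hash_key.nil? || hash_key == :id
    raise ArgumentError, "hash key must be specified and must not be :id"
  end
  raise ArgumentError, "range key must not be specified" unless range_key.nil?
  raise ArgumentError, "belongs_to may be declared only once" if belongs_to_count > 1
  true
end

puts check_belongs_to_schema!(hash_key: :guid, range_key: nil, belongs_to_count: 1)  # prints true

begin
  check_belongs_to_schema!(hash_key: :id, range_key: nil, belongs_to_count: 1)
rescue ArgumentError => e
  puts e.message  # prints: hash key must be specified and must not be :id
end
```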

Implementation

belongs_to claims the range key and uses it to store its own id, which normally would be stored in the hash key attribute. Instead, the hash key attribute holds the id of the parent. We have thus reversed the roles of these two fields. As a result, all children store their parent id in the hash key, and their own id in the range key.
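The reversed key roles mean both directions of the association are primary-key operations. The plain-Ruby sketch below shows this with in-memory hashes standing in for the Forum and Topic tables. The helper names `forum_of` and `topics_of` are hypothetical and are not OceanDynamo methods.

```ruby
# In-memory stand-ins for DynamoDB tables. Forum is keyed by its id alone;
# Topic is keyed by [hash key, range key] = [forum id, topic id].
FORUMS = { "f1" => { name: "Ruby" } }

TOPIC_ITEMS = {
  ["f1", "t1"] => { title: "Welcome" },
  ["f1", "t2"] => { title: "Rules" },
  ["f2", "t3"] => { title: "Elsewhere" }
}

# topic.forum: the hash key *is* the parent id, so one primary-key get suffices.
def forum_of(topic_key)
  FORUMS[topic_key.first]
end

# forum.topics: all items sharing the hash key. DynamoDB answers this with a
# Query on the hash key; Hash#select merely simulates that here.
def topics_of(forum_id)
  TOPIC_ITEMS.select { |(hk, _rk), _item| hk == forum_id }.values
end

p forum_of(["f1", "t2"])[:name]          # prints "Ruby"
p topics_of("f1").map { |t| t[:title] }  # prints ["Welcome", "Rules"]
```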

This type of relation is even more efficient than its ActiveRecord counterpart as it uses only primary indices in both directions of the has_many / belongs_to association. No scans.

Furthermore, DynamoDB queries on the primary index support conditions on the range key such as prefix matching (begins_with) and ranges (BETWEEN). Because the range key is a string, this can be used to implement wildcard matching of additional attributes. Amongst other things, this gives the equivalent of an SQL GROUP BY request, again without requiring any secondary indices.
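Below is a plain-Ruby simulation of prefix matching on range keys within a single hash key. The "category:id" range-key layout is a hypothetical example, not a format OceanDynamo prescribes. The `begins_with` method mimics a DynamoDB Query whose key condition uses `begins_with` on the range key. Its results are sorted because DynamoDB returns Query results ordered by range key.

```ruby
# Range keys of the items under one hash key, encoding an extra attribute
# as a prefix. The "category:id" layout is a hypothetical example.
RANGE_KEYS = %w[books:r1 books:r7 music:r2 music:r9 music:r4 films:r3]

# Simulates a Query with a begins_with(range_key, prefix) key condition;
# results are sorted, as DynamoDB orders Query results by range key.
def begins_with(keys, prefix)
  keys.select { |k| k.start_with?(prefix) }.sort
end

p begins_with(RANGE_KEYS, "music:")   # prints ["music:r2", "music:r4", "music:r9"]

# A GROUP BY-style view: bucket the range keys by their prefix.
groups = RANGE_KEYS.group_by { |k| k.split(":", 2).first }
p groups.keys                          # prints ["books", "music", "films"]
p groups["music"].size                 # prints 3
```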

It’s our goal to use a similar technique to implement has_and_belongs_to_many relations, which means that secondary indices won’t be necessary for the vast majority of DynamoDB tables. This ultimately means reduced operational costs, as well as reduced complexity.

Nevertheless, now that we have switched to v2 of the DynamoDB API, we will be adding the possibility to define both local and global secondary indices for Tables.

Current State

OceanDynamo is fully usable as an ActiveModel and can be used by Rails controllers. OceanDynamo implements much of the infrastructure of ActiveRecord; for instance, read_attribute, write_attribute, and much of the control logic and internal organisation.

  • Version 2 of the AWS Ruby SDK is now used. This required an internal reorganisation, but it also gives us access to local and global secondary indices.

  • Work has begun on collection proxies, etc.

Future milestones

  • Association proxies, to implement ActiveRecord-style method chaining, e.g.: blog_entry.comments.build(body: "Cool!").save!

  • The has_and_belongs_to_many association.

  • A generator to install the config/aws.yml file.

Current use

OceanDynamo is currently used in the Ocean framework (wiki.oceanframework.net) e.g. to implement highly scalable job queues. It will be used increasingly as features are added to OceanDynamo and will eventually replace all ActiveRecord tables in Ocean.

Installation

gem install ocean-dynamo

Then, locate the gem’s directory and copy

spec/dummy/config/initializers/aws.rb

to your project’s

config/initializers/aws.rb

Also copy

spec/dummy/config/aws.yml.example

to both the following locations in your project:

config/aws.yml.example
config/aws.yml

Enter your AWS credentials in the latter file. Eventually, there will be a generator to copy these files for you, but for now you need to do it manually.

Documentation

You might also want to take a look at Ocean (wiki.oceanframework.net), a Rails framework and development pipeline for creating highly scalable HATEOAS microservice SOAs in the cloud. Ocean uses OceanDynamo as a central component.

Contributing

Contributions are welcome. Fork in the usual way. OceanDynamo is developed using TDD: the specs are extensive and test coverage is very near to 100 percent. Pull requests will not be considered unless all tests pass and coverage is equally high or higher. All contributed code must therefore also be exhaustively tested.

Running the specs

To run the specs for the OceanDynamo gem, you must first install DynamoDB Local. It’s a Java clone of Amazon DynamoDB which runs locally on your computer. We use it for development and testing.

Download DynamoDB Local from the following location: docs.aws.amazon.com/amazondynamodb/latest/developerguide/Tools.DynamoDBLocal.html

Next, copy the AWS configuration file from the template:

cp spec/dummy/config/aws.yml.example spec/dummy/config/aws.yml

NB: aws.yml should be excluded from source control. This allows you to enter your AWS credentials safely. On the other hand, aws.yml.example SHOULD be under source control. Don’t put sensitive information in it.

You’re now ready to start DynamoDB Local:

java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb

Replace -sharedDb with -inMemory to run the DB in RAM.

With DynamoDB Local running, you should now be able to do

rspec

All tests should pass.

Cleaning up the DB

You might want to add the following to your spec_helper.rb file, before the RSpec.configure block:

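# DynamoDB table cleaner: deletes the test tables of the current CHEF_ENV
CHEF_ENV = "master" unless defined?(CHEF_ENV)
# Matches names ending in _<CHEF_ENV>_<n-n-n-n>_test
regexp = Regexp.new("^.+_#{CHEF_ENV}_[0-9]{1,3}-[0-9]{1,3}-[0-9]{1,3}-[0-9]{1,3}_test$")
cleaner = lambda do
  c = Aws::DynamoDB::Client.new
  # Note: a single list_tables call returns at most 100 table names
  c.list_tables.table_names.each { |t| c.delete_table(table_name: t) if t =~ regexp }
end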

Then, inside the RSpec.configure block:

config.before(:suite) { cleaner.call }
config.after(:suite) { cleaner.call }

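This removes only tables whose names match the test-table pattern for the current CHEF_ENV, i.e. names ending in _<CHEF_ENV>_<IP address with dashes>_test. Tables belonging to other environments are never touched, even on AWS. Note that the pattern accepts any IP address. If several machines run the specs in the same CHEF_ENV against one AWS account, include this machine's address in the pattern so that parallel runs don't delete each other's tables.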
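To check which tables the cleaner would delete, its pattern can be tried against sample names without touching AWS. The table names below are made up. The suffix layout (_<env>_<IPv4 address with dashes>_test) is inferred from the pattern itself.

```ruby
# Same pattern as the cleaner above, for the default "master" environment.
CHEF_ENV = "master"
regexp = Regexp.new("^.+_#{CHEF_ENV}_[0-9]{1,3}-[0-9]{1,3}-[0-9]{1,3}-[0-9]{1,3}_test$")

# Made-up table names: only the first matches env, address and _test suffix.
samples = %w[
  cloud_models_master_127-0-0-1_test
  cloud_models_master_127-0-0-1_dev
  cloud_models_staging_127-0-0-1_test
  cloud_models_test
]
samples.each { |t| puts "#{t}: #{t =~ regexp ? 'deleted' : 'kept'}" }
```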

Rails console

The Rails console is available from the built-in dummy application:

cd spec/dummy
rails console

This will, amongst other things, also create the CloudModel table if it doesn’t already exist. On Amazon, this will take a little while. With DynamoDB Local, it’s practically instant.

When you leave the console, you must navigate back to the top directory (cd ../..) in order to be able to run RSpec again.