Eventosaurus


Enables easy asynchronous event storing and querying on DynamoDB

Installation

Add this line to your application's Gemfile:

gem 'eventosaurus'

And then execute:

$ bundle

Or install it yourself as:

$ gem install eventosaurus

Setting up a local DynamoDB server

When developing, use a local DynamoDB server.

Set the following environment variables:

EVENT_ENVIRONMENT_PREFIX=localhost
AWS_ENDPOINT=http://localhost:8000

Install and kick off DynamoDB:

$ brew install dynamodb-local
$ ln -sfv /usr/local/opt/dynamodb-local/*.plist ~/Library/LaunchAgents
$ launchctl load ~/Library/LaunchAgents/homebrew.mxcl.dynamodb-local.plist

Configuration

You need to add the following initializer (ex: config/initializers/eventosaurus.rb):

Eventosaurus.configure do |config|
  config.use_sidekiq

  # ex: localhost, production
  config.environment_prefix = ENV['EVENT_ENVIRONMENT_PREFIX']

  config.aws_access_key_id = ENV['AWS_ACCESS_KEY_ID']
  config.aws_secret_access_key = ENV['AWS_SECRET_ACCESS_KEY']
  config.aws_region = ENV['AWS_REGION']

  # optional, used for local dynamodb
  config.aws_endpoint = ENV['AWS_ENDPOINT']
end
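
Every setting above is read from the environment, and ENV[...] returns nil for a variable that is not set. Here is a minimal sketch of a guard you could run before the configure block. It assumes all four variables are required in your setup; REQUIRED_EVENT_ENV and missing_event_env are hypothetical names, not part of the gem.

```ruby
# Hypothetical helper (not part of Eventosaurus): lists the environment
# variables used by the initializer above that are unset or blank.
REQUIRED_EVENT_ENV = %w[
  EVENT_ENVIRONMENT_PREFIX
  AWS_ACCESS_KEY_ID
  AWS_SECRET_ACCESS_KEY
  AWS_REGION
].freeze

def missing_event_env(env = ENV)
  REQUIRED_EVENT_ENV.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_event_env
warn "Missing Eventosaurus settings: #{missing.join(', ')}" unless missing.empty?
```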

Sidekiq Alternatives

Eventosaurus ships with both synchronous and asynchronous options. By default Eventosaurus uses sidekiq to persist data to DynamoDB asynchronously.

For synchronous persistence:

Eventosaurus.configure do |config|
   # ...
   config.use_synchronous
   # ...
end

To use your own persistence mechanism, reference the below two files:

And include it in your configuration:

require 'custom_persistor'

Eventosaurus.configure do |config|
  # ...
  config.persistor = CustomPersistor
  # ...
end

Event Representation

Every event type is represented by a class that includes Eventosaurus::Storable. Each class must do two things:

  1. define the table using the table_definition macro
  2. define the details class method, which defines the event interface

Here is an example of an event definition:

module Events
  class PhoneCall
    include Eventosaurus::Storable

    table_definition name: :phone_call, partition_key: { person_id: :n }

    def self.details(person_id:, phone_number:, last_called:)
      {
        'person_id'    => person_id,
        'phone_number' => phone_number.to_s,
        'last_called'  => last_called
      }
    end
  end
end

There are some built-in attributes for your events:

  1. The gem defines the range (sort) key of each table to be event_uuid. When writing an event, this attribute is enforced to be unique, preventing duplicate writes. See Event Duplication Prevention below.
  2. The event also stores a timestamp (created_at), which records the time at which the gem client called .store.

Building the Tables

DynamoDB must contain the tables your events write to. The tables are namespaced by environment, using the environment_prefix setting described above, so a table built locally might be named localhost_phone_call. This makes it quick to get up and running in a new environment.

Rake tasks only operate on tables that begin with your environment_prefix. Even if staging and production point to the same DynamoDB account, the drop_tables task will only drop tables from the specified environment.

For the time being, the rake tasks expect your event definitions to be in app/models/events, so be sure to put them there. Once you've written your event classes, run this rake task to create the tables:

rake eventosaurus:create_tables

(NOTE: in the short term, you must run the create_tables task manually on each deploy. The same applies to staging environments and to anyone who pulls code that uses these tables. This is temporary, and there is an outstanding task to change it.)

You may then verify the tables were created:

rake eventosaurus:list_tables

See the JSON used to create your tables:

rake eventosaurus:describe_tables

If you decide to (╯°□°)╯︵ ┻━┻

rake eventosaurus:drop_tables
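
The environment scoping used by these tasks can be pictured with a small stand-alone sketch. It assumes a table name is the prefix and the table name joined by an underscore, as in localhost_phone_call. The helper names are hypothetical and the gem's internals may differ.

```ruby
# Hypothetical helpers illustrating environment-scoped table names.
# They are not part of Eventosaurus.
def prefixed_table_name(environment_prefix, table_name)
  "#{environment_prefix}_#{table_name}"
end

# Keeps only the tables that belong to the given environment.
def tables_for_environment(all_table_names, environment_prefix)
  all_table_names.select { |name| name.start_with?("#{environment_prefix}_") }
end

all_tables = %w[localhost_phone_call staging_phone_call production_phone_call]

puts prefixed_table_name('localhost', :phone_call)
puts tables_for_environment(all_tables, 'staging').inspect
```

In this sketch a drop scoped to staging would only ever see staging_phone_call, even though all three tables live in the same account.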

Storing Data

To store data use the .store class method on your event class. Use the same signature as your details method mentioned above:

# Somewhere in your app:
def check_for_phonecall(row)
  Events::PhoneCall.store(
    person_id: row[:person_id],
    phone_number: row[:phone_number],
    last_called: row[:last_called]
  )
end

Querying Data

The gem gives you dynamic methods to query your data based on your table definition. Keep in mind that you are working with DynamoDB: it is not meant to be a data store that is queried generically, and it expects you to know the queries you want to run up front. Fortunately, each event type is stored in its own table, so we can make good guesses about this. To this end, Eventosaurus creates getters for the attributes you listed in your table definition:

Events::PhoneCall.by_person_id(5)
Events::PhoneCall.by_person_id(5).by_table_name('users').count

# event_uuid & created_at included for free :)
Events::PhoneCall.by_created_at('2015-01-04', 'GT')

The queries above return eventosaurus Query objects. To actually execute the query, use the run method:

Events::PhoneCall.by_person_id(5).count.run

Note:

  1. In the last example we query by created_at even though it was not listed in the table definition. This is because each table gets the created_at timestamp column as well as the event_uuid column added.
  2. The operator defaults to 'EQ' (equals) but there are many to choose from: EQ, NE, IN, LE, LT, GE, GT, CONTAINS, NOT_CONTAINS, BEGINS_WITH
  3. A DynamoDB query can use only a single secondary index alongside the partition key. This means the following query will not work the way you think:
# Too many secondary predicates. After one secondary index is used, the rest become full scans over whatever the first local index returns.
Events::PhoneCall.by_created_at('2015-01-04', 'GT').by_table_name('users')

To sum it up: for speed, you are allowed zero or one partition key condition and zero or one secondary condition. No more than that.

Test mode

Test mode can be enabled by placing the following in rails_helper.rb (or equivalent):

Eventosaurus.enable_test_mode

On Choosing the proper table_definition for your event

When choosing a partition_key, there are a few considerations. The first is the predicates you will filter by: the ones you use most should probably become your partition_key. The second is the number of distinct values you expect to see in your partition key: the more, the better. This is a complicated subject, and you should understand how DynamoDB works (partition keys, local and global secondary indexes) before creating an event. Here you can find more detail about best practices, and hopefully a co-worker near you can help too.
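
One rough way to compare candidate partition keys is to count the distinct values each one takes in a sample of the events you expect to store. The sketch below is plain Ruby with made-up sample data. The call_type attribute and the distinct_value_counts helper are hypothetical and not part of the gem.

```ruby
# Hypothetical helper: counts distinct values per candidate attribute.
def distinct_value_counts(events, candidates)
  candidates.to_h do |attr|
    [attr, events.map { |event| event[attr] }.uniq.size]
  end
end

# Made-up sample of events, for illustration only.
sample_events = [
  { person_id: 1, call_type: 'inbound' },
  { person_id: 2, call_type: 'inbound' },
  { person_id: 3, call_type: 'outbound' },
  { person_id: 4, call_type: 'inbound' }
]

counts = distinct_value_counts(sample_events, %i[person_id call_type])
puts counts.inspect
```

Here person_id takes four distinct values and call_type only two, so by this measure alone person_id is the better candidate. It is only one input: the predicates you filter by most often still matter.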

Event Error Handling upon calling .store

When you call .store, your .details method is used to build the event. Sadly, sometimes you will make mistakes and the gem will raise. Happily, you can decide what to do about the errors: if your call to .store raises, the on_error class method is called with the error as an argument. Feel free to override this class method in your event class:

module Events
 class NeatEvent
   include Eventosaurus::Storable

   # ... your primary event code here...

   def self.on_error(error)
     HaikuNotifier.write_haiku_with_error(error)
   end
 end
end
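
To make the control flow concrete, here is a stand-alone sketch of the pattern. StorableSketch is a hypothetical stand-in written for this example, not the gem's implementation. It only mirrors the behaviour described above, where an error raised during .store is passed to the on_error class method.

```ruby
# Hypothetical stand-in for Eventosaurus::Storable, for illustration only.
module StorableSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Builds the event via .details and hands any error to on_error.
    def store(**args)
      details(**args)
    rescue StandardError => error
      on_error(error)
    end

    # Default behaviour in this sketch: re-raise the error.
    def on_error(error)
      raise error
    end
  end
end

class NeatEventSketch
  include StorableSketch

  def self.details(person_id:)
    { 'person_id' => Integer(person_id) }
  end

  # Override: record the error instead of raising it.
  def self.on_error(error)
    (@errors ||= []) << error
    nil
  end

  def self.errors
    @errors || []
  end
end

# Integer('not a number') raises ArgumentError, which reaches on_error.
NeatEventSketch.store(person_id: 'not a number')
puts NeatEventSketch.errors.map(&:class).inspect
```

Because the override is defined with def self.on_error, it takes precedence over the default supplied by the included module.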

Event Duplication Prevention

This gem has two methods of preventing duplicate events from being written to DynamoDB, and each method addresses a different way duplication can occur.

Background on the event_uuid column

Before getting into the two methods below, the foundational piece of information is that DynamoDB writes can be configured to fail if a duplicate value is found on an attribute. The Event Gem leverages this by using an 'event_uuid' as the table's sort key, and setting our writes to fail if the event_uuid already exists.

Duplication Cause 1: Double processing of asynchronous jobs.

If the same job that sends an event to DynamoDB runs twice, we need to make sure we don't store two events. This is handled by the mechanism explained above: writes are instructed to fail if the event_uuid already exists.
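
The "fail if it already exists" behaviour can be modelled in a few lines of plain Ruby. This is an in-memory toy, not the gem's code and not the DynamoDB API. InMemoryTable and DuplicateEventError are hypothetical names.

```ruby
# Toy model of a table whose writes fail when the key already exists.
class DuplicateEventError < StandardError; end

class InMemoryTable
  def initialize
    @items = {}
  end

  # Stores the item unless one with the same partition key and
  # event_uuid is already present, mimicking a conditional write.
  def put_unless_exists(partition_key, event_uuid, item)
    key = [partition_key, event_uuid]
    raise DuplicateEventError, "event #{event_uuid} already stored" if @items.key?(key)

    @items[key] = item
  end

  def size
    @items.size
  end
end

table = InMemoryTable.new
table.put_unless_exists(5, 'abc-123', { 'phone_number' => '555-0100' })

begin
  # The same job running a second time attempts the identical write.
  table.put_unless_exists(5, 'abc-123', { 'phone_number' => '555-0100' })
rescue DuplicateEventError => error
  puts "skipped duplicate: #{error.message}"
end

puts table.size
```

The second write is rejected, so the table still holds a single event after the job has run twice.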

Duplication Cause 2: Gem client erroneously calls the .store method multiple times

If your (the gem client's) code has an error and calls Events::NeatEvent.store more often than intended, the Event Gem can be configured to defend against this and ensure only one event is stored. To do so, you can create a composite primary key based on the fields of your choosing. This composite key is then digested and used as the basis of the event_uuid. Use the composite_primary_key macro to set up this duplication defense:

module Events
  class NeatEvent
    include Eventosaurus::Storable

    composite_primary_key :location, :employee_name, :employee_action
  end
end

The above example will cause a string like the following to be generated and digested to produce the event_uuid:

location=gardens:employee_name=Markus:employee_action=ate grapes

Even if you call Events::NeatEvent.store(args) multiple times with the same args, only one event will be created.
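
Here is a sketch of the idea using only Ruby's standard library. The source does not say which digest the gem uses, so SHA-256 is an assumption, and both helper names are hypothetical.

```ruby
require 'digest'

# Hypothetical helper: builds the "key=value:key=value" string from the
# attributes named in the composite key, in the order they are listed.
def composite_key_string(attrs, keys)
  keys.map { |key| "#{key}=#{attrs.fetch(key)}" }.join(':')
end

# Hypothetical helper: digests that string into a deterministic id.
# SHA-256 is an assumption; the gem's actual digest may differ.
def deterministic_event_id(attrs, keys)
  Digest::SHA256.hexdigest(composite_key_string(attrs, keys))
end

keys  = %i[location employee_name employee_action]
attrs = { location: 'gardens', employee_name: 'Markus', employee_action: 'ate grapes' }

puts composite_key_string(attrs, keys)
puts deterministic_event_id(attrs, keys) == deterministic_event_id(attrs.dup, keys)
```

Identical arguments always digest to the same id, which is what lets the uniqueness check on event_uuid reject the repeated write. Using fetch means a missing attribute raises a KeyError in this sketch.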

If you do not use the composite_primary_key feature, the Event Gem generates the UUID with SecureRandom.uuid, which follows RFC 4122 and makes a collision far less likely than you winning the lottery.

Do not add the created_at attr to your list of composite_primary_key attrs

Using a timestamp representing the creation-time of the event (aka the created_at attr) in the composite_primary_key is not advisable, as accidental duplicate events might have slightly different timestamps, and thus slightly different UUIDs. Put simply: do not add the created_at attr to your list of composite_primary_key attrs.

Note that any attribute listed in the composite_primary_key macro promotes that attribute to a required attribute.

Using the AWS SDK (V2) Client Directly

To access the AWS SDK v2 client directly (for educational purposes only), use Eventosaurus.configuration.dynamodb_client. For example:

Eventosaurus.configuration.dynamodb_client.list_tables

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

Releasing

Version bumps should be done straight in master after appropriate PRs are merged.

Gem release best practices here

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/blueapron/eventosaurus.

License

The gem is Copyright 2015 Blue Apron, Inc.