datastax_rails

A Ruby-on-Rails interface to Datastax Enterprise (specifically DSE Search nodes). Replaces the majority of ActiveRecord functionality.

This gem is based heavily on the excellent CassandraObject gem (github.com/data-axle/cassandra_object) as well as some work I initially did in the form of SolandraObject (github.com/jasonmk/solandra_object). We made the decision to move away from Solandra and to Datastax Enterprise, thus datastax_rails was born.

Significant changes from SolandraObject:

  • Cassandra communication is now entirely CQL-based

  • Solr communication is now handled directly via RSolr

  • Bifurcation of data is no longer necessary as DSE writes data to SOLR automatically

Usage Note

Before using this gem, you should probably take a strong look at the type of problem you are trying to solve. Cassandra is primarily designed as a solution to Big Data problems. This gem is not. You will notice that it still carries a lot of relational logic with it.

We are using DSE to solve a replication problem more so than a Big Data problem. That’s not to say that this gem won’t work for Big Data problems, it just might not be ideal. You’ve been warned…

Getting started

First add it to your Gemfile:

gem 'datastax_rails'

Configure the config/datastax.yml file:

development:
  servers: ["127.0.0.1"]
  port: 9042
  keyspace: "<my_app>_development"
  strategy_class: "org.apache.cassandra.locator.NetworkTopologyStrategy"
  strategy_options: {"DC1": "1"} 
  connection_options:
    timeout: 10
  solr:
    port: 8983
    path: /solr

The above is configured to use NetworkTopologyStrategy. If you go with this, you’ll need to configure Datastax to use the NetworkTopologySnitch and set up the cassandra-topology.properties file. See the Datastax documentation for more information.

For a more simple, single datacenter setup, something like this should probably work:

development:
  servers: ["127.0.0.1"]
  port: 9042
  keyspace: "datastax_rails_development"
  strategy_class: "org.apache.cassandra.locator.SimpleStrategy"
  strategy_options: {"replication_factor": "1"}
  connection_options:
    timeout: 10
  solr:
    port: 8983
    path: /solr

See DatastaxRails::Connection::ClassMethods for a description of what options are available.

Create your keyspace:

rake ds:create
rake ds:create RAILS_ENV=test

Once you’ve created some models, the following will upload the solr schemas and create the column families:

rake ds:migrate
rake ds:migrate RAILS_ENV=test

It is safe to run ds:migrate over and over. In fact, it is necessary to re-run it any time you change the attributes on any model. DSR will only upload schema files if they have changed.

Create a sample Model. See Base documentation for more details:

class Person < DatastaxRails::Base
  uuid    :id
  string  :first_name
  string  :user_name
  text    :bio
  date    :birthdate
  boolean :active
  timestamps
end

Different types of models

In addition to the simple model above, there are several special case models that take advantage of special features of Cassandra. For examples of using those, see the following classes:

Known issues

When doing a grouped query, the total groups reported is only the total groups included in your query, not the total matching groups (if you’re paginating).

This is due to the way solr sharding works with grouped queries.

More information

The documentation for DatastaxRails::Base and DatastaxRails::SearchMethods will give you quite a few examples of things you can do. You can find a copy of the latest documentation on line at rdoc.info/github/jasonmk/datastax_rails/master/frames.