⑁ Campchair

A fast, ghetto-tech CouchDB-like persistence layer for Ruby

This README is a wishlist. The thing doesn't entirely work yet!

Campchair provides some helpers for lazily materialized views, and gives you a choice of using ActiveRecord-like models or class-based, barebones data-structure access.

Views are described a bit like those found in CouchDB. Campchair is not a service and has no client-server stuff. Parallel access is not supported yet, but that's okay – you can use something else for concurrency and use this in your persistence fiber/thread/process.

Persistence happens through LevelDB. Objects are serialized into documents using Ruby's Marshal class. This is not append-only and replication is not a feature.
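
As a rough illustration of the serialization step, the sketch below round-trips a document through Ruby's Marshal. It uses only the standard library and no LevelDB; the `doc`, `bytes` and `restored` names are made up for this example and are not part of Campchair.

```ruby
# Round-trip a document through Marshal, as a stand-in for what a
# Marshal-backed store might write to and read back from disk.
doc = {
  :name       => 'Fred',
  :height     => 1.748,
  :first_seen => Time.new(2012, 3, 6, 12, 10)
}

bytes    = Marshal.dump(doc)   # a binary String, usable as a stored value
restored = Marshal.load(bytes) # a fresh copy of the original Hash

restored == doc      # => true
restored.equal?(doc) # => false, it is a copy rather than the same object
```

Because the restored document is a copy, changes made to it are only persisted once it is written back under its key.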

This project was born at Railscamp 11.

How does it look?

For these examples we'll work with two documents.

fred = {
  :name       => 'Fred', :height => 1.748,
  :location   => 'Flinders St Station, Melbourne, Australia',
  :first_seen => Time.new(2012, 3, 6, 12, 10),
  :last_seen  => Time.new(2012, 3, 6, 12, 40)
}
jane = {
  :name       => 'Jane', :height => 1.634,
  :location   => 'Flinders St Station, Melbourne, Australia',
  :first_seen => Time.new(2012, 3, 6, 12, 20),
  :last_seen  => Time.new(2012, 3, 6, 12, 50)
}

Mixing Campchair DB behavior into a class is as simple as...

class People
  include Campchair::Store
end

Now, use it like a Hash.

# Add a document to the db
key = People << fred

# Add another person
People << jane

# Retrieve a document from the db
fred = People[key]

# Delete the document from the db
People.delete(key)

# Update/create the document in the db using a key
fred[:last_seen] = Time.new(2012, 3, 6, 12, 40, 25)
People[key] = fred
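
To make the Hash-like interface concrete, here is a minimal in-memory sketch of how a mixin could expose `<<`, `[]`, `[]=` and `delete` at class level. `ToyStore` and `ToyPeople` are hypothetical names, a plain Hash stands in for LevelDB, and the UUID keys are an assumption; none of this is Campchair's actual implementation.

```ruby
require 'securerandom'

# Hypothetical stand-in for a store mixin; not the real Campchair::Store.
module ToyStore
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # A plain Hash of Marshal'd strings stands in for LevelDB here.
    def docs
      @docs ||= {}
    end

    # Store a document under a generated key and return that key.
    def <<(doc)
      key = SecureRandom.uuid
      self[key] = doc
      key
    end

    # Create or overwrite the document stored under key.
    def []=(key, doc)
      docs[key] = Marshal.dump(doc)
    end

    # Return a copy of the stored document, or nil if the key is unknown.
    def [](key)
      raw = docs[key]
      raw && Marshal.load(raw)
    end

    def delete(key)
      docs.delete(key)
    end
  end
end

class ToyPeople
  include ToyStore
end

key = ToyPeople << { :name => 'Fred', :height => 1.748 }
ToyPeople[key][:name] # => "Fred"
ToyPeople.delete(key)
ToyPeople[key]        # => nil
```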

Basic views (not yet implemented)

For views to be available on classes we need to include Campchair::Views.

class People # add some views to the people class
  include Campchair::Views

  # Map to document by id
  view :all do
    map { |id, person| emit id, person }
  end

  # Map to name by doc id
  view :names do
    map { |id, person| emit id, person[:name] }
  end
end

People.names
# ['Fred', 'Jane']

Using reduce (not yet implemented)

Reduce gets called on groups of map results.

class People
  # Map to count by doc id
  # Reduce to sum of counts
  view :count do
    map { |id, person| emit id, 1 }
    reduce do |keys, values|
      values.inject(0) { |memo, value| memo + value }
    end
  end
end

People.count
# 2

Reduce happens in stages. Supply a rereduce step to handle reducing reduce results. The advantage of rereduce is results can be cached in a b-tree index.

If rereduce is supplied then reduce is called with a portion of the map results and those reductions are passed to rereduce. If rereduce is omitted then reduce results will not be cached.

require 'set'

class People
  # Map to location by doc id
  # Reduce by unique location
  view :unique_locations do
    map { |id, person| emit id, person[:location] }
    reduce do |keys, values|
      values.uniq
    end
    rereduce do |results|
      results.inject(Set.new) { |memo, values| memo.merge values }
    end
  end
end

People.unique_locations
# <Set: {"Flinders St Station, Melbourne, Australia"}>

Keep in mind that if rereduce is not supplied, reduce will always get called with the entire keyspace. If the dataset is large this might consume a lot of memory.
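
The staged reduction described above can be sketched in plain Ruby, without Campchair. The document ids, the third document and the portion size of 2 are assumptions made for illustration; how the library would actually partition map results is not specified here.

```ruby
require 'set'

# Map results as [key, value] pairs, as a map block might emit them.
# The ids and the third entry are made up for this sketch.
map_results = [
  ['id-1', 'Flinders St Station, Melbourne, Australia'],
  ['id-2', 'Flinders St Station, Melbourne, Australia'],
  ['id-3', 'Docklands, Melbourne, Australia']
]

reduce   = lambda { |keys, values| values.uniq }
rereduce = lambda do |results|
  results.inject(Set.new) { |memo, values| memo.merge(values) }
end

# Stage 1: reduce each portion of the map results separately.
partials = map_results.each_slice(2).map do |portion|
  keys   = portion.map { |key, _| key }
  values = portion.map { |_, value| value }
  reduce.call(keys, values)
end

# Stage 2: combine the partial reductions into the final result.
unique_locations = rereduce.call(partials)
unique_locations.size # => 2
```

Each entry in `partials` depends only on its own portion, which is what would make it cacheable: adding a document only invalidates the portion it falls into.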

Using keys with reduce (not yet implemented)

class People
  # Map to count by name
  # Reduce by sum of counts
  view :count_by_name do
    map { |id, person| emit person[:name], 1 }
    reduce do |keys, values|
      values.inject(0) { |memo, value| memo + value }
    end
    rereduce do |results|
      results.inject(0) { |memo, value| memo + value }
    end
  end
end

People.count_by_name['Jane']
# 1
People.count_by_name
# 2
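
One way to picture keyed reduction is to group the emitted values by key and reduce each group on its own. The sketch below does that in plain Ruby; the second 'Fred' entry is made-up data, and the grouping strategy is an assumption rather than a description of Campchair's internals.

```ruby
# Emitted [key, value] pairs, as a map block emitting a name and 1 would
# produce them. The extra 'Fred' is hypothetical.
emitted = [['Fred', 1], ['Jane', 1], ['Fred', 1]]

sum = lambda { |values| values.inject(0) { |memo, value| memo + value } }

# Group the values by emitted key, then reduce each group separately.
grouped = emitted.group_by { |key, _| key }
by_name = grouped.map do |key, pairs|
  [key, sum.call(pairs.map { |_, value| value })]
end.to_h

by_name['Fred'] # => 2
by_name['Jane'] # => 1

# Reducing across every key gives the overall total.
total = sum.call(by_name.values) # => 3
```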

Controlling the index (not yet relevant)

It's possible to write reduce methods that return results larger than the input. If you're doing this, you're gonna have a bad time, unless you're sure the dataset will remain small enough that the index for each rereduce doesn't blow out your disk store.

class People
  # DANGER: this view will blow out the index

  # Map to count by name
  # Reduce by sum of counts by name
  view :count_all_by_name do
    map { |id, person| emit person[:name], 1 }
    reduce do |keys, values|
      result = Hash.new
      keys.each_with_index do |key, index|
        result[key] ||= 0
        result[key] += values[index]
      end
      result # scary, nearly as large as the input
    end
    rereduce(:cache => false) do |results|
      result = {}
      results.each do |partial|
        # Each partial is a Hash of name => count from an earlier reduce
        partial.each do |name, count|
          result[name] ||= 0
          result[name] += count
        end
      end
      result # likewise, this will explode the index
    end
  end
end

People.count_all_by_name['Jane']
# { 'Jane' => 1 }
People.count_all_by_name
# { 'Fred' => 1, 'Jane' => 1 }
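
To get a feel for why this matters, the sketch below compares the Marshal'd size of a constant-size reduction (a single count) with a per-name Hash that grows with the input. The 1000 distinct names are made up, and the assumption that cached reductions are stored as Marshal'd values follows from the serialization note earlier rather than from any implemented behavior.

```ruby
# Hypothetical input: 1000 people, all with distinct names.
names = (1..1000).map { |i| "person-#{i}" }

# A reduction whose size does not depend on the input size.
count_result = names.size

# A reduction that keeps one entry per distinct name.
by_name_result = names.each_with_object(Hash.new(0)) do |name, memo|
  memo[name] += 1
end

small = Marshal.dump(count_result).bytesize
large = Marshal.dump(by_name_result).bytesize
input = Marshal.dump(names).bytesize

puts "count: #{small} bytes, per-name hash: #{large} bytes, input: #{input} bytes"
```

With all-distinct keys the per-name Hash serializes to at least as many bytes as the input itself, so caching it at every rereduce level stores the dataset several times over.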

Re-re-reducing (not yet implemented)

Sometimes it's useful to post-process reduce results. Add another rereduce to process rereduced values. Only the first rereduce gets called on values it produces itself. Subsequent rereduces are chained to the initial result.

class People
  # Map to height by location
  # Reduce by count and sum of heights
  # Rereduce by calculating the average
  view :average_height_by_location do
    map { |id, person| emit person[:location], person[:height] }
    reduce do |keys, values|
      # Count and sum the heights in this portion of the map results
      { :count => values.size, :sum => values.inject(0) { |memo, value| memo + value } }
    end
    rereduce do |results|
      result = { :count => 0, :sum => 0 }
      results.each do |part|
        result[:count] += part[:count]
        result[:sum] += part[:sum]
      end
      result
    end
    rereduce do |results|
      count = results.inject(0) { |memo, result| memo + result[:count] }
      sum = results.inject(0) { |memo, result| memo + result[:sum] }
      count == 0 ? nil : sum / count
    end
  end
end

People.average_height_by_location
# 1.690999999
People.average_height_by_location['Flinders St Station, Melbourne, Australia']
# 1.690999999
People.average_height_by_location['Docklands, Melbourne, Australia']
# nil
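
The chaining idea can be sketched in plain Ruby: the first rereduce keeps its output in the same shape as its input so it can run on its own results, and the final step only post-processes. The lambda names are hypothetical, and passing a single combined result to the last stage is an assumption about how chaining would hand data along.

```ruby
# Heights emitted for one location, taken from the two example documents.
heights = [1.748, 1.634]

# Reduce: count and sum one portion of the emitted values.
reduce = lambda do |values|
  { :count => values.size,
    :sum   => values.inject(0) { |memo, value| memo + value } }
end

# First rereduce: merges partial reductions. Output and input share a
# shape, so it can safely be applied to its own results.
combine = lambda do |parts|
  parts.inject({ :count => 0, :sum => 0 }) do |memo, part|
    { :count => memo[:count] + part[:count],
      :sum   => memo[:sum] + part[:sum] }
  end
end

# Second rereduce: post-processes the combined result once at the end.
average = lambda do |result|
  result[:count] == 0 ? nil : result[:sum] / result[:count]
end

partials = heights.each_slice(1).map { |portion| reduce.call(portion) }
combined = combine.call([combine.call(partials[0, 1]),
                         combine.call(partials[1, 1])])

average.call(combined).round(3) # => 1.691
average.call(combine.call([]))  # => nil
```

Keeping the count and sum separate until the last step matters: averaging partial averages would give the wrong answer whenever the portions differ in size.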

Custom database path

By default, the folder for db files is 'cddb'. You can change this with:

Campchair.db_path = 'db/heavy_metrics'

You can also change the db on a per-class basis.

class Person
  include Campchair::LevelDB
  self.db_path = 'db/heavy_metrics/Person'
end

TODO

  • Make the README examples work
  • Views spanning views
  • Caching view results
  • Caching reduce results
  • method_missing for count_by_*, sum_of_*s, min_*, max_*, unique_*s
  • Cache priming