# Campchair

A fast, ghetto-tech CouchDB-like persistence layer for Ruby.
This README is a wishlist. The thing doesn't entirely work yet!
Campchair provides some helpers for lazily materialized views, and gives you a choice of using ActiveRecord-like models or class-based, barebones data-structure access.
Views are described a bit like those found in CouchDB. Campchair is not a service and has no client-server stuff. Parallel access is not supported yet, but that's okay – you can use something else for concurrency and use this in your persistence fiber/thread/process.
Persistence happens through LevelDB. Objects are serialized into documents using Ruby's Marshal class. This is not append-only and replication is not a feature.
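The storage model can be sketched in plain Ruby: documents go through `Marshal` on the way to and from the key/value store. This is an illustration only — a plain Hash stands in for LevelDB, and the UUID key scheme is an assumption, not necessarily what Campchair uses.

```ruby
require 'securerandom'

store = {} # stand-in for the LevelDB key/value store; both sides only see strings

fred = { :name => 'Fred', :height => 1.748 }

key = SecureRandom.uuid           # hypothetical key scheme; Campchair's may differ
store[key] = Marshal.dump(fred)   # serialize the document to a byte string

loaded = Marshal.load(store[key]) # deserialize on read
loaded[:name] # => "Fred"
```

Because `Marshal` handles arbitrary Ruby objects, documents aren't limited to hashes of primitives — anything marshalable round-trips.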
This project was born at Railscamp 11.
## How does it look?
For these examples we'll work with two documents.

```ruby
fred = {
  :name => 'Fred', :height => 1.748,
  :location => 'Flinders St Station, Melbourne, Australia',
  :first_seen => Time.new(2012, 3, 6, 12, 10),
  :last_seen => Time.new(2012, 3, 6, 12, 40)
}

jane = {
  :name => 'Jane', :height => 1.634,
  :location => 'Flinders St Station, Melbourne, Australia',
  :first_seen => Time.new(2012, 3, 6, 12, 20),
  :last_seen => Time.new(2012, 3, 6, 12, 50)
}
```
Mixing Campchair DB behavior into a class is as simple as:

```ruby
class People
  include Campchair::Store
end
```
Now, use it like a Hash:

```ruby
# Add a document to the db
key = People << fred

# Add another person
People << jane

# Retrieve a document from the db
fred = People[key]

# Delete the document from the db
People.delete(key)

# Update/create the document in the db using a key
fred[:last_seen] = Time.new(2012, 3, 6, 12, 40, 25)
People[key] = fred
```
## Basic views *(not yet implemented)*
For views to be available on a class, include `Campchair::Views`:

```ruby
class People # add some views to the People class
  include Campchair::Views

  # Map to document by id
  view :all do
    map { |id, person| emit id, person }
  end

  # Map to name by doc id
  view :names do
    map { |id, person| emit id, person[:name] }
  end
end

People.names
# ['Fred', 'Jane']
```
## Using reduce *(not yet implemented)*
Reduce gets called on groups of map results.

```ruby
class People
  # Map to count by doc id
  # Reduce to sum of counts
  view :count do
    map { |id, person| emit id, 1 }
    reduce do |keys, values|
      values.inject(0) { |memo, value| memo + value }
    end
  end
end

People.count
# 2
```
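In plain Ruby terms, the `:count` view behaves roughly like this — a sketch of the intended semantics only, with the real view machinery (`emit`, caching, incremental updates) elided:

```ruby
docs = {
  'id-1' => { :name => 'Fred' },
  'id-2' => { :name => 'Jane' },
}

# map: emit one (key, value) pair per document -- here (id, 1)
mapped = docs.map { |id, person| [id, 1] }

# reduce: fold the emitted values into a single result
values = mapped.map { |key, value| value }
count  = values.inject(0) { |memo, value| memo + value }
count # => 2
```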
Reduce happens in stages. Supply a `rereduce` step to handle reducing reduce results. The advantage of `rereduce` is that results can be cached in a b-tree index. If `rereduce` is supplied then `reduce` is called with a portion of the map results and those reductions are passed to `rereduce`. If `rereduce` is omitted then `reduce` results will not be cached.
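The staging can be sketched with `each_slice` — this is illustrative only; the actual chunk size and caching strategy are internal details:

```ruby
# Map results for six documents, as emitted by a count-style view.
values = [1, 1, 1, 1, 1, 1]

# reduce runs over each chunk of map output
# (chunk size of 2 is arbitrary here)...
partials = values.each_slice(2).map do |chunk|
  chunk.inject(0) { |memo, value| memo + value }
end
# partials == [2, 2, 2] -- these partial reductions are what gets cached

# ...and rereduce combines the cached partial reductions
total = partials.inject(0) { |memo, part| memo + part }
total # => 6
```

When a document changes, only its chunk's partial reduction needs recomputing before rereduce runs again — that's the payoff of caching.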
```ruby
class People
  # Map to location by doc id
  # Reduce by unique location
  view :unique_locations do
    map { |id, person| emit id, person[:location] }
    reduce do |keys, values|
      values.uniq
    end
    rereduce do |results|
      results.inject(Set.new) { |memo, values| memo.merge values }
    end
  end
end

People.unique_locations
# <Set: {"Flinders St Station, Melbourne, Australia"}>
```
Keep in mind that if `rereduce` is not supplied, `reduce` will always get called with the entire keyspace. If the dataset is large this might consume a lot of memory.
## Using keys with reduce *(not yet implemented)*
```ruby
class People
  # Map to count by name
  # Reduce by sum of counts
  view :count_by_name do
    map { |id, person| emit person[:name], 1 }
    reduce do |keys, values|
      values.inject(0) { |memo, value| memo + value }
    end
    rereduce do |results|
      results.inject(0) { |memo, value| memo + value }
    end
  end
end

People.count_by_name['Jane']
# 1
People.count_by_name
# 2
```
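Grouping by the emitted key can be sketched in plain Ruby — reduce runs once per distinct key over that key's values, and asking for the whole view collapses the per-key results. A sketch of the intended semantics, not the real implementation:

```ruby
# Map results for the two example documents: emit person[:name], 1
mapped = [['Fred', 1], ['Jane', 1]]

# reduce runs once per distinct emitted key, over that key's values
grouped = mapped.group_by { |key, value| key }
counts = grouped.map do |name, pairs|
  values = pairs.map { |key, value| value }
  [name, values.inject(0) { |memo, value| memo + value }]
end.to_h
counts['Jane'] # => 1

# the whole-view result collapses the per-key reductions
total = counts.values.inject(0) { |memo, value| memo + value }
total # => 2
```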
## Controlling the index *(not yet relevant)*
It's possible to write reduce methods that return results larger than the input. If you're doing this, you're gonna have a bad time — unless you're sure the dataset will remain small enough that the index for each `rereduce` doesn't blow out your disk store.
```ruby
class People
  # DANGER: this view will blow out the index
  # Map to count by name
  # Reduce by sum of counts by name
  view :count_all_by_name do
    map { |id, person| emit person[:name], 1 }
    reduce do |keys, values|
      result = Hash.new
      keys.each_with_index do |key, index|
        result[key] ||= 0
        result[key] += values[index]
      end
      result # scary, nearly as large as the input
    end
    rereduce(:cache => false) do |results|
      result = {}
      results.each do |name, count|
        result[name] ||= 0
        result[name] += count
      end
      result # likewise, this will explode the index
    end
  end
end

People.count_all_by_name['Jane']
# { 'Jane' => 1 }
People.count_all_by_name
# { 'Fred' => 1, 'Jane' => 1 }
```
## Re-re-reducing *(not yet implemented)*
Sometimes it's useful to post-process reduce results. Add another `rereduce` to process rereduced values. Only the first `rereduce` gets called on values it produces itself. Subsequent `rereduce`s are chained to the initial result.
```ruby
class People
  # Map to height by location
  # Reduce by count and sum of heights
  # Rereduce by calculating the average
  view :average_height_by_location do
    map { |id, person| emit person[:location], person[:height] }
    reduce do |keys, values|
      result = Hash.new
      keys.each_with_index do |key, index|
        result[key] ||= { :count => 0, :sum => 0 }
        result[key][:count] += 1
        result[key][:sum] += values[index]
      end
      result
    end
    rereduce do |results|
      result = { :count => 0, :sum => 0 }
      results.each do |part|
        result[:count] += part[:count]
        result[:sum] += part[:sum]
      end
      result
    end
    rereduce do |results|
      count = results.inject(0) { |memo, result| memo + result[:count] }
      sum = results.inject(0) { |memo, result| memo + result[:sum] }
      count == 0 ? nil : sum / count
    end
  end
end

People.average_height_by_location
# 1.690999999
People.average_height_by_location['Flinders St Station, Melbourne, Australia']
# 1.690999999
People.average_height_by_location['Docklands, Melbourne, Australia']
# nil
```
## Custom database path

By default, the folder for db files is `cddb`. You can change this with:

```ruby
Campchair.db_path = 'db/heavy_metrics'
```
You can also change the db on a per-class basis.

```ruby
class Person
  include Campchair::LevelDB
  self.db_path = 'db/heavy_metrics/Person'
end
```
## TODO

- Make the README examples work
- Views spanning views
- Caching view results
- Caching reduce results
- `method_missing` for `count_by_*`, `sum_of_*s`, `min_*`, `max_*`, `unique_*s`
- Cache priming