ruby-druid
A Ruby client for Druid.
ruby-druid generates complete JSON queries by chaining methods. The resulting JSON can be sent directly to a Druid server or handled separately.
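To illustrate the chaining idea, here is a minimal sketch in plain Ruby. SketchQuery is a hypothetical stand-in written for this example, not ruby-druid's implementation; the property names mirror the properties output in the dripl session below.

```ruby
require 'json'

# Hypothetical stand-in for a chainable query builder (not ruby-druid code).
class SketchQuery
  def initialize(data_source)
    @properties = { dataSource: data_source, aggregations: [] }
  end

  # Every builder method updates the properties and returns self,
  # which is what allows calls to be chained.
  def long_sum(*names)
    names.flatten.each do |name|
      @properties[:aggregations] << { type: 'longSum', name: name, fieldName: name }
    end
    self
  end

  def group_by(*dimensions)
    @properties[:queryType] = 'groupBy'
    @properties[:dimensions] = dimensions.flatten
    self
  end

  # The accumulated properties serialize to the JSON query body.
  def to_json(*args)
    @properties.to_json(*args)
  end
end

puts SketchQuery.new('events').group_by(:country).long_sum(:actions).to_json
```

Because each method returns the query itself, the order of the chained calls does not matter for the resulting JSON keys.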
bin/dripl
ruby-druid now includes a REPL:
$ bin/dripl
>> metrics
[
[0] "actions"
]
>> dimensions
[
[0] "actions"
]
>> long_sum(:actions)
+---------+
| actions |
+---------+
| 98575 |
+---------+
>> long_sum(:actions)[-7.days].granularity(:day)
+-------------------------------+----------+
| timestamp | actions |
+-------------------------------+----------+
| 2013-03-28T00:00:00.000+01:00 | 93371 |
| 2013-03-29T00:00:00.000+01:00 | 448200 |
| 2013-03-30T00:00:00.000+01:00 | 117167 |
| 2013-03-31T00:00:00.000+01:00 | 828321 |
| 2013-04-01T00:00:00.000+02:00 | 261578 |
| 2013-04-02T00:00:00.000+02:00 | 05149 |
| 2013-04-03T00:00:00.000+02:00 | 27512 |
| 2013-04-04T00:00:00.000+02:00 | 18897 |
+-------------------------------+----------+
>> long_sum(:actions)[-7.days].granularity(:day).properties
{
:dataSource => "events",
:granularity => {
:type => "period",
:period => "P1D",
:timeZone => "Europe/Berlin"
},
:intervals => [
[0] "2013-03-28T00:00:00+01:00/2013-04-04T11:57:20+02:00"
],
:queryType => :groupBy,
:aggregations => [
[0] {
:type => "longSum",
:name => :actions,
:fieldName => :actions
}
]
}
Getting started
In your Gemfile:
gem 'ruby-druid'
In your code:
require 'druid'
Usage
Druid::Client.new('zk1:2181,zk2:2181/druid').query('service/source')
returns a query object on which all other methods can be called to create a full and valid druid query.
A query object can be sent like this:
Druid::Client.new('zk1:2181,zk2:2181/druid').query('service/source').send
#or
client = Druid::Client.new('zk1:2181,zk2:2181/druid')
query = Druid::Query.new('service/source')
client.send(query)
The send method returns the parsed response from the Druid server as an array.
If the response is not empty, it contains one ResponseRow object for each row.
The timestamp can be read through a method of the same name (e.g. row.timestamp); all other row values are available through hash-like syntax (e.g. row['dimension']).
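The row access described above can be pictured with a small sketch. SketchRow is a hypothetical class written for this example, and the response body is an assumed groupBy-style payload; the real ResponseRow may be implemented differently.

```ruby
require 'json'

# Hypothetical row wrapper (an assumption, not the library's ResponseRow).
class SketchRow
  attr_reader :timestamp

  def initialize(raw)
    @timestamp = raw['timestamp']
    @values = raw['event'] || {}
  end

  # Hash-like access; accepts strings or symbols.
  def [](key)
    @values[key.to_s]
  end
end

# Assumed shape of a groupBy response body.
body = '[{"version":"v1","timestamp":"2013-03-28T00:00:00.000+01:00","event":{"actions":93371}}]'

rows = JSON.parse(body).map { |raw| SketchRow.new(raw) }
rows.each { |row| puts "#{row.timestamp}: #{row['actions']}" }
```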
group_by
Sets the dimensions to group the data by. queryType is set automatically to groupBy.
Druid::Query.new('service/source').group_by([:dimension1, :dimension2])
long_sum
Druid::Query.new('service/source').long_sum([:aggregate1, :aggregate2])
postagg
A simple syntax for post aggregations with +,-,/,* can be used like:
query = Druid::Query.new('service/source').long_sum([:aggregate1, :aggregate2])
query.postagg{(aggregate1 + aggregate2).as output_field_name}
Required fields for the postaggregation are fetched automatically by the library.
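How such a block can be turned into a post aggregation is sketched below in plain Ruby. All Sketch* names are hypothetical and written for this example; the output follows Druid's arithmetic post-aggregator JSON shape, and the library itself may build it differently.

```ruby
require 'json'

# Arithmetic operators shared by fields and operations (hypothetical sketch).
module SketchArithmetic
  %w[+ - * /].each do |op|
    define_method(op) { |other| SketchOperation.new(op, self, other) }
  end
end

# A reference to an aggregated field.
class SketchField
  include SketchArithmetic
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def spec
    { type: 'fieldAccess', name: name, fieldName: name }
  end

  def field_names
    [name]
  end
end

# An arithmetic expression over two operands.
class SketchOperation
  include SketchArithmetic

  def initialize(op, left, right)
    @op = op
    @left = left
    @right = right
    @name = nil
  end

  # Names the output field; the argument is the bare identifier from the block.
  def as(field)
    @name = field.name
    self
  end

  def spec
    { type: 'arithmetic', name: @name, fn: @op, fields: [@left.spec, @right.spec] }
  end

  # Fields the expression depends on, so they can be requested automatically.
  def field_names
    (@left.field_names + @right.field_names).uniq
  end
end

# Any bare identifier inside the block becomes a SketchField.
class SketchPostaggContext < BasicObject
  def method_missing(name, *_args)
    ::SketchField.new(name.to_s)
  end
end

def postagg(&block)
  SketchPostaggContext.new.instance_eval(&block)
end

operation = postagg { (aggregate1 + aggregate2).as total }
puts operation.spec.to_json
puts operation.field_names.inspect
```

The field_names list is what would let a builder add the required aggregations without the caller naming them again.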
interval
The interval for the query takes strings with date and time, or objects that provide an iso8601 method:
query = Druid::Query.new('service/source').long_sum(:aggregate1)
query.interval("2013-01-01T00", Time.now)
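As a rough sketch of what this implies, the two boundaries can be joined into an ISO 8601 interval string like the one shown in the properties output earlier. The helper names are hypothetical, and passing strings through unchanged is an assumption made for this example.

```ruby
require 'time'

# Hypothetical helper: objects with iso8601 are formatted, strings pass through.
def interval_part(value)
  value.respond_to?(:iso8601) ? value.iso8601 : value.to_s
end

# Joins start and end into a "start/end" interval string.
def interval_string(from, to)
  "#{interval_part(from)}/#{interval_part(to)}"
end

from = Time.new(2013, 3, 28, 0, 0, 0, '+01:00')
to   = Time.new(2013, 4, 4, 11, 57, 20, '+02:00')

puts interval_string(from, to)
puts interval_string('2013-01-01T00', to)
```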
granularity
granularity can be :all, :none, :minute, :fifteen_minute, :thirty_minute, :hour or :day.
It can also be a period granularity as described in https://github.com/metamx/druid/wiki/Granularities.
The period 'day' or :day will be interpreted as 'P1D'.
If a period granularity is specified, the (optional) second parameter is a time zone. It defaults to the machine's local time zone.
For example:
query = Druid::Query.new('service/source').long_sum(:aggregate1)
query.granularity(:day)
is (on my box) the same as
query = Druid::Query.new('service/source').long_sum(:aggregate1)
query.granularity('P1D', 'Europe/Berlin')
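The mapping can be sketched as a small function. granularity_spec is a hypothetical helper, not the library's code; the 'Europe/Berlin' default only stands in for the machine's local time zone, and returning simple granularities as plain strings is an assumption.

```ruby
# Granularities that Druid accepts as plain strings.
SIMPLE_GRANULARITIES = %w[all none minute fifteen_minute thirty_minute hour].freeze

# Hypothetical sketch: simple granularities stay strings, everything else
# becomes a period granularity with a time zone.
def granularity_spec(gran, time_zone = 'Europe/Berlin')
  gran = gran.to_s
  return gran if SIMPLE_GRANULARITIES.include?(gran)

  period = gran == 'day' ? 'P1D' : gran
  { type: 'period', period: period, timeZone: time_zone }
end

p granularity_spec(:hour)
p granularity_spec(:day)
p granularity_spec('PT6H', 'UTC')
```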
having (for metrics)
having >
Druid::Query.new('service/source').having{metric > 10}
having <
Druid::Query.new('service/source').having{metric < 10}
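A block such as having{metric > 10} can be evaluated with instance_eval and method_missing. The sketch below shows that technique in plain Ruby; the Sketch* classes are hypothetical, and the output follows Druid's greaterThan and lessThan having-spec shape rather than anything confirmed about the library's internals.

```ruby
require 'json'

# Hypothetical stand-in for a metric referenced inside a having block.
class SketchMetric
  def initialize(name)
    @name = name
  end

  def >(value)
    { type: 'greaterThan', aggregation: @name, value: value }
  end

  def <(value)
    { type: 'lessThan', aggregation: @name, value: value }
  end
end

# Inherits from BasicObject so that almost every bare identifier
# falls through to method_missing and becomes a SketchMetric.
class SketchHavingContext < BasicObject
  def method_missing(name, *_args)
    ::SketchMetric.new(name.to_s)
  end
end

def having(&block)
  SketchHavingContext.new.instance_eval(&block)
end

spec = having { metric > 10 }
puts spec.to_json
```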
filter (for dimensions)
Filters are set by the filter method. It takes a block or a hash as parameter.
Filters can be chained: filter{...}.filter{...}
filter == , eq
Druid::Query.new('service/source').filter{dimension.eq 1}
#this is the same as
Druid::Query.new('service/source').filter{dimension == 1}
filter != , neq
Druid::Query.new('service/source').filter{dimension.neq 1}
#this is the same as
Druid::Query.new('service/source').filter{dimension != 1}
filter and
A logical and that can combine all other filters.
Druid::Query.new('service/source').filter{dimension.neq(1) & dimension2.neq(2)}
filter or
A logical or that can combine all other filters.
Druid::Query.new('service/source').filter{dimension.neq(1) | dimension2.neq(2)}
filter not
A logical not that can negate any other filter.
Druid::Query.new('service/source').filter{!dimension.eq(1)}
filter in
This filter creates a set of equals filters in an or filter.
Druid::Query.new('service/source').filter{dimension.in(1,2,3)}
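To make the operators above concrete, here is a minimal sketch of how a filter block could be translated into Druid's selector, and, or and not filter JSON. Every Sketch* name is hypothetical and the library's real implementation may differ; the sketch models in as an or of selector filters, since a value can only match one of the listed alternatives.

```ruby
require 'json'

# Hypothetical filter node wrapping a hash in Druid's filter JSON shape.
class SketchFilter
  attr_reader :spec

  def initialize(spec)
    @spec = spec
  end

  def &(other)
    SketchFilter.new(type: 'and', fields: [spec, other.spec])
  end

  def |(other)
    SketchFilter.new(type: 'or', fields: [spec, other.spec])
  end

  def !
    SketchFilter.new(type: 'not', field: spec)
  end
end

# Hypothetical dimension reference created for bare identifiers in the block.
class SketchDimension
  def initialize(name)
    @name = name
  end

  def eq(value)
    SketchFilter.new(type: 'selector', dimension: @name, value: value)
  end

  def neq(value)
    !eq(value)
  end

  def ==(value)
    eq(value)
  end

  def !=(value)
    neq(value)
  end

  def in(*values)
    SketchFilter.new(type: 'or', fields: values.map { |value| eq(value).spec })
  end
end

class SketchFilterContext < BasicObject
  def method_missing(name, *_args)
    ::SketchDimension.new(name.to_s)
  end
end

# Evaluates the block and returns the resulting filter as a plain hash.
def filter(&block)
  SketchFilterContext.new.instance_eval(&block).spec
end

puts JSON.pretty_generate(filter { (country == 'de') | country.in('at', 'ch') })
```

Note the parentheses around country == 'de': in Ruby, | and & bind more tightly than == and !=, so comparisons written with operators need to be parenthesized when combined.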
filter with hash syntax
Sometimes it is useful to use hash syntax for filtering, for example if the conditions already come from a list or a parameter hash.
Druid::Query.new('service/source').filter(:dimension => 1, :dimension1 => 2, :dimension2 => 3)
#this is the same as
Druid::Query.new('service/source').filter{dimension.eq(1) & dimension1.eq(2) & dimension2.eq(3)}
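The equivalence can be sketched as a function that turns each key/value pair into a selector filter and wraps them in an and filter. filter_from_hash is a hypothetical helper written for this example, not part of ruby-druid.

```ruby
require 'json'

# Hypothetical sketch: one selector filter per pair, combined with "and".
def filter_from_hash(conditions)
  fields = conditions.map do |dimension, value|
    { type: 'selector', dimension: dimension.to_s, value: value }
  end
  # A single condition does not need an enclosing "and" filter.
  fields.size == 1 ? fields.first : { type: 'and', fields: fields }
end

params = { country: 'de', device: 'phone' }
puts filter_from_hash(params).to_json
```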
filter >, <, >=, <=
Druid::Query.new('service/source').filter{dimension >= 1}
filter javascript
Druid::Query.new('service/source').filter{a.javascript('dimension >= 1 && dimension < 5')}
#this is the same as
Druid::Query.new('service/source').filter{(dimension >= 1) & (dimension < 5)}
Acknowledgements
Post aggregation expression parsing built with the help of Squeel.
Contributions
ruby-druid is developed by madvertise Mobile Advertising GmbH
You can support us in different ways:
- Use ruby-druid, and let us know if you encounter anything that's broken or missing. A failing spec is great. A pull request with your fix is even better!
- Spread the word about ruby-druid on Twitter, Facebook, and elsewhere.
- Work with us at madvertise on awesome stuff like this. Read the job description and send a mail to [email protected].