Statlysis

Statistics & Analysis in a Ruby DSL

Usage

setup

Statlysis.setup do
  set_database :statlysis

  hourly :time_column => :t
  [EoeLog,
   EoeLog.where(:ui => 0),
   EoeLog.where(:ui => {"$ne" => 0}),
   Mongoid[/eoe_logs_[0-9]+$/].where(:ui => {"$ne" => 0}),
   EoeLog.where(:do => {"$in" => [DOMAINS_HASH[:blog], DOMAINS_HASH[:my]]}),
  ].each do |s|
    daily s, :time_column => :t
  end
end

access

Statlysis.daily # => return daily crons
Statlysis.daily.run # => run daily crons
Statlysis.daily[/name_regexp/] # => return matched daily crons

process

[23] pry(#<Statlysis::Configuration>)> Statlysis.daily['multi'].first

Features

  • Supports time columns stored as integers.
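
As a rough illustration of what counting over an integer time column involves, here is a minimal sketch. It assumes the column holds Unix epoch seconds in UTC, which the README does not state; `day_range` and `count_in_day` are hypothetical helpers, not part of Statlysis.

```ruby
# Assumption: the time column stores Unix epoch seconds (UTC).

# Half-open range of epoch seconds covering one UTC day.
def day_range(year, month, day)
  start = Time.utc(year, month, day).to_i
  start...(start + 86_400)
end

# Count the rows whose integer time column falls inside the range.
def count_in_day(rows, column, range)
  rows.count { |row| range.cover?(row[column]) }
end

rows = [
  { :t => Time.utc(2013, 6, 1, 0, 0, 0).to_i },
  { :t => Time.utc(2013, 6, 1, 23, 59, 59).to_i },
  { :t => Time.utc(2013, 6, 2, 0, 0, 0).to_i },
]
count_in_day(rows, :t, day_range(2013, 6, 1))
```

Comparing integers against a precomputed range avoids converting every row to a Time object.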

TODO

  • Admin interface
  • Statistical query API in Ruby and over HTTP
  • Integration with JavaScript charting libraries, e.g. Highcharts, D3
  • More tests
  • Add @criteria to MultipleDataset

Statistical Process

  1. Delete invalid statistical data, e.g. data dated in the future (tomorrow)
  2. Count the data within the specified time range, grouped by dimension
  3. Delete overlapping data, and insert new data
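
The three steps above can be sketched on plain in-memory arrays. This is only an illustration under stated assumptions, not Statlysis's actual implementation: `recount_daily` is a hypothetical helper, logs are hashes with an integer epoch-seconds `:t`, stored statistics are hashes with a Date `:t` and a count `:c`, and the only dimension is the day.

```ruby
require "date"

# Hypothetical stand-ins: `logs` are raw events, `stats` is the stored daily table.
def recount_daily(logs, stats, from, to, today)
  # 1. Delete invalid statistical data, e.g. rows dated after today.
  stats = stats.reject { |row| row[:t] > today }

  # 2. Count the logs inside [from, to], grouped by day.
  counts = Hash.new(0)
  logs.each do |log|
    day = Time.at(log[:t]).utc.to_date
    counts[day] += 1 if day >= from && day <= to
  end

  # 3. Delete overlapping rows in the range, then insert the fresh counts.
  stats = stats.reject { |row| row[:t] >= from && row[:t] <= to }
  stats + counts.sort.map { |day, c| { :t => day, :c => c } }
end

today = Date.new(2013, 6, 2)
logs = [
  { :t => Time.utc(2013, 6, 1, 8).to_i },
  { :t => Time.utc(2013, 6, 1, 9).to_i },
  { :t => Time.utc(2013, 6, 2, 1).to_i },
]
stats = [
  { :t => Date.new(2013, 6, 1), :c => 99 }, # stale count, will be replaced
  { :t => Date.new(2013, 6, 3), :c => 5 },  # dated in the future, will be deleted
]
recount_daily(logs, stats, Date.new(2013, 6, 1), today, today)
```

Deleting the overlapping rows before inserting makes a rerun over the same range idempotent: repeating the call never double-counts a day.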

FAQ

Q: Why use Sequel instead of ActiveRecord?

A: When initializing an ORM object, ActiveRecord is about 3 times slower than Sequel, and we only need the basic operations: read, write, enumerate, etc. See more details in "Quick dive into Ruby ORM object initialization".

Q: Why do you recommend using multiple collections to store logs rather than a single collection, or a capped collection?

A: MongoDB can effectively reuse the space freed by removing entire collections without causing data fragmentation. See details at http://docs.mongodb.org/manual/use-cases/storing-log-data/#multiple-collections-single-database
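
A minimal sketch of how logs might be split across collections follows. The one-collection-per-month naming scheme and both helper names are assumptions made for illustration; only the pattern /eoe_logs_[0-9]+$/ comes from the setup example above.

```ruby
# Hypothetical naming scheme: one collection per month, e.g. "eoe_logs_201306".
def collection_name_for(time, prefix = "eoe_logs")
  "#{prefix}_#{time.getutc.strftime('%Y%m')}"
end

# Select the collection names matching a pattern such as /eoe_logs_[0-9]+$/.
def matching_collections(names, pattern)
  names.grep(pattern).sort
end

names = ["eoe_logs_201306", "eoe_logs", "users", "eoe_logs_201305"]
collection_name_for(Time.utc(2013, 6, 15))
matching_collections(names, /eoe_logs_[0-9]+$/)
```

With a scheme like this, expiring old logs means dropping a whole monthly collection rather than deleting individual documents.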

MIT. David Chen at eoe.cn.

Projects

Articles

Event collector

Admin interface

  • http://three.kibana.org/: a browser-based analytics and search interface for Logstash and other timestamped data sets stored in Elasticsearch.