daru

Data Analysis in RUby


Introduction

daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data.

daru is inspired by pandas, a mature data analysis library for Python.

daru is written in pure Ruby, so it should work with all Ruby implementations. It has been tested with MRI 2.0, 2.1 and 2.2.

Features

  • Data structures:
    • Vector - A basic 1-D vector.
    • DataFrame - A 2-D spreadsheet-like structure for manipulating and storing data sets. This is daru's primary data structure.
  • Compatible with IRuby notebook, statsample, statsample-glm and statsample-timeseries.
  • Support for time series.
  • Singly and hierarchically indexed data structures.
  • Flexible and intuitive API for manipulation and analysis of data.
  • Easy plotting, statistics and arithmetic.
  • Plentiful iterators.
  • Optional speed and space optimization on MRI with NMatrix and GSL.
  • Easy splitting, aggregation and grouping of data.
  • Pivot tables for quickly reducing data to a summary.
  • Import and export data from and to Excel, CSV, SQL Databases and plain text files.
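The data model behind Vector, DataFrame and grouping can be illustrated with a small plain-Ruby sketch. This is not daru's implementation or API: the classes MiniVector and MiniFrame and the method group_sum are hypothetical, chosen only to show how labeled values, columns that share one row index, and a split-then-sum summary fit together.

```ruby
# Hypothetical plain-Ruby sketch of the Vector/DataFrame data model.
# MiniVector, MiniFrame and group_sum are NOT part of daru's API.

# A 1-D vector: values addressed by index labels (default: 0, 1, 2, ...).
class MiniVector
  attr_reader :index, :values

  def initialize(values, index: nil)
    @values = values
    @index  = index || (0...values.size).to_a
    unless @index.size == @values.size
      raise ArgumentError, 'index and values must have the same length'
    end
  end

  # Return the value stored under an index label, or nil if the label is absent.
  def [](label)
    pos = @index.index(label)
    pos && @values[pos]
  end

  def sum
    @values.sum
  end
end

# A 2-D frame: named columns, each a MiniVector sharing the same row index.
class MiniFrame
  def initialize(columns, index: nil)
    @columns = columns.transform_values { |vals| MiniVector.new(vals, index: index) }
  end

  # Fetch a column by name (raises KeyError for unknown names).
  def [](name)
    @columns.fetch(name)
  end

  # Split rows by the value in column `key`, then sum column `value`
  # within each group: the reduction behind grouping and pivot summaries.
  def group_sum(key, value)
    self[key].values.zip(self[value].values)
             .each_with_object(Hash.new(0)) { |(k, v), acc| acc[k] += v }
  end
end

df = MiniFrame.new({ city: %w[Pune Mumbai Pune], sales: [10, 20, 5] },
                   index: [:r1, :r2, :r3])
df[:sales][:r2]              # => 20
df[:sales].sum               # => 35
df.group_sum(:city, :sales)  # => {"Pune"=>15, "Mumbai"=>20}
```

A linear label lookup keeps the sketch short; a real implementation would map labels to positions with a Hash so lookups stay fast on large data.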

Notebooks

Notebooks on most use cases

Notebooks on Time series

Case Studies

Blog Posts

Time series

Documentation

Docs can be found here.

Roadmap

  • Allow a DataFrame to be created by passing only an NMatrix/MDArray to initialize. Vectors are then named automatically (alphabetically) or from a supplied Array of names.
  • Basic data manipulation and analysis operations:
    • DataFrame concatenation.
  • Assignment of a column to a single number should set the entire column to that number.
  • Multiple column assignment with []=.
  • Multiple value assignment for vectors with []=.
  • A #find_max method that evaluates a block for each row and returns the row for which the block's value is largest.
  • Sort by index.
  • Row-wise statistics on a DataFrame.
  • Calculate percentage change.
  • Bundle sample data sets for users to experiment with, loadable directly from code.
  • Sorting with missing data present.
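Several Roadmap items describe behavior rather than a concrete API. The sketch below pins down one reasonable reading of three of them in plain Ruby. The helpers pct_change, find_max and sort_with_missing are hypothetical standalone functions on plain Arrays, not daru methods (find_max here is not the proposed DataFrame#find_max), and the (current - previous) / previous formula for percentage change is the usual convention, assumed rather than taken from daru.

```ruby
# Hypothetical plain-Ruby helpers illustrating three Roadmap items.
# They operate on plain Arrays/Hashes and are NOT daru methods.

# Percentage change between consecutive values: (cur - prev) / prev.
# The first element has no predecessor, so its change is nil.
# A zero predecessor yields Infinity or NaN from Float division.
def pct_change(values)
  return [] if values.empty?
  [nil] + values.each_cons(2).map { |prev, cur| (cur - prev).fdiv(prev) }
end

# Evaluate a block for every row and return the row whose block value
# is largest; Enumerable#max_by already has exactly these semantics.
def find_max(rows, &block)
  rows.max_by(&block)
end

# Sorting with missing data: nil cannot be compared with numbers,
# so sort the present values and append the missing ones at the end.
def sort_with_missing(values)
  present, missing = values.partition { |v| !v.nil? }
  present.sort + missing
end

pct_change([100, 110, 99])          # => [nil, 0.1, -0.1]
find_max([{ name: 'a', score: 3 }, { name: 'b', score: 7 }]) { |r| r[:score] }
# returns the row { name: 'b', score: 7 }
sort_with_missing([3, nil, 1, 2])   # => [1, 2, 3, nil]
```

Placing missing values last mirrors a common default in data analysis libraries; putting them first would only require swapping the two parts of the final concatenation.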

Contributing

Pick a feature from the Roadmap or the issue tracker, or think of your own, and send me a Pull Request!

For details see CONTRIBUTING.

Acknowledgements

  • Google and the Ruby Science Foundation for the Google Summer of Code 2015 grant for further developing daru and integrating it with other Ruby gems.
  • Thank you last.fm for making user data accessible to the public.

Copyright (c) 2015, Sameer Deshmukh. All rights reserved.