t-digest Ruby


Ruby implementation of Ted Dunning's t-digest data structure.

Inspired by the JavaScript implementation by Will Welch.

Installation

Add this line to your application's Gemfile:

gem 'tdigest'

And then execute:

$ bundle

Or install it yourself as:

$ gem install tdigest

Usage

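require 'tdigest'

td = ::TDigest::TDigest.new
1_000.times { td.push(rand) }  # add 1,000 uniform random values in [0, 1)
td.compress!                   # compact the digest's centroids before querying

puts td.percentile(0.5)        # estimated median (50th percentile)
puts td.p_rank(0.95)           # estimated fraction of values at or below 0.95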
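When trying the gem out, it can help to compare the digest's estimates with exact answers computed by sorting every value. The helpers below, `exact_percentile` and `exact_p_rank`, are hypothetical and not part of the gem; they assume that `percentile(p)` approximates a nearest-rank quantile and that `p_rank(x)` approximates the fraction of values at or below `x`.

```ruby
# Hypothetical helpers (not part of the tdigest gem): exact statistics
# computed by sorting all values, for sanity-checking approximations.

# Nearest-rank quantile: smallest value with at least a fraction p of
# the data at or below it.
def exact_percentile(values, p)
  raise ArgumentError, "p must be in [0, 1]" unless (0.0..1.0).cover?(p)
  sorted = values.sort
  index = (p * sorted.size).ceil - 1
  sorted[index.clamp(0, sorted.size - 1)]
end

# Fraction of values less than or equal to x.
def exact_p_rank(values, x)
  values.count { |v| v <= x }.fdiv(values.size)
end

values = Array.new(1_000) { rand }
puts exact_percentile(values, 0.5)
puts exact_p_rank(values, 0.95)
```

Sorting costs O(n log n) time and keeps every value in memory, which is exactly what a t-digest avoids; the exact helpers are only practical for small test inputs.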

Serialization

This gem offers the same serialization options as the original Java implementation. You can read more about t-digest persistence in Chapter 3 of the paper.

Standard encoding

This encoding uses an 8-byte double for each mean and a 4-byte integer for each count, so every centroid takes a fixed 12 bytes.

bytes = tdigest.as_bytes
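To make the 12-bytes-per-centroid figure concrete, here is a minimal sketch of a fixed-width layout using Ruby's `Array#pack`. The helpers `pack_verbose` and `unpack_verbose` are hypothetical, not the gem's API; the big-endian byte order, the "all means, then all counts" ordering, and the absence of any header are assumptions made for illustration.

```ruby
# Hypothetical sketch (not the gem's API): fixed-width centroid layout.
# Assumptions: big-endian byte order, all means first, then all counts,
# and no header (the real format may carry extra metadata).
def pack_verbose(centroids)
  means  = centroids.map { |mean, _count| mean }
  counts = centroids.map { |_mean, count| count }
  means.pack("G*") + counts.pack("N*") # G = 8-byte double, N = 4-byte unsigned int
end

def unpack_verbose(bytes, n)
  means  = bytes.byteslice(0, 8 * n).unpack("G*")
  counts = bytes.byteslice(8 * n, 4 * n).unpack("N*")
  means.zip(counts)
end

centroids = [[0.1, 3], [0.5, 10], [0.9, 2]]
bytes = pack_verbose(centroids)
puts bytes.bytesize # → 36 (3 centroids * 12 bytes)
```

Because doubles are stored at full precision, the means round-trip exactly, at the cost of a fixed size that does not shrink for small counts.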

Compressed encoding

This encoding uses delta encoding with 4-byte floats for the means and variable-length encoding for the counts. Size per centroid is between 5 and 12 bytes.

bytes = tdigest.as_small_bytes
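The two ideas behind the compressed encoding, storing each mean as a 4-byte float difference from the previous mean and storing each count as a variable-length integer, can be sketched as follows. All helper names (`varint_bytes`, `decode_varint`, `pack_small`, `unpack_small`) are hypothetical; the varint scheme shown (7 data bits per byte, high bit as a continuation flag, big-endian floats, no header) is an assumption and not necessarily the gem's exact byte layout.

```ruby
# Hypothetical helpers (not the gem's API) sketching a compact layout:
# means as 4-byte big-endian float deltas, counts as unsigned varints.

def varint_bytes(n)
  raise ArgumentError, "count must be non-negative" if n.negative?
  out = []
  while n >= 0x80
    out << ((n & 0x7F) | 0x80) # low 7 bits plus continuation flag
    n >>= 7
  end
  out << n # final byte has the high bit clear; returns the byte array
end

# Returns [value, offset_after_value].
def decode_varint(bytes, offset)
  value = 0
  shift = 0
  loop do
    byte = bytes.getbyte(offset)
    raise ArgumentError, "truncated varint" if byte.nil?
    offset += 1
    value |= (byte & 0x7F) << shift
    return [value, offset] if byte < 0x80
    shift += 7
  end
end

def pack_small(centroids)
  previous = 0.0
  deltas = centroids.map do |mean, _count|
    delta = mean - previous # store the gap to the previous mean
    previous = mean
    delta
  end
  counts = centroids.flat_map { |_mean, count| varint_bytes(count) }
  deltas.pack("g*") + counts.pack("C*") # g = 4-byte big-endian float
end

def unpack_small(bytes, n)
  running = 0.0
  means = bytes.byteslice(0, 4 * n).unpack("g*").map { |d| running += d }
  offset = 4 * n
  counts = Array.new(n) do
    count, offset = decode_varint(bytes, offset)
    count
  end
  means.zip(counts)
end

centroids = [[0.1, 3], [0.5, 300], [0.9, 2]]
puts pack_small(centroids).bytesize # → 16 (3 * 4 float bytes + 1 + 2 + 1 varint bytes)
```

Storing means as single-precision deltas trades a little accuracy for space: decoded means are close to, but not bit-identical with, the originals, while counts survive exactly.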

Deserializing

Deserialization automatically detects which encoding was used:

tdigest = TDigest::TDigest.from_bytes(bytes)
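One common way to make the format self-describing is to prefix the payload with a small tag. The sketch below assumes a 4-byte big-endian tag at the start of the bytes; the constant values `VERBOSE_ENCODING = 1` and `SMALL_ENCODING = 2` and the `detect_encoding` helper are hypothetical and only illustrate the idea, not how `from_bytes` is actually implemented.

```ruby
# Hypothetical sketch: detect the encoding from a leading 4-byte
# big-endian tag. Tag values are assumptions for illustration only.
VERBOSE_ENCODING = 1
SMALL_ENCODING   = 2

def detect_encoding(bytes)
  raise ArgumentError, "input too short" if bytes.bytesize < 4
  case bytes.unpack1("N")
  when VERBOSE_ENCODING then :verbose
  when SMALL_ENCODING   then :small
  else raise ArgumentError, "unknown encoding tag"
  end
end

puts detect_encoding([2].pack("N") + "payload".b) # → small
```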

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/castle/tdigest.

License

The gem is available as open source under the terms of the MIT License.