Digital Trees and Sets

Trees (aka Hash)

A digital tree is an integer-integer mapping, in function close to the well known Hash. It is a little more restricted than the Hash know to ruby users, but it's functionality is exactly the same as the st_int_table that is used to implement the hash.

A digital tree encodes the key's value in the position of the tree ad thus saves memory compared to other solution.

This gem is a hand coded wrapper around the judy tree (judy.sourceforge.net), written by Doug Baskins with extreme attention to cache lines. It is there quite fast. There is a judy_bench.rb file to experiment with, and results in the wiki.

JudyHash may be used as a direct replacement for Hash when keys are Symbols or Integers.

Sets

The JudySet implements a Set, similar to the Set in Judy (again only for integers or Symbols). It is implemented as a true integer - bit mapping.

In other words, the set is not implemented with a Hash, in fact it is more the other way around. A Set is just as fast as a Tree, but much much more space efficient.

Asymptotically, for large sets of keys and specifically very dense keys, the set approaches the efficiency of a bitmap. At the same time the tree approaches an array. But for normal use the Tree takes about as much memory as the key and value together (ie 16 bytes) and the set about 4-3 bytes.

Usage

Just install gem with gem install digital_trees_and_sets or ad to Gemfile.

require "digital_trees_and_sets"

tree = JudyHash.new => #JudyHash:0x000001009d5980 1.9.3-p327 :007 > 10000000.times {|i| tree[i] = i } => 10000000 1.9.3-p327 :008 > a.length => 10000000 1.9.3-p327 :009 > a.mem_used => 86269816

ie int or sym as key and and object as value. It would be quite simple to implement the HashWithIndifferent, but that hasn't been the point up to now.

Known

The build system creates a 64 bit version of judy even if you are still on 32 (doubtful). If you patch that, do send a pull.

The functionality and test are not necessarily complete. This is a proof of concept, actually a spinoff of another project. Pulls accepted.

[email protected]