The “google_hash” gem.

Its goal. To boldly be faster than any hash hash before (cue star trek TNG theme).

Well, really the goal is a better hash for Ruby, either one that is faster or more space efficient than ruby’s default.

To attempt to accomplish this, we wrap the google sparse and dense hashes [1] and see how they perform.

Time results (IntToInt hash, populating 500000 integers):

1.9.1p376 (mingw):

Hash (Ruby default) 0.359375 (populate) 1.1875 (each)

GoogleHashDenseIntToInt 0.1875 (populate) 0.078125 (each)

GoogleHashSparseIntToInt 0.53125 (populate) 0.078125 (each)

These also use significantly less memory, because (if you specify IntToInt, it stores only 4 bytes per int, instead of Ruby’s usual 20 bytes). This also frees up Ruby so it doesn’t hvae to garbage collect as much. Yea!

See also the results.txt file

Usage:

a = GoogleHashDenseRubyToRuby.new # like a normal Ruby hash, with slightly different performance characteristics, see benchmarks.txt b = GoogleHashDenseIntToRuby.new # :int => Ruby – keys are stored as “native” ints, values are ruby objects c = GoogleHashSparseIntToInt.new # :int => :int

Here’s the full list of availables:

>> Object.constants.each{|k k if k =~ /google/i }

GoogleHashDenseLongToLong GoogleHashSparseLongToLong GoogleHashDenseLongToInt GoogleHashSparseLongToInt GoogleHashDenseLongToRuby GoogleHashSparseLongToRuby GoogleHashDenseDoubleToLong GoogleHashSparseDoubleToLong GoogleHashDenseDoubleToInt GoogleHashSparseDoubleToInt GoogleHashDenseDoubleToRuby GoogleHashSparseDoubleToRuby GoogleHashDenseIntToLong GoogleHashSparseIntToLong GoogleHashDenseIntToInt GoogleHashSparseIntToInt GoogleHashDenseIntToRuby GoogleHashSparseIntToRuby GoogleHashDenseRubyToLong GoogleHashSparseRubyToLong GoogleHashDenseRubyToInt GoogleHashSparseRubyToInt GoogleHashDenseRubyToRuby GoogleHashSparseRubyToRuby

(long is “better” than int on 64 bit systems only)

and how to use them:

a = 4 b = ‘abc’ b = ‘some complex object’ c = 4 # all you can use are ints…

Use them like “normal” hashes with the same method names.

To learn if sparse or dense is right for you, check the documentation:

google-sparsehash.googlecode.com/svn/trunk/doc/index.html Dense is faster, sparse uses less memory.

Installation:

gem install google_hash (if on doze, you’ll also first need the devkit installed). You may need rubygems >= 1.3.6

Note that if you use a type other than “Ruby” that you will be saving the values away as native values, so they will be using less space than a typical ruby object (which is 20 bytes), using 4 bytes instead of 20 (or instead of 40, in the case of 64 bit machines). In addition to taking less memory, they are also stored separate from Ruby’s heap, meaning that they should release stress on the GC, if you are spending much time in GC.

If you have a desired use case that’s not covered, let me know and I might well be able to code it up for you and add it.

ex: currently it uses longs internally instead of ints–if you want ints or chars or what not, let me know.

if you want it to remember insertion order, I could do that, too, or native “store away” strings/bignums, whatever.

Could also add vectors, vector(pairs), priority queues, floats, native bignums, other more complex types, if anybody wants me to.

This is meant to be one more tool in the rubyists toolbelt when trying to optimize speed-wise, and plans to expand to more types, but at least with this release it has a #each method.

Enjoy.

-r

1

code.google.com/p/google-sparsehash

If you want to see the code/hack on it, run extconf.rb within the ext directory, to create the code it actually uses (from a template).

Related:

judy groups.google.com/group/ruby-talk-google/browse_thread/thread/05ed587925526a7f/314375891d12b672?lnk=raot

NArray gem : provides “native” type arrays (and X-Dimensional arrays–all in native memory, so also saves memory as this gem does)