The “google_hash” gem.

Its goal. To boldly be faster than any ruby hash has before (cue star trek TNG theme…).

Well, really the goal is a better hash for Ruby, either one that is faster or more space efficient than ruby’s default.

To attempt to accomplish this, this library wraps the google hash sparse and dense hashes [1], which may perform better for your use case [make sure to benchmark before and after!].

It also creates some “specialized” hashes, for instance, those that take an integer for their key, for even better performance and decreased “garbage collected” RAM use, which can significantely speed up certain apps.

# how to run this benchmark:

$ cd ext && ruby extconf.rb && make $ cd ../spec $ ruby benchmark.rb

ruby 1.9.3p194 (2012-04-20 revision 35410) [i686-linux] inserting 400000 objects

Ruby Standard Hash populate integer 0.324 #each 0.660 lookup int 0.083

GoogleHashDenseIntToRuby populate integer 0.114 #each 0.050 lookup int 0.080

GoogleHashSparseIntToInt # said to be more memory efficient populate integer 0.242 #each 0.046 lookup int 0.099

These also use less memory, because (if you specify IntToInt, it stores only 4 bytes per int, instead of Ruby’s usual 20 bytes), and usually less overall RAM for the hash store. This also frees up Ruby so it doesn’t hvae to garbage collect as much. Yea!

For instance, with 1M ints, doing a GC takes this long, comparatively: GoogleHashDenseIntToInt “dense took 0.002” “ruby hash took 0.103”

per GC. And those garbage collects happen all the time, so this is meant to speed those up.

See also the results.txt file for more OS benchmark results.

You can also run your own benchmarks to see how much faster it would be for you (“try before you buy”), see spec/benchmark.rb file.

The best benchmark, of course, is to integrate and run it in your own app.

Here is how it performs, if used as a “replacement” for the Ruby standard Hash:

Ruby Standard Hash populate string 0.252 populate symbol 0.109 populate integer 0.324 #each 0.660 lookup int 0.083 lookup string 0.642 lookup symbol 0.082

GoogleHashDenseRubyToRuby populate string 0.312 # slower here populate symbol 0.136 populate integer 0.172 #each 0.077 lookup int 0.102 lookup string 0.271 lookup symbol 0.098

GoogleHashSparseRubyToRuby populate string 0.276 populate symbol 0.102 populate integer 0.428 #each 0.047 lookup int 0.098 lookup string 0.293 lookup symbol 0.099

basically slightly slower, except for an #each method that is much faster, and, as I said, RAM usage might be much better using this than the standard Ruby hash, esp. for the case of the Sparse version.

Installation ==

gem install google_hash (if on windows, you’ll also need the devkit installed).

usage ==

a = GoogleHashDenseRubyToRuby.new # like a normal Ruby hash, with slightly different performance characteristics, see results.txt b = GoogleHashDenseIntToRuby.new # :int => Ruby – keys are stored as “native” ints, values are ruby objects c = GoogleHashSparseIntToInt.new # :int => :int

Here’s the full list of availables:

>> puts Object.constants.select{|k| k =~ /google/i}.sort; nil

GoogleHashDenseLongToLong GoogleHashSparseLongToLong GoogleHashDenseLongToInt GoogleHashSparseLongToInt GoogleHashDenseLongToRuby GoogleHashSparseLongToRuby GoogleHashDenseDoubleToLong GoogleHashSparseDoubleToLong GoogleHashDenseDoubleToInt GoogleHashSparseDoubleToInt GoogleHashDenseDoubleToRuby GoogleHashSparseDoubleToRuby GoogleHashDenseIntToLong GoogleHashSparseIntToLong GoogleHashDenseIntToInt GoogleHashSparseIntToInt GoogleHashDenseIntToRuby GoogleHashSparseIntToRuby GoogleHashDenseRubyToLong GoogleHashSparseRubyToLong GoogleHashDenseRubyToInt GoogleHashSparseRubyToInt GoogleHashDenseRubyToRuby GoogleHashSparseRubyToRuby

(long is “better” than int on 64 bit systems only, on 32 bit it’s the same)

and how to use them:

a = 4 b = ‘abc’ b = ‘some complex object’ c = 4 # all you can use are ints…

Use them like “normal” hashes, the method names try to map well.

feedback ==

If you have a desired use case that’s not covered, let me know and I might well be able to code it up for you and add it.

ex: currently it uses longs internally instead of ints–if you want ints or strings added, let me know.

if you want it to remember insertion order, I could do that, too, or native “store away” strings/bignums, whatever.

Could also add vectors, vector(pairs), priority queues, floats, native bignums, other more complex types, if anybody wants me to.

This is meant to be one more tool in the rubyists toolbelt when trying to optimize speed-wise, and plans to expand to more types, but at least with this release it has a #each method.

Could add #sum methods, etc. for the numeric types, for instance.

Enjoy.

-r

1

code.google.com/p/google-sparsehash

If you want to see the code/hack on it, run extconf.rb within the ext directory, to create the code it actually uses (from a template).

Related:

judy groups.google.com/group/ruby-talk-google/browse_thread/thread/05ed587925526a7f/314375891d12b672?lnk=raot

NArray gem : provides “native” type arrays (and X-Dimensional arrays–all in native memory, so also saves memory as this gem does)