Goodbye serialize, hello hstore.

You need dynamic columns in your tables. What do you do?

  • Create lots of tables to handle it. Nice, now you’ll need more models and lots of additional sqls. Insertion and selection will be slow as hell.
  • Use a noSQL database just for this issue. Good luck.
  • Create a serialized column. Nice, insertion will be fine, and reading data from a record too. But, what if you have a condition in your select that includes serialized data? Yeah, regular expressions.

action serialize hstore serialize sql hstore sql
count all with condition 10114,575 1830,444 SELECT count(*) FROM foos WHERE data ~ ‘foo: bar’; SELECT count(*) FROM bars WHERE data @> ‘foo=>bar’;
count all with negative condition 18722,149 1677,948 SELECT count(*) FROM foos WHERE data !~ ‘another key: 9999990’; SELECT count(*) FROM bars WHERE not data @> ‘“another key”=>9999990’;
find one with condition 17740,307 130,227 SELECT count(*) FROM foos WHERE data ~ ‘another key: 9999990’; SELECT * FROM bars WHERE data @> ‘“another key”=>9999990’

Benchmarks made in my local machine, with a million records. Time is in milliseconds.

Requirements

Postgresql 8.4 with contrib and Rails 3. (It might work on 2.3.x with minor patches…)

In Ubuntu, this is easy:

sudo apt-get install postgresql-8.4 postgresql-contrib-8.4

In Mac …you are screwed. Use a VM.

Install

Hstore is a postgres contrib type. Check it out first:

http://www.postgresql.org/docs/8.4/static/hstore.html

Then, just add this to your Gemfile:

gem 'activerecord-postgres-hstore'

And run your bundler:

bundle install

Now you need to create a migration that adds hstore support for your postgresql database:

rails g hstore:setup

Run it:

rake db:migrate

Finally you can create your own tables using hstore type. It’s easy:

rails g model Person name:string data:hstore rake db:migrate

You’re done.

Well, not yet. Don’t forget to add indexes. Like this:

CREATE INDEX people_gist_data ON people USING GIST(data); or CREATE INDEX people_gin_data ON people USING GIN(data);

To my experience GIN is faster for searching records.

Usage

Once you have it installed, you just need to learn a little bit of new sqls for selecting stuff (creting and updating is transparent).

Find records that contains a key named ‘foo’:

Person.where("data ? 'foo'")

Find records where ‘foo’ is equal to ‘bar’:

Person.where("data -> 'foo' = 'bar'")

This same sql is at least twice as fast (using indexes) if you do it that way:

Person.where("data @> 'foo=>bar'")

Find records where ‘foo’ is not equal to ‘bar’:

Person.where("data -> 'foo' <> 'bar'") or Person.where("not data @> 'foo=>bar'")

Find records where ‘foo’ is like ‘bar’:

Person.where("data -> 'foo' LIKE '%bar%'") or something like … Person.where("data -> 'foo' ILIKE '%bar%'")

If you need to delete a key in a record, you can do it that way:

person.destroy_key(:data, :foo)

This way you’ll also save the record:

person.destroy_key!(:data, :foo)

The destroy_key method returns ‘self’, so you can chain it:

person.destroy_key(:data, :foo).destroy_key(:data, :bar).save

But there it a shortcut for that:

person.destroy_keys(:data, :foo, :bar)

or…

person.destroy_keys!(:data, :foo, :bar)

And finally, if you need to delete keys in many rows, you can:

Person.delete_key(:data, :foo)

and with many keys:

Person.delete_keys(:data, :foo, :bar)

Have fun.

Help

You can use issues in github for that. Or else you can reach me at twitter: @joaomilho

Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don’t break it in a future version unintentionally.
  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
  • Send me a pull request. Bonus points for topic branches.

Copyright

Copyright © 2010 Juan Maiz. See LICENSE for details.