Association Collection Tools

About

Any time you use an ORM you need to know that you are often sacrificing performance for convenience and developer efficiency. In general, this is a good thing. I agree with the theory espoused by DHH that developer productivity is often more valuable than machine performance. At least, I certainly agree with it in the early stages of development. Once you get to a certain scale, however, there are cases where you’ll need to write your own code that bypasses the ORM in the name of performance. This plugin provides some association operations that issue direct SQL calls to make things go faster.

  1. fast_copy

A method called fast_copy is added to has_and_belongs_to_many association collections, which makes cloning HABTM associations much more efficient. Simply replace person1.items = person2.items with person1.items.fast_copy(person2) and your database, network and RAM will thank you. See below for more details.

  2. ids

A method called ids is added to has_many and has_and_belongs_to_many association collections. It returns the list of object ids in the association collection without unnecessarily instantiating the objects.

Installation

  1. This plugin requires that the memcache-client gem is installed. # gem install association_collection_tools

  2. Install the plugin OR the gem

    $ script/plugin install svn://rubyforge.org/var/svn/zventstools/projects/association_collection_tools

    - OR -

    # gem install association_collection_tools

HABTM Fast Copy

Copies a HABTM association collection from one object to another without instantiating a bunch of ActiveRecord objects. This is faster than the standard assignment operation since:

  1. It eliminates the large number of SQL calls used in a standard HABTM copy, reducing the number of statements from O(n) to O(1), where n is the number of objects in the association collection.

  2. It transfers only object IDs between the database and Ruby instead of all object attributes, resulting in less work for the database, less data transferred and less memory used in Ruby.

  3. It doesn’t instantiate ActiveRecord objects in memory.

A normal HABTM copy (e.g., person2.items = person1.items) results in the following SQL calls.

SELECT * FROM items INNER JOIN items_people ON items.id = items_people.item_id WHERE (items_people.person_id = 1 )
SELECT * FROM items INNER JOIN items_people ON items.id = items_people.item_id WHERE (items_people.person_id = 2 )
DELETE FROM items_people WHERE person_id = 2 AND item_id IN (4)
INSERT INTO items_people (`item_id`, `person_id`) VALUES (1, 2)
INSERT INTO items_people (`item_id`, `person_id`) VALUES (2, 2)
INSERT INTO items_people (`item_id`, `person_id`) VALUES (3, 2)

Notice that:

  • items AR objects are instantiated unnecessarily (especially since person2.items are about to be deleted)

  • 1 SQL call is issued for each object (item) in the association collection (items_people)

By contrast, person2.items.fast_copy(person1) results in the following SQL calls, greatly reducing the impact on the database and on Ruby memory utilization.

DELETE FROM items_people WHERE person_id = 2
SELECT item_id FROM items_people WHERE person_id = 1
REPLACE INTO items_people (person_id,item_id) VALUES (2,3),(2,2),(2,1)
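The three statements above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the plugin's actual code: the helper name fast_copy_statements and its parameters are hypothetical, and the id list is passed in directly, whereas in practice it would come from running the SELECT. Note that REPLACE INTO is a MySQL extension rather than standard SQL.

```ruby
# Hypothetical helper (not part of the plugin): builds the SQL that a
# fast_copy-style operation would issue for a HABTM join table.
def fast_copy_statements(join_table:, owner_key:, item_key:, source_id:, target_id:, item_ids:)
  # Coerce every id to an Integer so nothing unexpected is interpolated.
  source_id = Integer(source_id)
  target_id = Integer(target_id)
  ids = item_ids.map { |id| Integer(id) }

  statements = [
    # Clear the target's existing rows in the join table.
    "DELETE FROM #{join_table} WHERE #{owner_key} = #{target_id}",
    # Fetch only the ids from the source; no full rows, no AR objects.
    "SELECT #{item_key} FROM #{join_table} WHERE #{owner_key} = #{source_id}"
  ]

  # One multi-row statement replaces n single-row INSERTs.
  unless ids.empty?
    values = ids.map { |id| "(#{target_id},#{id})" }.join(",")
    statements << "REPLACE INTO #{join_table} (#{owner_key},#{item_key}) VALUES #{values}"
  end

  statements
end

statements = fast_copy_statements(
  join_table: "items_people", owner_key: "person_id", item_key: "item_id",
  source_id: 1, target_id: 2, item_ids: [3, 2, 1]
)
puts statements
```

However many items are copied, the number of statements stays at three, which is where the O(1) claim comes from.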

Here are some benchmarks:

when n = 10 and 26 objects in e2.groups:

require 'benchmark'

Benchmark.bm do |x|
  x.report { n.times { e1.groups.clear; e1.groups = e2.groups } }       # standard assignment
  x.report { n.times { e1.groups.clear; e1.groups.fast_copy(e2) } }     # fast_copy
end

      user     system      total        real
  1.140000   0.040000   1.180000 (  1.832122)
  0.020000   0.010000   0.030000 (  0.125368)

when n = 100 and 26 objects in e2.groups:

      user     system      total        real
 11.140000   0.360000  11.500000 ( 18.171410)
  0.140000   0.010000   0.150000 (  2.368200)

This method also supports HABTM join tables with additional attributes. Simply pass in an attribute hash as the second argument and it will add the attributes to the records it creates in the join table.

e.g., person1.items.fast_copy(person2, => Time.now)
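To show how extra join-table attributes could end up in the generated statement, here is a rough sketch in plain Ruby. It is not the plugin's implementation: the helpers quote_value and replace_with_attributes are hypothetical, the quoting is deliberately simplistic, and the column name created_at is an assumption chosen purely for illustration.

```ruby
# Hypothetical sketch: one multi-row REPLACE that also fills extra
# join-table columns. Not the plugin's code; `created_at` is an assumed column.

# Very simple value quoting, for illustration only.
def quote_value(value)
  case value
  when Integer then value.to_s
  when Time    then "'" + value.strftime("%Y-%m-%d %H:%M:%S") + "'"
  else              "'" + value.to_s.gsub("'", "''") + "'"
  end
end

def replace_with_attributes(join_table, owner_key, item_key, target_id, item_ids, extra = {})
  # The extra attribute names become additional columns...
  columns = [owner_key, item_key] + extra.keys.map(&:to_s)
  # ...and the same extra values are appended to every row.
  extra_values = extra.values.map { |v| quote_value(v) }
  rows = item_ids.map do |id|
    "(" + ([Integer(target_id), Integer(id)] + extra_values).join(",") + ")"
  end
  "REPLACE INTO #{join_table} (#{columns.join(',')}) VALUES #{rows.join(',')}"
end

sql = replace_with_attributes(
  "items_people", "person_id", "item_id", 2, [3, 2, 1],
  created_at: Time.utc(2007, 1, 1, 12, 0, 0)
)
puts sql
```

Every copied row receives the same attribute values, which matches the idea of passing a single attribute hash for the whole copy.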

REALITY CHECK: The HABTM docs refer to collection_singular_ids=ids which implies identical functionality, but I can’t find mention of this method in anything other than the documentation. Maybe this actually already exists and I’m just blind, but from the looks of dev.rubyonrails.org/ticket/2917, it appears that it is a documentation bug.

HABTM and has_many ids

Return the list of IDs in this association collection without unnecessarily instantiating a bunch of Active Record objects. What good is the id of an object without the object itself? If you think about it for a while, you’re bound to come up with many uses, especially if you write a lot of SQL by hand. For instance, the fast_copy command documented above uses this method to return an id list without instantiating AR objects. The potential savings are enormous when you’re dealing with hundreds or thousands of objects at a time.
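As a rough illustration of why this is cheap, the sketch below builds the kind of id-only query such a method could issue, and then reuses the ids in a hand-written statement. The helper names are hypothetical and the has_many query shape (a foreign key on the associated table) is an assumption; the plugin's actual SQL may differ.

```ruby
# Hypothetical helpers (not the plugin's code): id-only queries that avoid
# SELECT * and therefore avoid building full ActiveRecord objects.

# HABTM: the ids live in the join table, so the target table is not touched.
def habtm_ids_sql(join_table, owner_key, item_key, owner_id)
  "SELECT #{item_key} FROM #{join_table} WHERE #{owner_key} = #{Integer(owner_id)}"
end

# has_many: assumes the associated table carries the owner's foreign key.
def has_many_ids_sql(table, foreign_key, owner_id, primary_key = "id")
  "SELECT #{primary_key} FROM #{table} WHERE #{foreign_key} = #{Integer(owner_id)}"
end

# Using an id list in hand-written SQL, e.g. an IN clause.
def count_by_ids_sql(table, ids, primary_key = "id")
  id_list = ids.map { |id| Integer(id) }.join(",")
  "SELECT COUNT(*) FROM #{table} WHERE #{primary_key} IN (#{id_list})"
end

puts habtm_ids_sql("items_people", "person_id", "item_id", 1)
puts has_many_ids_sql("items", "person_id", 1)
puts count_by_ids_sql("items", [1, 2, 3])
```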

Bugs, Code and Contributing

There's a RubyForge project set up at:

rubyforge.org/projects/zventstools/

Anonymous SVN access:

$ svn checkout svn://rubyforge.org/var/svn/zventstools

Author: Tyler Kovacs (tyler dot kovacs at gmail dot com)