BatchLoader

Simple tool to avoid N+1 DB queries, HTTP requests, etc.

Highlights

  • Generic utility to avoid N+1 DB queries, HTTP requests, etc.
  • Adapted Ruby implementation of battle-tested tools like Haskell Haxl, JS DataLoader, etc.
  • Parent objects don't have to know about children's requirements; batching is isolated.
  • Automatically caches previous queries.
  • Doesn't require creating custom classes.
  • Thread-safe (BatchLoader#load).
  • Has zero dependencies.
  • Works with any Ruby code, including REST APIs and GraphQL.

Usage

Why?

Let's have a look at the code with N+1 queries:

def load_posts(ids)
  Post.where(id: ids)
end

def load_users(posts)
  posts.map { |post| post.user }
end

posts = load_posts([1, 2, 3])  #      Posts      SELECT * FROM posts WHERE id IN (1, 2, 3)
                               #      _ ↓ _
                               #    ↙   ↓   ↘
                               #   U    ↓    ↓   SELECT * FROM users WHERE id = 1
users = load_users(posts)      #   ↓    U    ↓   SELECT * FROM users WHERE id = 2
                               #   ↓    ↓    U   SELECT * FROM users WHERE id = 3
                               #    ↘   ↓   ↙
                               #      ¯ ↓ ¯
users.map { |u| u.name }       #      Users
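
To make the cost concrete, here is a minimal sketch that counts queries against an in-memory stand-in for the database. FakeDB and all of its methods are hypothetical and exist only for this illustration; they are not part of BatchLoader or of any ORM.

```ruby
# Hypothetical in-memory "database" that counts how many queries it receives.
class FakeDB
  attr_reader :query_count

  def initialize
    @query_count = 0
    @users = { 1 => "Alice", 2 => "Bob", 3 => "Carol" }
    @posts = { 1 => 1, 2 => 2, 3 => 3 } # post id => user id
  end

  # One query returning the user ids for all requested posts.
  def post_user_ids(post_ids)
    @query_count += 1
    post_ids.map { |id| @posts.fetch(id) }
  end

  # One query per call: this is what post.user does in the N+1 case.
  def find_user(id)
    @query_count += 1
    @users.fetch(id)
  end

  # One query for many users: the batched alternative.
  def find_users(ids)
    @query_count += 1
    ids.uniq.map { |id| [id, @users.fetch(id)] }.to_h
  end
end

n_plus_one = FakeDB.new
names = n_plus_one.post_user_ids([1, 2, 3]).map { |user_id| n_plus_one.find_user(user_id) }
puts n_plus_one.query_count # 1 query for posts + 3 queries for users

batched = FakeDB.new
user_ids = batched.post_user_ids([1, 2, 3])
users_by_id = batched.find_users(user_ids)
batched_names = user_ids.map { |id| users_by_id.fetch(id) }
puts batched.query_count # 1 query for posts + 1 query for users
```

Both variants return the same names; only the number of round trips differs, and the gap grows with the number of posts.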

The naive approach would be to preload dependent objects on the top level:

# With ORM in basic cases
def load_posts(ids)
  Post.where(id: ids).includes(:user)
end

# But without ORM or in more complicated cases you will have to do something like:
def load_posts(ids)
  # load posts
  posts = Post.where(id: ids)
  user_ids = posts.map(&:user_id)

  # load users
  users = User.where(id: user_ids)
  user_by_id = users.each_with_object({}) { |user, memo| memo[user.id] = user }

  # map user to post
  posts.each { |post| post.user = user_by_id[post.user_id] }
end

def load_users(posts)
  posts.map { |post| post.user }
end

posts = load_posts([1, 2, 3])  #      Posts      SELECT * FROM posts WHERE id IN (1, 2, 3)
                               #      _ ↓ _      SELECT * FROM users WHERE id IN (1, 2, 3)
                               #    ↙   ↓   ↘
                               #   U    ↓    ↓
users = load_users(posts)      #   ↓    U    ↓
                               #   ↓    ↓    U
                               #    ↘   ↓   ↙
                               #      ¯ ↓ ¯
users.map { |u| u.name }       #      Users

But the problem here is that load_posts now depends on the child association and knows that it has to preload the data for load_users. And it'll do it every time, even if it's not necessary. Can we do better? Sure!

Basic example

With BatchLoader we can rewrite the code above:

def load_posts(ids)
  Post.where(id: ids)
end

def load_users(posts)
  posts.map do |post|
    BatchLoader.for(post.user_id).batch do |user_ids, batch_loader|
      User.where(id: user_ids).each { |user| batch_loader.load(user.id, user) }
    end
  end
end

posts = load_posts([1, 2, 3])         #      Posts      SELECT * FROM posts WHERE id IN (1, 2, 3)
                                      #      _ ↓ _
                                      #    ↙   ↓   ↘
                                      #   BL   ↓    ↓
users = load_users(posts)             #   ↓    BL   ↓
                                      #   ↓    ↓    BL
                                      #    ↘   ↓   ↙
                                      #      ¯ ↓ ¯
BatchLoader.sync!(users).map(&:name)  #      Users      SELECT * FROM users WHERE id IN (1, 2, 3)

As we can see, batching is isolated and described right where it's needed.

How it works

In general, BatchLoader returns a lazy object. In other programming languages it is usually called a Promise, but I personally prefer to call it lazy, since Ruby already uses the name in its standard library :) Each lazy object knows which data it needs to load and how to batch the query. Once all the lazy objects are collected, they can be resolved together without N+1 queries.

So, when we call BatchLoader.for we pass an item (user_id) which should be batched. For the batch method, we pass a block which uses all the collected items (user_ids):


BatchLoader.for(post.user_id).batch do |user_ids, batch_loader|
  ...
end

Inside the block we execute a batch query for our items (User.where). After that, all we have to do is call the load method and pass the item which was used in BatchLoader.for (user_id) together with the loaded object itself (user):


BatchLoader.for(post.user_id).batch do |user_ids, batch_loader|
  User.where(id: user_ids).each { |user| batch_loader.load(user.id, user) }
end

Now we can resolve all the collected BatchLoader objects:


BatchLoader.sync!(users) # => SELECT * FROM users WHERE id IN (1, 2, 3)

For more information, see the Implementation details section.
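
The "collect first, resolve later" idea can be sketched in a few lines of plain Ruby. TinyLazy below is a toy written for this illustration and is not the gem's implementation: it keeps one shared list of pending items, ignores thread-safety, and assumes that every lazy object uses the same batch block.

```ruby
# A toy lazy loader, NOT the gem's implementation: it only illustrates
# "collect items first, run one batch block later".
class TinyLazy
  # Shared state: items waiting to be loaded and values already loaded.
  @pending = []
  @loaded = {}

  class << self
    attr_reader :pending, :loaded
  end

  def self.for(item)
    new(item)
  end

  def initialize(item)
    @item = item
  end

  # Remember the item and the batch block; nothing is loaded yet.
  def batch(&block)
    @block = block
    self.class.pending << @item
    self
  end

  # Passed to the batch block; stores a loaded value under its item.
  def load(item, value)
    self.class.loaded[item] = value
  end

  # Resolve this object, running the batch block once for all pending items.
  def sync
    unless self.class.loaded.key?(@item)
      items = self.class.pending.uniq
      self.class.pending.clear
      @block.call(items, self)
    end
    self.class.loaded[@item]
  end
end

USERS = { 1 => "Alice", 2 => "Bob", 3 => "Carol" }.freeze
batch_calls = []

lazy_users = [1, 2, 3].map do |user_id|
  TinyLazy.for(user_id).batch do |user_ids, loader|
    batch_calls << user_ids # record each "query"
    user_ids.each { |id| loader.load(id, USERS.fetch(id)) }
  end
end

names = lazy_users.map(&:sync)
```

After the last line, batch_calls holds a single entry with all three ids: the first sync runs the block for every collected item, and the remaining two are served from the values already loaded.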

REST API example

Now imagine we have a regular Rails app with N+1 HTTP requests:

# app/models/post.rb
class Post < ApplicationRecord
  def rating
    HttpClient.request(:get, "https://example.com/ratings/#{id}")
  end
end

# app/controllers/posts_controller.rb
class PostsController < ApplicationController
  def index
    posts = Post.limit(10)
    serialized_posts = posts.map { |post| {id: post.id, rating: post.rating} } # N+1: one HTTP request for each post.rating

    render json: serialized_posts
  end
end

As we can see, the code above will make N+1 HTTP requests, one for each post. Let's batch the requests with a gem called parallel:

class Post < ApplicationRecord
  def rating_lazy
    BatchLoader.for(self).batch do |posts, batch_loader|
      Parallel.each(posts, in_threads: 10) { |post| batch_loader.load(post, post.rating) }
    end
  end

  # ...
end

BatchLoader#load is thread-safe. So, if HttpClient is also thread-safe, then with the parallel gem we can execute all HTTP requests concurrently in threads (there are some benchmarks for concurrent HTTP requests in Ruby). Thanks to Matz, MRI releases the GIL when a thread hits blocking I/O, which is an HTTP request in our case.
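
To show what a thread-safe load buys us, here is a minimal sketch that uses only the standard library: plain Thread objects instead of the parallel gem, and a Mutex around a shared hash. SafeResults and fetch_rating are hypothetical names for this illustration; how BatchLoader achieves thread-safety internally is not described here, so treat the Mutex as one common approach rather than the gem's.

```ruby
# Hypothetical stand-in for a blocking HTTP call; sleep simulates network I/O.
def fetch_rating(post_id)
  sleep 0.01
  post_id * 10
end

# Collects results from many threads; the Mutex makes each write atomic.
class SafeResults
  def initialize
    @mutex = Mutex.new
    @values = {}
  end

  def load(item, value)
    @mutex.synchronize { @values[item] = value }
  end

  def to_h
    @mutex.synchronize { @values.dup }
  end
end

results = SafeResults.new
post_ids = (1..10).to_a

# One thread per request; each thread waits on I/O independently.
threads = post_ids.map do |post_id|
  Thread.new { results.load(post_id, fetch_rating(post_id)) }
end
threads.each(&:join)

ratings = results.to_h
```

Because the threads spend their time waiting in sleep (standing in for the network), the ten simulated requests overlap instead of running one after another.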

Now we can resolve all BatchLoader objects in the controller:

class PostsController < ApplicationController
  def index
    posts = Post.limit(10)
    serialized_posts = posts.map { |post| {id: post.id, rating: post.rating_lazy} }
    render json: BatchLoader.sync!(serialized_posts)
  end
end
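
Note that the controller hands over an array of hashes whose values are lazy objects. As a rough sketch of how such a nested structure could be resolved, the helper below walks arrays and hashes recursively. LazyValue and deep_sync are hypothetical and written for this illustration; they are not BatchLoader's API, and the gem's own traversal may differ.

```ruby
# Hypothetical lazy value: wraps a block and evaluates it once on #sync.
class LazyValue
  def initialize(&block)
    @block = block
  end

  def sync
    @value = @block.call unless defined?(@value)
    @value
  end
end

# Walks arrays and hashes, replacing every LazyValue with its resolved value.
def deep_sync(value)
  case value
  when Array then value.map { |element| deep_sync(element) }
  when Hash then value.transform_values { |element| deep_sync(element) }
  when LazyValue then value.sync
  else value
  end
end

serialized_posts = [
  { id: 1, rating: LazyValue.new { 4.5 } },
  { id: 2, rating: LazyValue.new { 3.0 } }
]

resolved = deep_sync(serialized_posts)
```

Plain values such as the ids pass through untouched, so the result has the same shape as the input and can be rendered as JSON directly.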

BatchLoader caches the resolved values. To ensure that the cache is purged between requests in the app, add the following middleware to your config/application.rb:

config.middleware.use BatchLoader::Middleware

See the Caching section for more information.

GraphQL example

Batching is particularly useful with GraphQL. You can't use the usual techniques, such as preloading associations in advance, to avoid N+1 queries, since you don't know which fields the user is going to ask for in a query.

Let's take a look at a simple graphql-ruby schema example:

Schema = GraphQL::Schema.define do
  query QueryType
end

QueryType = GraphQL::ObjectType.define do
  name "Query"
  field :posts, !types[PostType], resolve: ->(obj, args, ctx) { Post.all }
end

PostType = GraphQL::ObjectType.define do
  name "Post"
  field :user, !UserType, resolve: ->(post, args, ctx) { post.user } # N+1 queries
end

UserType = GraphQL::ObjectType.define do
  name "User"
  field :name, !types.String
end

If we want to execute a simple query like:

query = "
{
  posts {
    user {
      name
    }
  }
}
"
Schema.execute(query, variables: {}, context: {})

We will get N+1 queries, one for each post.user. To avoid this problem, all we have to do is change the resolver to use BatchLoader:

PostType = GraphQL::ObjectType.define do
  name "Post"
  field :user, !UserType, resolve: ->(post, args, ctx) do
    BatchLoader.for(post.user_id).batch do |user_ids, batch_loader|
      User.where(id: user_ids).each { |user| batch_loader.load(user.id, user) }
    end
  end
end

And set up GraphQL with the built-in lazy_resolve method:

Schema = GraphQL::Schema.define do
  query QueryType
  lazy_resolve BatchLoader, :sync
end

Caching

By default BatchLoader caches the resolved values. You can test it by running something like:

def user_lazy(id)
  BatchLoader.for(id).batch do |ids, batch_loader|
    User.where(id: ids).each { |user| batch_loader.load(user.id, user) }
  end
end

user_lazy(1)      # no request
# => <#BatchLoader>

user_lazy(1).sync # SELECT * FROM users WHERE id IN (1)
# => <#User>

user_lazy(1).sync # no request
# => <#User>

To drop the cache manually you can run:

user_lazy(1).sync # SELECT * FROM users WHERE id IN (1)
user_lazy(1).sync # no request

BatchLoader::Executor.clear_current

user_lazy(1).sync # SELECT * FROM users WHERE id IN (1)

Usually, it's enough to clear the cache between HTTP requests in the app. To do so, simply add the middleware:

# calls "BatchLoader::Executor.clear_current" after each request
use BatchLoader::Middleware
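
For readers unfamiliar with Rack, the pattern behind such a middleware can be sketched without any dependencies. RequestCache and ClearCacheMiddleware are hypothetical names for this illustration, and clearing in an ensure block is an assumption about a reasonable design, not a description of the gem's source.

```ruby
# Hypothetical per-request cache standing in for the loader's cache.
module RequestCache
  def self.store
    Thread.current[:request_cache] ||= {}
  end

  def self.clear
    Thread.current[:request_cache] = nil
  end
end

# Rack-style middleware: any object with #call(env) that wraps another app.
class ClearCacheMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  ensure
    # Runs even if the app raises, so stale values never leak into the next request.
    RequestCache.clear
  end
end

# A tiny app that reads through the cache and reports whether it was a hit.
app = lambda do |env|
  hit = RequestCache.store.key?(:user)
  RequestCache.store[:user] ||= "Alice"
  [200, { "content-type" => "text/plain" }, [hit ? "hit" : "miss"]]
end

stack = ClearCacheMiddleware.new(app)
first = stack.call({})
second = stack.call({})
```

Both simulated requests report a miss, because the cache filled during the first request is dropped before the second one starts.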

In some rare cases it's useful to disable caching for BatchLoader. For example, in tests or after data mutations:

def user_lazy(id)
  BatchLoader.for(id).batch(cache: false) do |ids, batch_loader|
    # ...
  end
end

user_lazy(1).sync # SELECT * FROM users WHERE id IN (1)
user_lazy(1).sync # SELECT * FROM users WHERE id IN (1)
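
The difference between the two modes comes down to whether a resolved value is kept around. A minimal sketch with a hypothetical CountingLoader (not the gem's API) makes the effect measurable:

```ruby
# Hypothetical loader with an optional cache, to show the effect of cache: false.
class CountingLoader
  attr_reader :queries

  def initialize(cache: true)
    @cache = cache
    @store = {}
    @queries = 0
  end

  def fetch(id)
    return @store[id] if @cache && @store.key?(id)

    @queries += 1 # stands in for SELECT * FROM users WHERE id IN (...)
    value = "user-#{id}"
    @store[id] = value if @cache
    value
  end
end

cached = CountingLoader.new
2.times { cached.fetch(1) }

uncached = CountingLoader.new(cache: false)
2.times { uncached.fetch(1) }
```

Here cached.queries ends at 1 and uncached.queries at 2, mirroring the comments in the two user_lazy examples above.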

Installation

Add this line to your application's Gemfile:

gem 'batch-loader'

And then execute:

$ bundle

Or install it yourself as:

$ gem install batch-loader

Implementation details

Coming soon

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/exAspArk/batch-loader. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the BatchLoader project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.