Module: EachBatched

Defined in:
lib/each_batched.rb,
lib/each_batched/version.rb

Overview

More grouping/batching logic options than what’s included in Rails.

Constant Summary

DEFAULT_BATCH_SIZE = 1_000

Default batch size to use if none is specified (1,000).

VERSION = "0.1.0"

Instance Method Details

#batches_by_ids(batch_size = DEFAULT_BATCH_SIZE) ⇒ Object

Yields batches of records from the current scope. Snapshots the primary key IDs in scope, then loops through them, fetching the rows one chunk of IDs at a time.

  • Set an explicit order if you need the same row order as #batches_by_range; otherwise the order may differ.

  • The yielded scope can be lazily loaded, although the ID selection query has, of course, already run.
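The snapshot-then-fetch strategy described above can be modelled in plain Ruby. This is a sketch, not the gem's code: the "table" is an array of hashes and the scope is a lambda, both hypothetical stand-ins for an ActiveRecord relation, and Array#each_slice plays the role of ActiveSupport's in_groups_of(batch_size, false). It shows that rows inserted after the ID snapshot are never yielded.

```ruby
# Hypothetical model of the #batches_by_ids strategy (not EachBatched code).
# table: array of row hashes; scope: lambda standing in for WHERE conditions.
def batches_by_ids_sketch(table, batch_size, scope)
  # Step 1: snapshot the ids of every row in scope, exactly once.
  ids = table.select(&scope).map { |row| row[:id] }

  # Step 2: fetch rows one chunk of ids at a time. Like reduced_scope in the
  # real method, the fetch filters only by id, not by the original conditions.
  # each_slice does not pad the last group, like in_groups_of(n, false).
  ids.each_slice(batch_size) do |group_ids|
    yield table.select { |row| group_ids.include?(row[:id]) }
  end
end

table = (1..7).map { |i| { id: i, active: i.odd? } }
batches = []
batches_by_ids_sketch(table, 2, ->(row) { row[:active] }) do |batch|
  batches << batch.map { |row| row[:id] }
  table << { id: 100, active: true } # inserted mid-loop; not in the snapshot
end
p batches # => [[1, 3], [5, 7]]
```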



# File 'lib/each_batched.rb', line 48

def batches_by_ids(batch_size=DEFAULT_BATCH_SIZE)
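  # Build a version of the current scope without its WHERE conditions, offset
  # or limit: the id list snapshotted below already determines which rows are in scope.
  reduced_scope = scoped.tap { |s| s.where_values = [] }.offset(nil).limit(nil)
  select("#{table_name}.#{primary_key}").collect(&(primary_key.to_sym)).in_groups_of(batch_size, false) do |group_ids|
    # The inner batched scope still keeps the original select/group/joins/includes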
    yield reduced_scope.where(primary_key => group_ids)
  end
end

#batches_by_range(batch_size = DEFAULT_BATCH_SIZE) ⇒ Object

Yields batches of records from the current scope. Uses offset/limit internally to step through each batch, and honours any offset, limit and order already set on the scope rather than discarding them.

  • This algorithm does NOT work well if rows may be inserted or deleted while you are looping: because each batch is addressed by offset, concurrent changes can cause rows to be skipped or yielded twice. If that is a problem, either lock the table or rows first, or use a different algorithm (such as ActiveRecord::Batches#find_in_batches or #batches_by_ids).

  • This algorithm may be slower than #batches_by_ids if your query is slow to execute, since the full query is re-run for every batch.

  • The batches can't be lazily loaded, because the loop loads each batch to check whether it came back empty or short, which is how it knows when to stop.
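The first caveat can be reproduced without a database. The sketch below is a hypothetical illustration, not EachBatched code: it pages through an in-memory array by offset (the same addressing #batches_by_range uses) while deleting each row once it has been handled. Every later row shifts toward the front, so the next offset jumps past rows that were never yielded.

```ruby
# Hypothetical illustration of offset paging while rows are being deleted.
def offset_paging_with_deletes(rows, batch_size)
  table = rows.dup
  processed = []
  offset = 0
  loop do
    # Array#[](start, length) returns nil when start is past the end.
    batch = table[offset, batch_size] || []
    break if batch.empty?
    processed.concat(batch)
    table -= batch          # simulate a concurrent delete of the handled rows
    offset += batch_size    # next page is computed against the shrunken table
  end
  processed
end

p offset_paging_with_deletes((1..6).to_a, 2) # => [1, 2, 5, 6] (3 and 4 skipped)
```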



# File 'lib/each_batched.rb', line 19

def batches_by_range(batch_size=DEFAULT_BATCH_SIZE)
  start_offset = scoped.offset_value || 0
  end_limit = scoped.limit_value # || nil
  group_number = 0
  processed_number = 0
  # This giant while condition (with multiple assignments in it) is a mess, isn't it!
  # But simplifying it means I have to repeat most of it multiple times!
  # And putting it into a subroutine doesn't really save space either, with lots of parameters and/or return values!
  while (length = (records = offset(start_offset + batch_size * group_number).
      limit(asked_limit = end_limit.nil? || processed_number + batch_size < end_limit ?
        batch_size : end_limit - processed_number)).length) > 0
    yield records
    processed_number += length
    break if length < asked_limit || (! end_limit.nil? && processed_number >= end_limit)
    group_number += 1
  end
end
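The while condition above packs several steps into one expression. The same control flow can be restated over a plain Ruby array as a sketch: rows stands in for the ordered query result, and scope_offset / scope_limit model the scope's offset_value and limit_value. All of these names are illustrative and not part of the gem.

```ruby
# Hypothetical restatement of the #batches_by_range loop over an array.
def batches_by_range_sketch(rows, batch_size, scope_offset: 0, scope_limit: nil)
  processed = 0
  page = 0
  loop do
    # Never ask for more rows than the scope's own limit still allows
    # (same result as the ternary that computes asked_limit above).
    asked = scope_limit.nil? ? batch_size : [batch_size, scope_limit - processed].min
    batch = rows[scope_offset + batch_size * page, asked] || []
    break if batch.empty?
    yield batch
    processed += batch.length
    # A short batch means the data ran out; reaching the limit means we're done.
    break if batch.length < asked || (scope_limit && processed >= scope_limit)
    page += 1
  end
end

batches = []
batches_by_range_sketch((1..10).to_a, 3, scope_offset: 2, scope_limit: 7) { |b| batches << b }
p batches # => [[3, 4, 5], [6, 7, 8], [9]]
```

Here the scope's offset of 2 and limit of 7 are both honoured: the final request shrinks to a single row instead of a full batch of 3.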

#each_by_ids(batch_size = DEFAULT_BATCH_SIZE) ⇒ Object

Loops through each individual row found by #batches_by_ids, rather than each batch. See #batches_by_ids for an explanation of its algorithm.



# File 'lib/each_batched.rb', line 58

def each_by_ids(batch_size=DEFAULT_BATCH_SIZE)
  batches_by_ids(batch_size) { |batch| batch.each { |row| yield row } }
end

#each_by_range(batch_size = DEFAULT_BATCH_SIZE) ⇒ Object

Loops through each individual row found by #batches_by_range, rather than each batch. See #batches_by_range for an explanation of its algorithm.



# File 'lib/each_batched.rb', line 39

def each_by_range(batch_size=DEFAULT_BATCH_SIZE)
  batches_by_range(batch_size) { |batch| batch.each { |row| yield row } }
end
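Both each_by_* methods follow the same pattern: call the batch method and re-yield every row of every batch. A minimal sketch of that pattern follows, with a hypothetical sample_batches standing in for #batches_by_ids or #batches_by_range. As shown above, the gem's methods require a block; the enum_for guard here is an assumed extension, not part of EachBatched.

```ruby
# Hypothetical stand-in for a batch-yielding method like #batches_by_ids:
# yields the numbers 1..5 in arrays of batch_size.
def sample_batches(batch_size)
  (1..5).each_slice(batch_size) { |batch| yield batch }
end

# Same shape as #each_by_ids / #each_by_range: unwrap each batch and yield its
# rows one at a time. The enum_for line is an assumed addition so the method
# also returns an Enumerator when called without a block.
def sample_each(batch_size, &block)
  return enum_for(:sample_each, batch_size) unless block_given?
  sample_batches(batch_size) { |batch| batch.each(&block) }
end

p sample_each(2).to_a # => [1, 2, 3, 4, 5]
```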