Module: EachBatched

Defined in:
lib/each_batched.rb,
lib/each_batched/version.rb

Overview

More grouping/batching logic options than what’s included in Rails.

Constant Summary

DEFAULT_BATCH_SIZE = 1_000

Default batch size to use if none is specified.

VERSION = "0.1.3"

Instance Method Details

#batches_by_ids(batch_size = DEFAULT_BATCH_SIZE, key = nil) ⇒ Object

Yields batches of records from the current scope. Snapshots the primary key ids in scope, then loops through fetching the rows, one chunk of ids at a time.

  • You should explicitly set an order if you want the same order as #batches_by_range; otherwise the order may differ.

  • The yielded scope can be lazily loaded (though the id selection query will already have run).

  • You can optionally pass a column other than the primary key, as long as its values are guaranteed to be unique.



# File 'lib/each_batched.rb', line 52

def batches_by_ids(batch_size=DEFAULT_BATCH_SIZE, key=nil)
  reduced_scope = scoped.tap { |s| s.where_values = [] }.offset(nil).limit(nil)
  key = primary_key if key.nil?
  scoped.value_of(key).in_groups_of(batch_size, false) do |group_ids|
    # keeps select/group/joins/includes, inside inner batched scope
    yield reduced_scope.where(key => group_ids), group_ids
  end
end
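
To make the id-snapshot idea concrete, here is a minimal plain-Ruby sketch that mimics it on an in-memory Array instead of a database. The names `sketch_batches_by_ids` and `SKETCH_ROWS` are hypothetical and are not part of the gem; `each_slice` stands in for `in_groups_of(batch_size, false)`, and `select` stands in for the `where(key => group_ids)` query.

```ruby
# Plain-Ruby illustration only; no ActiveRecord involved.
# `rows` stands in for a table: an Array of Hashes with a unique key column.
def sketch_batches_by_ids(rows, batch_size = 1_000, key = :id)
  # Snapshot every key value up front (the "id selection query").
  ids = rows.map { |row| row[key] }
  # Walk the snapshot one chunk at a time, fetching only the matching rows.
  ids.each_slice(batch_size) do |group_ids|
    batch = rows.select { |row| group_ids.include?(row[key]) }
    yield batch, group_ids
  end
end

# Hypothetical sample data.
SKETCH_ROWS = (1..5).map { |i| { id: i, name: "row#{i}" } }

sketch_batches_by_ids(SKETCH_ROWS, 2) do |batch, group_ids|
  puts "ids #{group_ids.inspect} -> #{batch.map { |r| r[:name] }.inspect}"
end
```

Because the ids are captured before the loop starts, rows inserted afterwards are never visited, and rows deleted mid-loop simply drop out of their batch rather than shifting later batches.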

#batches_by_range(batch_size = DEFAULT_BATCH_SIZE) ⇒ Object

Yields batches of records from the current scope. Uses offset/limit internally to run through each batch, and can be further restricted by in-scope offset/limit/order (it doesn’t just toss them out!).

  • This algorithm does NOT work well with data that may have inserts/deletes while you’re looping, so if that’s a problem, then you should either lock the table or rows first or use a different algorithm (like ActiveRecord::Batches#find_in_batches or #batches_by_ids).

  • This algorithm may be slower than #batches_by_ids if your query doesn’t execute very quickly.

  • This algorithm can’t be lazily loaded, because it checks for empty results to see when it’s done.
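
The first caveat above can be illustrated with a small plain-Ruby sketch. It pages through an in-memory Array by offset and deletes one row after the first page, which is an assumed stand-in for a concurrent delete on a real table. The helper names `sketch_page` and `sketch_walk_with_delete` are hypothetical and not part of the gem.

```ruby
# Each call re-reads the current rows, the way an OFFSET/LIMIT query
# re-reads the table on every batch.
def sketch_page(rows, offset, limit)
  rows[offset, limit] || []
end

# Walks `rows` by offset and simulates a delete between the first two pages.
def sketch_walk_with_delete(rows, batch_size)
  seen = []
  offset = 0
  loop do
    batch = sketch_page(rows, offset, batch_size)
    break if batch.empty?
    seen.concat(batch)
    # Simulated concurrent delete: the first row vanishes after page one,
    # so every remaining row shifts down by one position.
    rows.delete_at(0) if offset.zero?
    offset += batch_size
  end
  seen
end

p sketch_walk_with_delete([1, 2, 3, 4, 5, 6], 2)
```

In this run the value 3 is never visited: after the delete it sits at position 1, which the walk has already passed. An insert near the front would have the opposite effect and cause a row to be seen twice.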



# File 'lib/each_batched.rb', line 22

def batches_by_range(batch_size=DEFAULT_BATCH_SIZE)
  start_offset = scoped.offset_value || 0
  end_limit = scoped.limit_value # || nil
  group_number = 0
  processed_number = 0
  # This giant while condition (with multiple assignments in it) is a mess, isn't it!
  # But simplifying it means I have to repeat most of it multiple times!
  # And putting it into a subroutine doesn't really save space either, with lots of parameters and/or return values!
  while (length = (records = offset(start_offset + batch_size * group_number).
      limit(asked_limit = end_limit.nil? || processed_number + batch_size < end_limit ?
        batch_size : end_limit - processed_number)).length) > 0
    yield records
    processed_number += length
    break if length < asked_limit || (! end_limit.nil? && processed_number >= end_limit)
    group_number += 1
  end
end
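
The offset and limit arithmetic packed into that `while` condition may be easier to follow in isolation. The sketch below is a plain-Ruby approximation that only computes which `[offset, limit]` windows would be requested; it does not touch a database. The method name `sketch_range_windows` and its parameters (`total_rows`, `scope_offset`, `scope_limit`) are hypothetical, introduced purely for illustration.

```ruby
# Returns the [offset, limit] pairs an offset/limit walk would request
# against a table of `total_rows` rows.
def sketch_range_windows(total_rows, batch_size, scope_offset = 0, scope_limit = nil)
  windows = []
  processed = 0
  group = 0
  loop do
    # Rows still allowed by the in-scope limit (nil means "no limit").
    remaining = scope_limit && scope_limit - processed
    asked = (remaining.nil? || batch_size < remaining) ? batch_size : remaining
    offset = scope_offset + batch_size * group
    # Rows the query would actually return from this window.
    length = [[total_rows - offset, 0].max, asked].min
    break if length.zero?
    windows << [offset, asked]
    processed += length
    # Stop on a short batch, or once the in-scope limit has been reached.
    break if length < asked || (scope_limit && processed >= scope_limit)
    group += 1
  end
  windows
end

p sketch_range_windows(25, 10)         # no in-scope offset or limit
p sketch_range_windows(25, 10, 5, 12)  # in-scope offset 5, limit 12
```

With an in-scope limit of 12 and a batch size of 10, the second window asks for only 2 rows, which is how the in-scope limit is honored rather than discarded.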

#each_by_ids(batch_size = DEFAULT_BATCH_SIZE, key = nil) ⇒ Object

Loops through each individual row found by #batches_by_ids, instead of each batch. See #batches_by_ids for an explanation of its algorithm.



# File 'lib/each_batched.rb', line 63

def each_by_ids(batch_size=DEFAULT_BATCH_SIZE, key=nil)
  batches_by_ids(batch_size, key) { |batch| batch.each { |row| yield row } }
end

#each_by_range(batch_size = DEFAULT_BATCH_SIZE) ⇒ Object

Loops through each individual row found by #batches_by_range, instead of each batch. See #batches_by_range for an explanation of its algorithm.



# File 'lib/each_batched.rb', line 42

def each_by_range(batch_size=DEFAULT_BATCH_SIZE)
  batches_by_range(batch_size) { |batch| batch.each { |row| yield row } }
end