Class: BulkProcessor::RowChunker::Boundary

Inherits:
Object
Defined in:
lib/bulk_processor/row_chunker/boundary.rb

Overview

Determines the partitions that keep all consecutive rows with the same value for boundary_column in the same partition. The CSV must be sorted on this column to get the desired results. This class attempts to keep the partition sizes equal, but prioritizes the boundary column values over partition size.
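The idea described above can be illustrated with a small standalone sketch. It is not the library's implementation: `boundary_ranges` is a hypothetical helper, rows are assumed to be an array of hashes already sorted on the boundary column, and the `'account'` column name is made up for the example. Each chunk starts at a balanced size and is then extended until the boundary value changes.

```ruby
# Hypothetical helper, not part of BulkProcessor: split `rows` (an array of
# hashes sorted on `column`) into at most `num_chunks` inclusive index ranges
# without separating consecutive rows that share a value in `column`.
def boundary_ranges(rows, num_chunks, column)
  return [] if rows.empty?

  target = (rows.size / num_chunks.to_f).ceil # ideal rows per chunk
  ranges = []
  start = 0

  while start < rows.size
    # Tentative end of a balanced chunk.
    finish = [start + target - 1, rows.size - 1].min
    # Extend the chunk while the next row has the same boundary value.
    finish += 1 while finish + 1 < rows.size &&
                      rows[finish + 1][column] == rows[finish][column]
    ranges << (start..finish)
    start = finish + 1
  end

  ranges
end

rows = [
  { 'account' => 'a' }, { 'account' => 'a' }, { 'account' => 'a' },
  { 'account' => 'b' }, { 'account' => 'c' }, { 'account' => 'c' }
]
result = boundary_ranges(rows, 3, 'account')
# result is [0..2, 3..5]: two chunks instead of three, because keeping
# equal values together takes priority over chunk count and size.
```

Note that in this sketch the number of ranges can be smaller than the requested number of chunks when boundary groups are large; how the real class handles that case is not shown in this page.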

Instance Method Summary

Constructor Details

#initialize(num_chunks, boundary_column:) ⇒ Boundary

Returns a new instance of Boundary.



# File 'lib/bulk_processor/row_chunker/boundary.rb', line 9

def initialize(num_chunks, boundary_column:)
  @num_chunks = num_chunks
  @boundary_column = boundary_column
end

Instance Method Details

#ranges_for(csv) ⇒ Object



# File 'lib/bulk_processor/row_chunker/boundary.rb', line 14

def ranges_for(csv)
  @ranges ||= begin
    # Start with a balanced partition, then make adjustments from there
    chunker = Balanced.new(num_chunks)
    adjust_for_boundaries(chunker.ranges_for(csv), csv)
  end
end
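
One consequence of the `@ranges ||=` memoization shown above is that the result is cached per instance, so a second call with a different CSV would return the ranges computed for the first. The sketch below demonstrates the pattern with a hypothetical stand-in class (`MemoizedChunker` is not part of BulkProcessor, and its range computation is a placeholder):

```ruby
# Hypothetical stand-in class illustrating the `@ranges ||=` pattern.
class MemoizedChunker
  def ranges_for(rows)
    # Computed on the first call only; later calls reuse the cached value
    # regardless of the argument.
    @ranges ||= [(0..rows.size - 1)]
  end
end

chunker = MemoizedChunker.new
first  = chunker.ranges_for(%w[a b c])
second = chunker.ranges_for(%w[a b c d e])
# `second` is the same cached object as `first`, i.e. [0..2].
```

Under this pattern, creating a new instance per CSV avoids reusing stale ranges.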