Class: BulkProcessor::RowChunker::Boundary
- Inherits: Object
  - Object
  - BulkProcessor::RowChunker::Boundary
- Defined in: lib/bulk_processor/row_chunker/boundary.rb
Overview
Determines the partitions that ensure all consecutive rows with the same value for boundary_column are in the same partition. The CSV must be sorted on this column to get the desired results. This class attempts to keep the partition sizes equal, but prioritizes keeping rows with the same boundary column value together over equal partition sizes.
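The idea described above can be sketched in plain Ruby. This is a minimal illustration, not the library's implementation: the helper names `balanced_ranges` and `boundary_ranges` are hypothetical, rows are modeled as an array of hashes rather than a parsed CSV, and the choices to move each cut forward (rather than to the nearest boundary) and to return inclusive ranges are assumptions.

```ruby
# Hypothetical helper: split row indexes 0...count into up to num_chunks
# near-equal, non-empty exclusive ranges.
def balanced_ranges(count, num_chunks)
  base, extra = count.divmod(num_chunks)
  start = 0
  ranges = (0...num_chunks).map do |i|
    size = base + (i < extra ? 1 : 0)
    range = (start...(start + size))
    start += size
    range
  end
  ranges.reject { |r| r.size.zero? }
end

# Hypothetical helper: start from balanced cut points, then push each cut
# forward until the boundary column value changes, so that a run of equal
# values is never split. Assumes rows are sorted on boundary_column.
def boundary_ranges(rows, num_chunks, boundary_column)
  return [] if rows.empty?

  cuts = balanced_ranges(rows.size, num_chunks).map(&:begin).drop(1)
  adjusted = cuts.map do |cut|
    while cut < rows.size && rows[cut][boundary_column] == rows[cut - 1][boundary_column]
      cut += 1
    end
    cut
  end
  # Cuts that collapsed onto each other or ran off the end are dropped.
  adjusted = adjusted.uniq.reject { |cut| cut >= rows.size }

  starts = [0] + adjusted
  ends = adjusted + [rows.size]
  starts.zip(ends).map { |first, last| (first..(last - 1)) }
end

rows = [
  { 'account' => 'a' }, { 'account' => 'a' }, { 'account' => 'a' },
  { 'account' => 'b' }, { 'account' => 'c' }, { 'account' => 'c' }
]
p boundary_ranges(rows, 3, 'account')
```

With three chunks requested, a purely balanced split would cut after the second row and separate the three `a` rows. The sketch moves that cut forward by one row, so the partitions end up unequal in size but each account stays whole.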
Instance Method Summary
- #initialize(num_chunks, boundary_column:) ⇒ Boundary (constructor)
  A new instance of Boundary.
- #ranges_for(csv) ⇒ Object
Constructor Details
#initialize(num_chunks, boundary_column:) ⇒ Boundary
Returns a new instance of Boundary.
# File 'lib/bulk_processor/row_chunker/boundary.rb', line 9
def initialize(num_chunks, boundary_column:)
  @num_chunks = num_chunks
  @boundary_column = boundary_column
end
Instance Method Details
#ranges_for(csv) ⇒ Object
# File 'lib/bulk_processor/row_chunker/boundary.rb', line 14
def ranges_for(csv)
  @ranges ||= begin
    # Start with a balanced partition, then make adjustments from there
    chunker = Balanced.new(num_chunks)
    adjust_for_boundaries(chunker.ranges_for(csv), csv)
  end
end
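Because the result is stored with `@ranges ||=`, the ranges appear to be computed once per instance and reused on later calls, whatever argument is passed. The following stand-alone sketch shows that Ruby memoization behavior. `MemoizedChunker` is a hypothetical class for illustration only and does not reproduce the partitioning logic.

```ruby
# Hypothetical class that mirrors only the `@ranges ||= begin ... end` pattern.
class MemoizedChunker
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def ranges_for(rows)
    @ranges ||= begin
      @calls += 1 # counts how often the block actually runs
      [0..(rows.size - 1)]
    end
  end
end

chunker = MemoizedChunker.new
first = chunker.ranges_for(%w[a b c])
second = chunker.ranges_for(%w[a b c d e])
p first, second, chunker.calls
```

The second call returns the ranges computed for the first input, so under this pattern a fresh instance would be needed for each distinct CSV.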