Module: DataTransport

Defined in:
lib/data_transport.rb,
lib/data_transport/data_store.rb,
lib/data_transport/record/source.rb,
lib/data_transport/data_store/file.rb,
lib/data_transport/record/destination.rb,
lib/data_transport/data_store/csv_file.rb,
lib/data_transport/data_store/active_record.rb

Defined Under Namespace

Modules: Record Classes: DataStore

Constant Summary

DEFAULT_BATCH_SIZE = 100_000 # :nodoc:

Class Method Summary

Class Method Details

.map(input, output, options = {}, &block) ⇒ Object

Reads records from an input data source, processes them with the supplied block, and writes them to an output data source. Accepts the following options:

batch_size

Records are read from the input in batches. This option sets the number of records in a single batch and must be an integer greater than zero. Defaults to DEFAULT_BATCH_SIZE (100_000).
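As a rough illustration of the batching described above, the sketch below uses a plain Ruby array rather than a real data store. How an actual store fetches its batches is not shown in this documentation, so `each_slice` is only an assumption standing in for that behaviour. The point is that the batch size affects how records are read, while the block still sees one record at a time.

```ruby
# Plain-array sketch; not a DataTransport data store.
records = (1..10).to_a
batch_size = 4

batches_read = 0
yielded = []

# Read in slices of batch_size, then hand records on one by one.
records.each_slice(batch_size) do |batch|
  batches_read += 1
  batch.each { |record| yielded << record }
end

puts batches_read  # 3 (slices of 4, 4 and 2 records)
puts yielded.size  # 10
```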

The block is passed two objects that represent the source and destination records. Each object exposes the attributes of its record as methods. The following example reads the name and price attributes from each input record, downcases the name, multiplies the price by 100 and truncates it to an integer, and writes both values to the output:

# input  = DataTransport::DataStore:: ...
# output = DataTransport::DataStore:: ...

DataTransport.map(input, output) do |src, dst|
  dst.name  = src.name.downcase
  dst.price = (src.price * 100).to_i
end

The destination does not need to have the same attributes as the source, or even the same number of attributes. The only limit on the transformations is what you can express in a block of Ruby.
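To make the point about differing attributes concrete, here is a minimal sketch in which three source attributes become two destination attributes. `SourceRow`, `DestRow` and `transform` are hypothetical names used only for illustration; the real record objects come from DataTransport::Record and are not constructed this way.

```ruby
# Hypothetical stand-ins for the source and destination record objects.
SourceRow = Struct.new(:first_name, :last_name, :price, keyword_init: true)
DestRow   = Struct.new(:full_name, :price_cents)

# Same shape as a block passed to DataTransport.map: (src, dst).
transform = lambda do |src, dst|
  dst.full_name   = "#{src.first_name} #{src.last_name}"
  dst.price_cents = (src.price * 100).round
end

src = SourceRow.new(first_name: "Ada", last_name: "Lovelace", price: 19.99)
dst = DestRow.new
transform.call(src, dst)

puts dst.full_name    # Ada Lovelace
puts dst.price_cents  # 1999
```

This sketch uses `round` rather than `to_i` because a floating-point product such as 19.99 * 100 can land just below the intended integer, and `to_i` truncates.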

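Raises:

  • (TypeError) if batch_size is not an Integer
  • (RangeError) if batch_size is less than 1
  • (ArgumentError) if options contains any key other than batch_size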
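The source listing for this method performs three checks on its options before any records are read. The sketch below reproduces those checks in a standalone helper so they can be tried in isolation. `extract_batch_size` is a hypothetical name, not part of the library, and unlike the listing it works on a copy of the hash so the caller's options are left untouched.

```ruby
DEFAULT_BATCH_SIZE = 100_000

# Hypothetical helper mirroring the option checks in the source listing.
def extract_batch_size(options)
  options = options.dup # assumption: avoid mutating the caller's hash
  batch_size = options.delete(:batch_size) || DEFAULT_BATCH_SIZE
  raise TypeError, "batch size must be an integer" unless batch_size.is_a?(Integer)
  raise RangeError, "batch size must be greater than zero" if batch_size < 1
  unless options.empty?
    raise ArgumentError, "unrecognized options: #{options.keys.join(', ')}"
  end
  batch_size
end

puts extract_batch_size({})                 # 100000
puts extract_batch_size(batch_size: 500)    # 500

# Each invalid input raises a different error class.
[{ batch_size: "500" }, { batch_size: 0 }, { bogus: true }].each do |opts|
  begin
    extract_batch_size(opts)
  rescue TypeError, RangeError, ArgumentError => e
    puts "#{e.class}: #{e.message}"
  end
end
```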


# File 'lib/data_transport.rb', line 32

def self.map(input, output, options = {}, &block)
  # Extract options.
  batch_size = options.delete(:batch_size) || DEFAULT_BATCH_SIZE
  raise(TypeError, "batch size must be an integer") unless batch_size.is_a?(Integer)
  raise(RangeError, "batch size must be greater than zero") if batch_size < 1    
  unless options.empty?
    raise(ArgumentError, "unrecognized options: `#{options.keys.join("', `")}'")
  end
  # Run the transport.
  output.reset
  source = DataTransport::Record::Source.new
  destination = DataTransport::Record::Destination.new
  input.each_record(batch_size) do |record|
    source.record = record
    destination.reset!
    yield source, destination
    output.write_record(destination.record)
  end
  output.finalize
end
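The listing above relies on four methods of the data stores (`each_record`, `reset`, `write_record`, `finalize`) and on two record wrappers whose code is not shown here. The sketch below runs the same loop against in-memory stand-ins to show the order of calls. `MemoryStore`, `SourceRecord`, `DestinationRecord` and `transport` are all hypothetical names written for this illustration, and their internals are assumptions rather than the library's implementation.

```ruby
# Hypothetical in-memory store exposing the four methods the listing calls.
class MemoryStore
  attr_reader :records, :finalized

  def initialize(records = [])
    @records = records
    @finalized = false
  end

  # Read in slices of batch_size, yielding one record at a time.
  def each_record(batch_size)
    @records.each_slice(batch_size) do |batch|
      batch.each { |record| yield record }
    end
  end

  def reset
    @records = []
    @finalized = false
  end

  def write_record(record)
    @records << record
  end

  def finalize
    @finalized = true
  end
end

# Hypothetical source wrapper: exposes hash keys as reader methods.
class SourceRecord
  attr_writer :record

  def method_missing(name, *args)
    return @record.fetch(name) if args.empty? && @record&.key?(name)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @record&.key?(name) || super
  end
end

# Hypothetical destination wrapper: collects `attr = value` calls in a hash.
class DestinationRecord
  attr_reader :record

  def initialize
    reset!
  end

  # Start a fresh hash so earlier output records are not overwritten.
  def reset!
    @record = {}
  end

  def method_missing(name, *args)
    return super unless name.to_s.end_with?("=") && args.size == 1
    @record[name.to_s.chomp("=").to_sym] = args.first
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?("=") || super
  end
end

# Same loop as the listing, minus option handling.
def transport(input, output, batch_size)
  output.reset
  source = SourceRecord.new
  destination = DestinationRecord.new
  input.each_record(batch_size) do |record|
    source.record = record
    destination.reset!
    yield source, destination
    output.write_record(destination.record)
  end
  output.finalize
end

input  = MemoryStore.new([{ name: "WIDGET", price: 2.5 }, { name: "Gadget", price: 10 }])
output = MemoryStore.new
transport(input, output, 1) do |src, dst|
  dst.name  = src.name.downcase
  dst.price = (src.price * 100).to_i
end

p output.records
```

Note that the output store is reset before the first record is read and finalized only after the last one is written, so an output store in this sketch always ends up holding just the result of the latest run.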