Module: DataTransport
Defined in:
  - lib/data_transport.rb
  - lib/data_transport/data_store.rb
  - lib/data_transport/record/source.rb
  - lib/data_transport/data_store/file.rb
  - lib/data_transport/record/destination.rb
  - lib/data_transport/data_store/csv_file.rb
  - lib/data_transport/data_store/active_record.rb
Defined Under Namespace
Modules: Record
Classes: DataStore
Constant Summary
- DEFAULT_BATCH_SIZE = 100_000 # :nodoc:
Class Method Summary
- .map(input, output, options = {}, &block) ⇒ Object
  Reads records from an input data source, processes them with the supplied block, and writes them to an output data source.
Class Method Details
.map(input, output, options = {}, &block) ⇒ Object
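Reads records from an input data source, processes them with the supplied block, and writes them to an output data source. Accepts the following option:
- batch_size
  Records are read from the input in batches; this option sets the number of records in a single batch. It must be a positive Integer (otherwise TypeError or RangeError is raised). The default is DEFAULT_BATCH_SIZE (100_000). Passing any other option raises ArgumentError.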
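Batching can be pictured with a small in-memory input store. The class `ArrayStore` and its `batches_read` counter are hypothetical and not part of DataTransport; the only detail taken from the source is that `.map` calls `each_record(batch_size)` on its input and expects each record to be yielded individually. This is a sketch of the idea, not how the gem's own data stores fetch their batches.

```ruby
# Hypothetical in-memory input store (not part of DataTransport).
# It mimics the each_record(batch_size) call that DataTransport.map
# makes on its input: records are fetched one batch at a time,
# then yielded to the caller one by one.
class ArrayStore
  attr_reader :batches_read

  def initialize(records)
    @records = records
    @batches_read = 0
  end

  def each_record(batch_size)
    @records.each_slice(batch_size) do |batch|
      @batches_read += 1 # count one "read" per batch
      batch.each { |record| yield record }
    end
  end
end

store = ArrayStore.new((1..250).map { |i| { id: i } })
seen = []
store.each_record(100) { |record| seen << record[:id] }
puts store.batches_read # => 3 (batches of 100, 100 and 50)
puts seen.size          # => 250
```

A larger batch size means fewer reads against the underlying store at the cost of holding more records in memory at once.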
The block is passed two objects that represent the source and destination records. These objects have methods that reflect the attributes of the records. The following example reads the name and price attributes from input records, downcases the name, multiplies the price by 100 and truncates it to an Integer, and writes the results to the output:
# input = DataTransport::DataStore:: ...
# output = DataTransport::DataStore:: ...
DataTransport.map(input, output) do |src, dst|
dst.name = src.name.downcase
dst.price = (src.price * 100).to_i
end
The destination doesn’t necessarily have to have the same attributes as the source (or even the same number of attributes). The transformations that can be accomplished are limited only by what you can do in a block of Ruby.
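The real block arguments come from DataTransport::Record::Source and DataTransport::Record::Destination, whose internals are not shown on this page. As a rough illustration only, assuming a record is a Hash keyed by Symbol attribute names, accessor-style wrappers could be built with `method_missing`. The classes `SketchSource` and `SketchDestination` are hypothetical stand-ins, not the gem's implementation.

```ruby
# Hypothetical stand-ins for the objects passed to the block.
# Assumption: a record is a Hash whose keys are attribute names (Symbols).
class SketchSource
  attr_writer :record

  # Reading src.name returns record[:name].
  def method_missing(name, *args)
    return @record.fetch(name) if args.empty? && @record&.key?(name)
    super
  end

  def respond_to_missing?(name, include_private = false)
    (@record && @record.key?(name)) || super
  end
end

class SketchDestination
  attr_reader :record

  def initialize
    reset!
  end

  # Start each output record with no attributes set.
  def reset!
    @record = {}
  end

  # Writing dst.price = 250 stores record[:price] = 250.
  def method_missing(name, *args)
    attr = name.to_s
    if attr.end_with?("=") && args.size == 1
      @record[attr.chomp("=").to_sym] = args.first
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?("=") || super
  end
end

src = SketchSource.new
dst = SketchDestination.new
src.record = { name: "Widget", price: 2.5, sku: "W-1" }

# Same transformation as the example above; :sku is simply not copied,
# so the destination ends up with fewer attributes than the source.
dst.name = src.name.downcase
dst.price = (src.price * 100).to_i
p dst.record
```

Because the destination only receives the attributes the block assigns, renaming, dropping, or deriving attributes needs no extra configuration.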
# File 'lib/data_transport.rb', line 32
def self.map(input, output, options = {}, &block)
  # Extract options.
  batch_size = options.delete(:batch_size) || DEFAULT_BATCH_SIZE
  raise(TypeError, "batch size must be an integer") unless batch_size.is_a?(Integer)
  raise(RangeError, "batch size must be greater than zero") if batch_size < 1
  unless options.empty?
    raise(ArgumentError, "unrecognized options: `#{options.keys.join("', `")}'")
  end
  # Run the transport.
  output.reset
  source = DataTransport::Record::Source.new
  destination = DataTransport::Record::Destination.new
  input.each_record(batch_size) do |record|
    source.record = record
    destination.reset!
    yield source, destination
    output.write_record(destination.record)
  end
  output.finalize
end
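Reading the method body, `.map` relies only on duck typing: the input must respond to `each_record(batch_size)`, and the output to `reset`, `write_record(record)`, and `finalize`. The sketch below is a hypothetical, gem-free re-creation of that call sequence. `MemoryInput`, `MemoryOutput`, and `sketch_map` are invented names, and the block receives a plain source Hash and an empty destination Hash instead of the real Record::Source and Record::Destination objects.

```ruby
# Hypothetical in-memory stores implementing the calls DataTransport.map makes.
class MemoryInput
  def initialize(records)
    @records = records
  end

  def each_record(batch_size)
    @records.each_slice(batch_size) { |batch| batch.each { |r| yield r } }
  end
end

class MemoryOutput
  attr_reader :rows, :finalized

  def reset
    @rows = []
    @finalized = false
  end

  def write_record(record)
    @rows << record
  end

  def finalize
    @finalized = true
  end
end

# Simplified re-creation of the loop in DataTransport.map (not the gem's code).
def sketch_map(input, output, options = {})
  options = options.dup # unlike the original, leave the caller's hash untouched
  batch_size = options.delete(:batch_size) || 100_000
  raise TypeError, "batch size must be an integer" unless batch_size.is_a?(Integer)
  raise RangeError, "batch size must be greater than zero" if batch_size < 1
  raise ArgumentError, "unrecognized options: #{options.keys.join(', ')}" unless options.empty?

  output.reset
  input.each_record(batch_size) do |record|
    destination = {} # fresh destination per record, like destination.reset!
    yield record, destination
    output.write_record(destination)
  end
  output.finalize
end

input = MemoryInput.new([{ name: "Apple", price: 1.5 }, { name: "Pear", price: 2.25 }])
output = MemoryOutput.new
sketch_map(input, output, batch_size: 1) do |src, dst|
  dst[:name] = src[:name].downcase
  dst[:price] = (src[:price] * 100).to_i
end
p output.rows
```

One design point visible in the original: `options.delete(:batch_size)` removes the key from the hash the caller passed in, so a caller reusing that hash across calls will see it change. The sketch calls `dup` first to avoid that side effect.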