Bumblebee

Higher-level languages such as Ruby make interacting with CSV (comma-separated values) files trivial. Even so, this library provides a very simple object/CSV mapper that lets you work with CSVs declaratively. Locking in common patterns is important in large codebases, even in higher-level languages, and using a library such as this one helps standardize CSV interaction.

Installation

To install through Rubygems:

gem install bumblebee

You can also add it to your Gemfile with Bundler:

bundle add bumblebee

Examples

A Simple 1:1 Example

Imagine the following CSV:

id	name	dob	phone
1	Matt	1901-02-03	555-555-5555
2	Nick	1921-09-03	444-444-4444
3	Sam	1932-12-12	333-333-3333

Using the following column configuration:

columns = %i[id name dob phone]

We could parse this data and turn it into hashes:

objects = Bumblebee::Template.new(columns: columns).parse(data)

Then objects is this array of hashes:

[
  { id: '1', name: 'Matt', dob: '1901-02-03', phone: '555-555-5555' },
  { id: '2', name: 'Nick', dob: '1921-09-03', phone: '444-444-4444' },
  { id: '3', name: 'Sam',  dob: '1932-12-12', phone: '333-333-3333' }
]

Note: data, in this case, is the contents of the CSV file read into a string.
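
For orientation, the 1:1 mapping above behaves much like the following sketch, which uses only Ruby's standard csv library rather than Bumblebee itself. The inline data string and the parse_one_to_one helper are hypothetical names introduced for illustration; they are not part of the gem.

```ruby
require 'csv'

# Hypothetical CSV contents; in practice this might come from File.read(path).
data = <<~CSV
  id,name,dob,phone
  1,Matt,1901-02-03,555-555-5555
  2,Nick,1921-09-03,444-444-4444
  3,Sam,1932-12-12,333-333-3333
CSV

# Illustrative helper (not part of Bumblebee): keep only the listed columns
# and key each row's values by symbol. Values stay as strings.
def parse_one_to_one(data, columns)
  CSV.parse(data, headers: true).map do |row|
    columns.to_h { |column| [column, row[column.to_s]] }
  end
end

objects = parse_one_to_one(data, %i[id name dob phone])
```

Every value comes back as a string here because no formatting has been applied.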

Custom Headers

If our headers are not a perfect 1:1 match to our object, such as:

ID #	First Name	Date of Birth	Phone #
1	Matt	1901-02-03	555-555-5555
2	Nick	1921-09-03	444-444-4444
3	Sam	1932-12-12	333-333-3333

Then we can explicitly map those as:

columns = {
  'ID #' => :id,
  'First Name' => :name,
  'Date of Birth' => :dob,
  'Phone #' => :phone
}

Nested Objects

Let's say we have the following data which we want to create a CSV from:

objects = [
  {
    id: 1,
    name:     { first: 'Matt' },
    demo:     { dob: '1901-02-03' },
    contact:  { phone: '555-555-5555' }
  },
  {
    id: 2,
    name:     { first: 'Nick' },
    demo:     { dob: '1921-09-03' },
    contact:  { phone: '444-444-4444' }
  },
  {
    id: 3,
    name:     { first: 'Sam' },
    demo:     { dob: '1932-12-12' },
    contact:  { phone: '333-333-3333' }
  }
]

We could create a flat-file CSV:

ID #	First Name	Date of Birth	Phone #
1	Matt	1901-02-03	555-555-5555
2	Nick	1921-09-03	444-444-4444
3	Sam	1932-12-12	333-333-3333

Using the following column config:

columns = {
  'ID #' => :id,
  'First Name': {
    property: :first,
    through: :name
  },
  'Date of Birth': {
    property: :dob,
    through: :demo
  },
  'Phone #': {
    property: :phone,
    through: :contact
  }
}

And executing the following:

csv = Bumblebee::Template.new(columns: columns).generate(objects)

The above columns config would work both ways, so if we received the CSV, we could parse it to an array of nested hashes. Unfortunately, for now, we cannot do better than an array of nested hashes.
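
To make the through/property idea concrete, here is a minimal sketch of the same flattening and un-flattening done with only the standard csv library. It is not how Bumblebee is implemented; the paths hash (header mapped to a list of keys to follow) is a hypothetical stand-in for the column config above.

```ruby
require 'csv'

objects = [
  { id: 1, name: { first: 'Matt' }, demo: { dob: '1901-02-03' }, contact: { phone: '555-555-5555' } },
  { id: 2, name: { first: 'Nick' }, demo: { dob: '1921-09-03' }, contact: { phone: '444-444-4444' } }
]

# Hypothetical structure: header => keys to follow (through first, then property).
paths = {
  'ID #' => [:id],
  'First Name' => %i[name first],
  'Date of Birth' => %i[demo dob],
  'Phone #' => %i[contact phone]
}

# Generating: walk each path with Hash#dig and write one row per object.
csv = CSV.generate do |out|
  out << paths.keys
  objects.each { |object| out << paths.values.map { |path| object.dig(*path) } }
end

# Parsing: rebuild the nesting by creating intermediate hashes as needed.
parsed = CSV.parse(csv, headers: true).map do |row|
  paths.each_with_object({}) do |(header, path), object|
    *parents, leaf = path
    target = parents.reduce(object) { |memo, key| memo[key] ||= {} }
    target[leaf] = row[header]
  end
end
```

Note that the round trip in this sketch returns every value as a string, since no formatting is applied.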

Custom Formatting

You can also pass in built-in or custom functions that can do the value formatting. For example:

columns = {
  'ID #': {
    property: :id,
    to_object: :integer
  },
  'First Name': {
    property: :first,
    through: :name,
    to_csv: ->(v) { v.to_s.upcase }
  },
  'Date of Birth': {
    property: :dob,
    through: :demo,
    to_object: { type: :date, nullable: true }
  },
  'Phone #': {
    property: :phone,
    through: :contact
  }
}

would ensure:

id is an integer data type when parsed
the CSV has only upper-case First Name values
dob is a date data type when parsed

Other formatting functions that can be used for to_object and/or to_csv:

bigdecimal: converts to BigDecimal (nullable; non-nullable default is 0)
boolean: converts to a flexible boolean (nullable; non-nullable default is false). 1, t, true, y, and yes all parse to true; 0, f, false, n, and no all parse to false
date: converts to Date (nullable; non-nullable default is 1900-01-01)
integer: converts to Integer (nullable; non-nullable default is 0)
join: array is joined by the separator option (defaults to comma)
float: converts to Float (nullable; non-nullable default is 0.0)
function: custom lambda function (input is the resolved value; the lambda's output is used as the resolved value)
pluck_join: maps the sub-property (sub_property option) of each item, then joins the values with the separator option (defaults to comma)
pluck_split: string is split by the separator option (defaults to comma), then a new object (object_class option) is created for each value and its sub-property (sub_property option) is set
split: string is split by the separator option (defaults to comma)
string: calls the to_s method on the value
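
The nullable versus non-nullable defaults listed above can be sketched in plain Ruby as follows. This is an illustration of the described behavior, not Bumblebee's code: the to_boolean and to_date helpers are hypothetical, and both the case-insensitive matching and the fallback for unrecognized values are assumptions of this sketch.

```ruby
require 'date'

# Hypothetical helper mirroring the "boolean" formatter described above.
# Assumption: matching is case-insensitive and unrecognized input falls
# back to the default (nil when nullable, otherwise false).
def to_boolean(value, nullable: false)
  text = value.to_s.strip.downcase
  return true if %w[1 t true y yes].include?(text)
  return false if %w[0 f false n no].include?(text)

  nullable ? nil : false
end

# Hypothetical helper mirroring the "date" formatter described above.
# Blank input yields nil when nullable, otherwise 1900-01-01.
def to_date(value, nullable: false)
  text = value.to_s.strip
  return Date.parse(text) unless text.empty?

  nullable ? nil : Date.new(1900, 1, 1)
end

parsed_flag  = to_boolean('Yes')
default_flag = to_boolean('')
missing_flag = to_boolean('', nullable: true)
parsed_date  = to_date('1901-02-03')
default_date = to_date(nil)
missing_date = to_date(nil, nullable: true)
```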

Pluck Join / Pluck Split Explained

Pluck join and pluck split come in handy when you have an array of objects and would like to:

map one value from each object and join it (in order to output in a CSV)
take a string value, split it, then map each value to a new object (in order to parse as objects)

Take this input and configuration for example:

objects = [
  {
    id: 1,
    name:     { first: 'Matt' },
    demo:     { dob: '1901-02-03' },
    contact:  { phone: '555-555-5555' },
    children: [ { id: 9, name: 'Spunky' }, { id: 10, name: 'Dunker' } ]
  },
  {
    id: 2,
    name:     { first: 'Nick' },
    demo:     { dob: '1921-09-03' },
    contact:  { phone: '444-444-4444' },
    children: [ { id: 11, name: 'Bonzi' }, { id: 12, name: 'Buddy' } ]
  },
  {
    id: 3,
    name:     { first: 'Sam' },
    demo:     { dob: '1932-12-12' },
    contact:  { phone: '333-333-3333' }
  }
]

columns = {
  'ID #': {
    property: :id,
    to_object: :integer
  },
  'Children ID #s': {
    property: :children,
    to_csv: { type: :pluck_join, separator: ';', sub_property: :id },
    to_object: { type: :pluck_split, separator: ';', sub_property: :id },
  }
}

Generating a CSV:

csv = Bumblebee::Template.new(columns: columns).generate(objects)

would output:

ID #	Children ID #s
1	9;10
2	11;12
3	

Parsing a CSV:

objects = Bumblebee::Template.new(columns: columns).parse(csv)

would output:

objects = [
  {
    id: 1,
    children: [ { id: 9 }, { id: 10 } ]
  },
  {
    id: 2,
    children: [ { id: 11 }, { id: 12 } ]
  },
  {
    id: 3
  }
]
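
As a rough mental model, the two operations can be sketched as a pair of plain Ruby methods. These pluck_join and pluck_split helpers are hypothetical illustrations, not Bumblebee's implementation. In particular, this sketch leaves split values as strings and returns an empty array for a blank cell, which may differ from what the library does.

```ruby
# Hypothetical helper: take one value from each item and join the values.
def pluck_join(items, sub_property:, separator: ',')
  Array(items).map { |item| item[sub_property] }.join(separator)
end

# Hypothetical helper: split the string, then wrap each value in a new
# object (a Hash by default) under the given sub-property.
def pluck_split(text, sub_property:, separator: ',', object_class: Hash)
  text.to_s.split(separator).map do |value|
    object_class.new.tap { |object| object[sub_property] = value }
  end
end

children = [{ id: 9, name: 'Spunky' }, { id: 10, name: 'Dunker' }]

joined = pluck_join(children, sub_property: :id, separator: ';')
split  = pluck_split(joined, sub_property: :id, separator: ';')
```

Only the plucked sub-property survives the round trip, which is why the parsed children above contain an id but no name.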

Parsing Into Custom Classes

Hash is the default return type when parsing a CSV. You can change this by providing a Hash-like class:

objects = Bumblebee::Template.new(columns: columns, object_class: OpenStruct).parse(csv)

objects will now be an array of OpenStruct objects instead of Hash objects.

Note: you must also specify this in pluck_split:

columns = {
  'ID #': {
    property: :id,
    to_object: :integer
  },
  'Children ID #s': {
    property: :children,
    to_csv: { type: :pluck_join, separator: ';', sub_property: :id },
    to_object: { type: :pluck_split, separator: ';', sub_property: :id, object_class: OpenStruct },
  }
}

Further CSV Customization

The two main methods:

Template#generate
Template#parse

also accept custom options that Ruby's CSV::new accepts. The only caveat is that Bumblebee needs headers for its mapping, so it overrides the header options.
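
As an example of the kind of option meant here, col_sep is a standard CSV::new option that changes the field separator. The sketch below calls Ruby's csv library directly, because exactly how Template#generate and Template#parse take these options is not shown in this document. The headers: true option is the sort of header setting that Bumblebee controls itself.

```ruby
require 'csv'

# Tab-separated input instead of comma-separated.
data = "id\tname\n1\tMatt\n2\tNick\n"

# col_sep switches the separator; headers: true treats the first line as
# the header row so each row can be addressed by column name.
rows = CSV.parse(data, col_sep: "\t", headers: true).map(&:to_h)
```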

Template DSL

You can pass in a block for template/column specification if you prefer a code-first approach over a configuration-first approach.

Using Blocks

csv = Bumblebee::Template.new do |t|
  t.column 'ID #',        property: :id,
                          to_object: :integer

  t.column 'First Name',  property: :first,
                          through: :name
end.generate(objects)

objects = Bumblebee::Template.new do |t|
  t.column 'ID #',        property: :id,
                          to_object: :integer

  t.column 'First Name',  property: :first,
                          through: :name
end.parse(data)

Subclassing ::Bumblebee::Template

Another option is to subclass Template and declare your columns at the class-level:

class PersonTemplate < Bumblebee::Template
  column 'ID #',        property: :id,
                        to_object: :integer

  column 'First Name',  property: :first,
                        through: :name,
                        to_object: :pluck_split
end

template  = PersonTemplate.new
csv       = template.generate(objects)
objects   = template.parse(data)

Column Precedence

The preceding examples showed three ways to declare columns, and each is additive to the next (in the following order):

Class level (parent-first)
Argument level (passed into constructor)
Block level

To illustrate all three:

class PersonTemplate < Bumblebee::Template # first
  column 'ID #',        property: :id,
                        to_object: :integer

  column 'First Name',  property: :first,
                        through: :name,
                        to_object: :pluck_split
end

columns = {
  'Middle Name': {
    property: :middle
  }
}

template  = PersonTemplate.new(columns: columns) do |t| # second
  t.column 'Last Name', property: :last # third
end

When executed to generate a CSV, the columns would be (in order): ID #, First Name, Middle Name, Last Name.
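
The additive ordering can be modeled with a small stand-alone class. The SketchTemplate and PersonSketch classes below are hypothetical and deliberately minimal; they only demonstrate the class-level, then argument-level, then block-level ordering described above, not Bumblebee's internals.

```ruby
# Hypothetical stand-in for a template class; not Bumblebee's implementation.
class SketchTemplate
  # Class-level declaration, stored per class.
  def self.column(header, opts = {})
    own_columns[header.to_s] = opts
  end

  def self.own_columns
    @own_columns ||= {}
  end

  # Parent-first: start with inherited columns, then add this class's own.
  def self.all_columns
    parent = superclass.respond_to?(:all_columns) ? superclass.all_columns : {}
    parent.merge(own_columns)
  end

  attr_reader :columns

  def initialize(columns: {})
    @columns = self.class.all_columns.dup                  # first: class level
    columns.each { |header, opts| column(header, opts) }   # second: argument level
    yield self if block_given?                             # third: block level
  end

  # Instance-level declaration, used by the constructor and the block.
  def column(header, opts = {})
    @columns[header.to_s] = opts
  end
end

class PersonSketch < SketchTemplate
  column 'ID #', property: :id
  column 'First Name', property: :first, through: :name
end

template = PersonSketch.new(columns: { 'Middle Name': { property: :middle } }) do |t|
  t.column 'Last Name', property: :last
end

headers = template.columns.keys
```

Because Ruby hashes preserve insertion order, headers in this sketch comes out in the same order as the list above.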

Contributing

Development Environment Configuration

Basic steps to set up this repository for development:

Install Ruby (check bumblebee.gemspec for versions supported)
Install bundler (gem install bundler)
Clone the repository (git clone git@github.com:bluemarblepayroll/bumblebee.git)
Navigate to the root folder (cd bumblebee)
Install dependencies (bundle)

Running Tests

To execute the test suite run:

bundle exec rspec spec --format documentation

Alternatively, you can have Guard watch for changes:

bundle exec guard

Also, do not forget to run Rubocop:

bundle exec rubocop

Publishing

Note: ensure you have proper authorization before trying to publish new versions.

After code changes have gone through the Pull Request review process, follow these steps to publish a new version:

Merge Pull Request into master
Update lib/bumblebee/version.rb using semantic versioning
Install dependencies: bundle
Update CHANGELOG.md with release notes
Commit & push master to remote and ensure CI builds master successfully
Build the project locally: gem build bumblebee
Publish package to RubyGems: gem push bumblebee-X.gem where X is the version to push
Tag master with new version: git tag <version>
Push tags remotely: git push origin --tags

License

This project is MIT Licensed.