Conformist

Bend CSVs to your will with declarative schemas. Map one or many columns, preprocess cells and lazily enumerate. Declarative schemas are easier to understand, quicker to set up and independent of I/O. Use CSV (formerly FasterCSV), Spreadsheet or any data structure that behaves like an array of arrays.

Quick and Dirty Examples

Open a CSV file and declare a schema. A schema comprises columns. A column takes an arbitrary name followed by its position in the input. A column may be derived from multiple positions.

``` ruby
require 'conformist'
require 'csv'

csv    = CSV.open '~/transmitters.csv'
schema = Conformist.new do
  column :callsign, 1
  column :latitude, 1, 2, 3
  column :longitude, 3, 4, 5
  column :name, 0 do |value|
    value.upcase
  end
end
```

Insert the transmitters into a SQLite database.

``` ruby
require 'sqlite3'

db = SQLite3::Database.new 'transmitters.db'
schema.conform(csv).each do |transmitter|
  db.execute "INSERT INTO transmitters (callsign, ...) VALUES ('#{transmitter.callsign}', ...);"
end
```

Only insert the transmitters with the name "Mount Coot-tha" using ActiveRecord or DataMapper.

``` ruby
transmitters = schema.conform(csv).select do |transmitter|
  transmitter.name == 'Mount Coot-tha'
end
transmitters.each do |transmitter|
  Transmitter.create! transmitter.attributes
end
```

Source from multiple, different input files and insert transmitters together into a single database.

``` ruby
require 'conformist'
require 'csv'
require 'sqlite3'

au_schema = Conformist.new do
  column :callsign, 8
  column :latitude, 10
end
us_schema = Conformist.new do
  column :callsign, 1
  column :latitude, 1, 2, 3
end

au_csv = CSV.open '~/au/transmitters.csv'
us_csv = CSV.open '~/us/transmitters.csv'

db = SQLite3::Database.new 'transmitters.db'

[au_schema.conform(au_csv), us_schema.conform(us_csv)].each do |schema|
  schema.each do |transmitter|
    db.execute "INSERT INTO transmitters (callsign, ...) VALUES ('#{transmitter.callsign}', ...);"
  end
end
```

Open a Microsoft Excel spreadsheet and declare a schema.

``` ruby
require 'conformist'
require 'spreadsheet'

book   = Spreadsheet.open '~/states.xls'
sheet  = book.worksheet 0
schema = Conformist.new do
  column :state, 0, 1 do |values|
    "#{values.first}, #{values.last}"
  end
  column :capital, 2
end
```

Print each state's attributes to standard output.

``` ruby
schema.conform(sheet).each do |state|
  $stdout.puts state.attributes
end
```

For more examples see test/fixtures, test/schemas and test/unit/integration_test.rb.

Installation

Conformist is available as a gem. Install it at the command line.

``` sh
$ [sudo] gem install conformist
```

Or add it to your Gemfile and run `$ bundle install`.

``` ruby
gem 'conformist'
```

Usage

Anonymous Schema

Anonymous schemas are quick to declare and don’t have the overhead of creating an explicit class.

``` ruby
citizen = Conformist.new do
  column :name, 0, 1
  column :email, 2
end

citizen.conform [['Tate', 'Johnson', 'tate@tatey.com']]
```

Class Schema

Class schemas are explicit. Class schemas were the only type available in earlier versions of Conformist.

``` ruby
class Citizen
  extend Conformist

  column :name, 0, 1
  column :email, 2
end

Citizen.conform [['Tate', 'Johnson', 'tate@tatey.com']]
```

Implicit Indexing

Column indexes are implicitly incremented when the index argument is omitted. Implicit indexing is all or nothing.

``` ruby
column :account_number                                # => 0
column(:date) { |v| Time.new(*v.split('/').reverse) } # => 1
column :description                                   # => 2
column :debit                                         # => 3
column :credit                                        # => 4
```
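The `:date` preprocessor above is terse, so here is a standalone sketch of the same idea in plain Ruby (Conformist is not required to run it). It assumes cells hold dates formatted as `DD/MM/YYYY`, which this README does not state, and the `parse_date` name is hypothetical. Unlike the one-liner, it converts each part to an integer explicitly.

```ruby
# Hypothetical helper mirroring the :date preprocessor.
# Assumption: the cell is a String formatted as DD/MM/YYYY.
def parse_date(cell)
  day, month, year = cell.split('/').map(&:to_i)
  Time.new(year, month, day)
end

date = parse_date('31/01/2012')
puts date.year  # 2012
puts date.month # 1
puts date.day   # 31
```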

Conform

Conform is the principal method for lazily applying a schema to the given input.

``` ruby
enumerator = schema.conform CSV.open('~/file.csv')
enumerator.each do |row|
  puts row.attributes
end
```

Input

`#conform` accepts any object that responds to `#each` and yields an array-like object for each row.

``` ruby
CSV.open('~/file.csv').respond_to? :each # => true
[[], [], []].respond_to? :each           # => true
```
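Going by the description above, a hand-rolled input source should only need an `#each` that yields one array per row. The sketch below is plain Ruby and does not touch Conformist; the `PipeDelimitedLines` class and its pipe-delimited format are made up for illustration.

```ruby
# Hypothetical input source: #each yields one array of cells per line.
class PipeDelimitedLines
  def initialize(lines)
    @lines = lines
  end

  # Returns an Enumerator when called without a block, like Array#each.
  def each
    return enum_for(:each) unless block_given?

    @lines.each { |line| yield line.chomp.split('|') }
  end
end

source = PipeDelimitedLines.new(["Tate|Johnson\n", "Ada|Lovelace\n"])
puts source.respond_to?(:each) # true
source.each { |row| p row }
```

An object shaped like this could stand in wherever the examples pass a CSV handle or a nested array, assuming nothing beyond `#each` is needed.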

Header Row

#conform takes an option to skip the first row of input. Given a typical CSV document, the first row is the header row and irrelevant for enumeration.

``` ruby
schema.conform CSV.open('~/file_with_headers.csv'), :skip_first => true
```
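As a rough picture of what skipping the header row amounts to, here is a plain-Ruby sketch. It is not how Conformist implements `:skip_first`; the `without_header` helper is hypothetical and only shows the effect on the rows that get enumerated.

```ruby
# Hypothetical helper: lazily drops the first row of any #each-able input.
def without_header(rows)
  rows.to_enum(:each).lazy.drop(1)
end

rows = [
  ['First name', 'Last name'], # header row
  ['Tate', 'Johnson']
]

p without_header(rows).to_a # [["Tate", "Johnson"]]
```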

Named Columns

Strings can be used as column indexes instead of integers. These strings will be matched against the first row to determine the appropriate numerical index.

``` ruby
citizen = Conformist.new do
  column :email, 'EM'
  column :name, 'FN', 'LN'
end

citizen.conform [['FN', 'LN', 'EM'], ['Tate', 'Johnson', 'tate@tatey.com']], :skip_first => true
```
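To make the name-to-index matching concrete, here is a minimal plain-Ruby sketch of resolving header strings against the first row. It is illustrative only and not Conformist's internals; `resolve_indexes` is a hypothetical helper, and raising on an unknown name is an assumption of this sketch.

```ruby
# Hypothetical helper: maps header names to their positions in the first row.
def resolve_indexes(header_row, names)
  names.map do |name|
    header_row.index(name) || raise(ArgumentError, "unknown column: #{name}")
  end
end

header = ['FN', 'LN', 'EM']
p resolve_indexes(header, ['EM'])       # [2]
p resolve_indexes(header, ['FN', 'LN']) # [0, 1]
```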

Enumerator

#conform is lazy, returning an Enumerator. Input is not parsed until you call #each, #map or any method defined in Enumerable. That means schemas can be assigned now and evaluated later. #each has the lowest memory footprint because it does not build a collection.
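The same deferred behaviour can be seen with a bare `Enumerator` in plain Ruby. This sketch does not use Conformist; `sample_rows` and `READ_LOG` are hypothetical names that exist only to show that no row is touched until iteration starts.

```ruby
# Records which rows have actually been read.
READ_LOG = []

# Hypothetical source: builds an Enumerator but reads nothing yet.
def sample_rows
  Enumerator.new do |yielder|
    [['Tate', 'Johnson'], ['Ada', 'Lovelace']].each do |row|
      READ_LOG << row.first
      yielder << row
    end
  end
end

pending = sample_rows # assigned now, evaluated later
puts READ_LOG.size    # 0
pending.each { |row| puts row.join(' ') }
puts READ_LOG.size    # 2
```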

Struct

The argument passed into the block is a struct-like object. You can access columns as methods or keys. Columns were only accessible as keys in earlier versions of Conformist. Methods are now the preferred syntax.

``` ruby
citizen[:name] # => "Tate Johnson"
citizen.name   # => "Tate Johnson"
```

For convenience the #attributes method returns a hash of key-value pairs suitable for creating ActiveRecord or DataMapper records.

``` ruby
citizen.attributes # => {:name => "Tate Johnson", :email => "tate@tatey.com"}
```
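If "struct-like" is unfamiliar, Ruby's built-in `Struct` offers the same two access styles and a hash conversion. The sketch below is an analogy only: the object Conformist yields is its own type, and `CitizenRow` is a hypothetical name.

```ruby
# Analogy using Ruby's standard Struct, not the object Conformist yields.
CitizenRow = Struct.new(:name, :email)

citizen = CitizenRow.new('Tate Johnson', 'tate@tatey.com')
puts citizen[:name] # Tate Johnson
puts citizen.name   # Tate Johnson
p citizen.to_h      # plays the role that #attributes plays above
```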

One Column

Maps the first column in the input file to :first_name. Column indexing starts at zero.

``` ruby
column :first_name, 0
```

Many Columns

Maps the first and second columns in the input file to :name.

``` ruby
column :name, 0, 1
```

Indexing is completely arbitrary and you can map any combination.

``` ruby
column :name_and_city, 0, 1, 2
```

Many columns are implicitly concatenated. This behaviour can be changed by passing a block. See Preprocessing.

Preprocessing

Sometimes values need to be manipulated before they're conformed. Pass a block to get access to the values. The return value of the block becomes the conformed output.

``` ruby
column :name, 0, 1 do |values|
  values.map(&:upcase) * ' '
end
```
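The `* ' '` in that block is `Array#*` with a string argument, which behaves like `Array#join`. A quick plain-Ruby check, with a hypothetical `full_name` method standing in for the block body:

```ruby
# Same expression as the preprocessing block, wrapped in a hypothetical method.
def full_name(values)
  values.map(&:upcase) * ' ' # Array#* with a String joins the elements
end

puts full_name(['tate', 'johnson']) # TATE JOHNSON
```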

Preprocessing works with one column too. Instead of a collection of values, a single value is passed to the block.

``` ruby
column :first_name, 0 do |value|
  value.upcase
end
```

It’s also possible to provide a context object that is made available during preprocessing.

``` ruby
citizen = Conformist.new do
  column :name, 0, 1 do |values, context|
    (context[:upcase?] ? values.map(&:upcase) : values) * ' '
  end
end

# The context must respond to #[] for the block above, so pass a hash.
citizen.conform [['tate', 'johnson']], context: { upcase?: true }
```

Virtual Columns

Virtual columns are not sourced from input. Omit the index to create a virtual column. Like real columns, virtual columns are included in the conformed output.

``` ruby
column :day do
  1
end
```

Inheritance

Inheriting from a schema gives access to all of the parent schema’s columns.

Anonymous Schema

Anonymous inheritance takes inspiration from Ruby’s syntax for instantiating new classes.

``` ruby
parent = Conformist.new do
  column :name, 0, 1
end

child = Conformist.new parent do
  column :category do
    'Child'
  end
end
```

Class Schema

Classical inheritance works as expected.

``` ruby
class Parent
  extend Conformist

  column :name, 0, 1
end

class Child < Parent
  column :category do
    'Child'
  end
end
```

Upgrading from <= 0.0.3 to >= 0.1.0

Where previously you had

``` ruby
class Citizen
  include Conformist::Base

  column :name, 0, 1
end

Citizen.load('~/file.csv').foreach do |citizen|
  # ...
end
```

You should now do

``` ruby
require 'fastercsv'

class Citizen
  extend Conformist

  column :name, 0, 1
end

Citizen.conform(FasterCSV.open('~/file.csv')).each do |citizen|
  # ...
end
```

See CHANGELOG.md for a full list of changes.

Compatibility

  • MRI 2.4.0, 2.3.1, 2.2.0, 2.1.0, 2.0.0, 1.9.3
  • JRuby

Dependencies

No explicit dependencies, although CSV and Spreadsheet are commonly used.

Contributing

  1. Fork
  2. Install dependencies by running $ bundle install
  3. Write tests and code
  4. Make sure the tests pass locally by running $ bundle exec rake
  5. Push to GitHub and make sure continuous integration tests pass at https://travis-ci.org/tatey/conformist/pull_requests
  6. Send a pull request on GitHub

Please do not increment the version number in lib/conformist/version.rb. The version number will be incremented by the maintainer after the patch is accepted.

Motivation

Motivation for this project came from the desire to simplify importing data from various government organisations into Antenna Mate. The data from each government was similar, but had completely different formatting. Some pieces of data needed preprocessing while others simply needed to be concatenated together. Not wanting to write a parser for each new government organisation, I created Conformist.

Copyright © 2016 Tate Johnson. Conformist is released under the MIT license. See LICENSE for details.