Rseed

Rseed is a featureful library and bunch of utilities to assist in importing mass data (or even a few lines of data) into your Rails project.

Rseed is a replacement for active_import (github.com/intrica/active_import). There are lots of improvements in order to make it easy to create and maintain converters.

Rseed can also import from excel files with the optional rseed-roo gem (github.com/intrica/rseed-roo).

Requirements

>= Rails 3.0 >= Ruby 1.9

Installation

Simple add the following to your Gemfile

gem 'rseed'

Then run:

bundle install

Quick Example

rails g rseed:converter HtmlColor

This will create an model converter in the directory app/rseed. You can read through this import file to see how the import works.

This also creates a default data file of html_colors.csv in db/rseed. This will be the CSV used for this converter.

Generators

The generator will automatically create attribute lines for all of the attributes in the model except :id, :created_at, :updated_at.

Generator options are as follows:

  • –attribute

This will set up the converter to do a first_or_initilize on the specified attribute instead of a new

  • –minimal

This will cause the generator to create a file with fewer comments and without reduntanct definitions in the columns.

The Converter File

<em>For some reason that we haven’t been able to pin down, Rails struggles with resolving constants within these converters. This does not affect the converters when running in production modes. The workaround for this is to prefix any of your

constant names (such as model classes) with \

within your converters. This is done by default when using the generator</em>

Attribute Options

  • :header

Defines the name of the attribute to be used for serialization. If there is no :match defined, it will also be used to match the attribute name of the input to the attribute being defined.

  • :match

A regex string that is used to match the attribute name of the input to the attribute being defined. If this is not defined, a match will be checked against :header and then the attribute name.

  • :type

Defines a type for the string.

  • :model

This can be set to the name of a model that this attribute should resolve to. The model is classified so using a symbol works here. Alternately, if only the :model_attribute is set, the name of the attribute will be used as the model name.

  • :model_attribute

Specify which attribute on the model is used for lookup.

  • :model_match

Specifies how the model should be resolved. The value here is called against the where that is used to look up the model. For example, this defaults to :first. If your model is Person and the :model_attribute is :name then this is what is called to set the attribute value:

Person.where(name: <value>).first

You may use any active record method in this case, such as :first_or_create, or :last.

  • :optional

Defines the attribute as optionsal. This has no effect in the HashAdapter.

before_deserialize

If you define a function called before_serialize you can do any preprocessing you require. One example of this is marking an archive flag on existing data:

def before_deserialize
    HtmlColor.where(import_archive: true).update_all({:import_archive => false})
    true
end

Note that you must return true from this function. Returning false will cause the processor to give up and log an error. Thus you can also use the following:

def before_deserialize
    return fail_with_error "Mandatory option is missing" unless options["mandatory_option"]
    true
end

after_deserialize

You can define this function to be called at the end of processing. Following from the example above, if you set import_archive to be false for each model in the deserialize method, you could do the following to remove old

records:

   def after_deserialize
       HtmlColor.where(import_archive: true).destroy_all
   end

This example is obviously fairly destructive and there are better ways to deal with this situation than destroying the records.

Rake Tasks

These rake tasks allow you to run seeds manually:

rake rseed:csv                   Load csv file into a model using a model converter

Examples

rake rseed:csv file=users.csv converter=User converter_options="give_admin_access=true" adapter_options="col_sep=\t"

In this case the file in db/rseed/users.csv would be run through the converter UserConverter. The options specified are available within the converter. In this case options will evaluate to “true”.

Processor Options

  • :within_transaction

Setting this to true will wrap the entire deserialize in an transaction, avoiding calling a commit after each line. For large data sets, this should speed up insertion.

Seeding

If you want to seed your Rails application using Rseed. The best method is to add lines like the following to db/seeds.rb

Rseed::from_csv 'html_colors.csv', converter: :html_color, converter_options: { no_red: true }

You can always use if statements in this file to filter seeds by Rails.env or something similar.

Custom Type Conversions

TODO