data_miner

Mine remote data into your ActiveRecord models.

Quick start

Put this in config/environment.rb:

config.gem 'seamusabshere-data_miner', :lib => 'data_miner', :source => 'http://gems.github.com'

You need to define mine_data blocks in your ActiveRecord models. For example, in app/models/country.rb:

class Country < ActiveRecord::Base
  mine_data do |step|
    # import country names and country codes
    step.import :url => 'http://www.cs.princeton.edu/introcs/data/iso3166.csv' do |attr|
      attr.key :iso_3166, :name_in_source => 'country code'
      attr.store :iso_3166, :name_in_source => 'country code'
      attr.store :name, :name_in_source => 'country'
    end
  end
end

…and in app/models/airport.rb:

class Airport < ActiveRecord::Base
  belongs_to :country

  mine_data do |step|
    # import airport iata_code, name, etc.
    step.import(:url => 'http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/airports.dat', :headers => false) do |attr|
      attr.key :iata_code, :field_number => 3
      attr.store :name, :field_number => 0
      attr.store :city, :field_number => 1
      attr.store :country, :field_number => 2, :foreign_key => :name       # will use Country.find_by_name(X)
      attr.store :iata_code, :field_number => 3
      attr.store :latitude, :field_number => 5
      attr.store :longitude, :field_number => 6
    end
  end
end
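
With :headers => false, columns are addressed by position rather than by name. The following is a minimal sketch of that idea in plain Ruby, not data_miner's actual implementation: FIELD_NUMBERS and attributes_from_row are hypothetical names, and the sample row holds placeholder values shaped like the Airport example above.

```ruby
# Illustration only; this is NOT how data_miner is implemented internally.
# Field numbers mirror the Airport example above (0-based, no header row).
FIELD_NUMBERS = {
  :name      => 0,
  :city      => 1,
  :country   => 2,
  :iata_code => 3,
  :latitude  => 5,
  :longitude => 6
}

# Build an attribute hash from one already-parsed row (an Array of strings).
def attributes_from_row(row, field_numbers = FIELD_NUMBERS)
  field_numbers.each_with_object({}) do |(attribute, index), attrs|
    attrs[attribute] = row[index]
  end
end

# Placeholder row; field 4 is unused by the mapping, and the coordinates are made up.
sample_row = ['Goroka', 'Goroka', 'Papua New Guinea', 'GKA', 'unused', '0.0', '0.0']
attributes = attributes_from_row(sample_row)
```

The :country value is still a plain string at this point; per the comment in the Airport example, the :foreign_key => :name option is what turns it into a Country lookup.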

Put this in lib/tasks/data_miner_tasks.rake (unfortunately, I don’t know a way to include gem tasks automatically, so for now you have to add them manually):

namespace :data_miner do
  task :mine => :environment do
    DataMiner.mine :class_names => ENV['CLASSES'].to_s.split(/\s*,\s*/).flatten.compact
  end

  task :map_to_attrs => :environment do
    DataMiner.map_to_attrs ENV['METHOD'], :class_names => ENV['CLASSES'].to_s.split(/\s*,\s*/).flatten.compact
  end
end
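
Both tasks read a comma-separated CLASSES environment variable. The sketch below wraps the same expression in a hypothetical helper (class_names_from is not part of data_miner) to show what it yields for a typical value and for an unset variable.

```ruby
# Same expression the rake tasks use, wrapped in a hypothetical helper
# so it can be tried outside of rake.
def class_names_from(value)
  value.to_s.split(/\s*,\s*/).flatten.compact
end

with_classes = class_names_from('Country, Airport')  # => ["Country", "Airport"]
without_env  = class_names_from(nil)                 # => [] (CLASSES not set)
```

Given that parsing, something like rake data_miner:mine CLASSES=Country,Airport should pass both names through. How DataMiner.mine treats an empty list is not shown here.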

You need to specify the order in which data is mined. For example, in config/initializers/data_miner_config.rb:

DataMiner.enqueue do |queue|
  queue << Country  # class whose data should be mined 1st
  queue << Airport  # class whose data should be mined 2nd
  # etc
end
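
The order presumably matters because Airport resolves its country by name, so Country rows need to exist first. Here is a minimal sketch of that dependency using plain Ruby stand-ins; CountryRecord and country_id_for are hypothetical and only imitate what Country.find_by_name(X) would do.

```ruby
# Hypothetical stand-in; the real model is an ActiveRecord class.
CountryRecord = Struct.new(:id, :name)

# Resolve a country name to an id, imitating a find_by_name lookup.
# Returns nil when no country with that name has been stored yet.
def country_id_for(name, countries)
  match = countries.find { |country| country.name == name }
  match && match.id
end

no_countries_yet = []
countries_mined  = [CountryRecord.new(1, 'Papua New Guinea')]

before = country_id_for('Papua New Guinea', no_countries_yet)  # nil
after  = country_id_for('Papua New Guinea', countries_mined)   # 1
```

If Airport were mined before Country, every lookup would fall into the first case.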

Once you have (1) set up the order of data mining and (2) defined mine_data blocks in your classes, you can run:

$ rake data_miner:mine

Complete example

~ $ rails testapp
~ $ cd testapp/
~/testapp $ ./script/generate model Airport iata_code:string name:string city:string country_id:integer latitude:float longitude:float
~/testapp $ ./script/generate model Country iso_3166:string name:string
~/testapp $ rake db:migrate
~/testapp $ touch lib/tasks/data_miner_tasks.rake
[...edit per quick start...]
~/testapp $ touch config/initializers/data_miner_config.rb
[...edit per quick start...]
~/testapp $ rake data_miner:mine

Now you should have:

~/testapp $ ./script/console 
Loading development environment (Rails 2.3.3)
>> Airport.first.iata_code
=> "GKA"
>> Airport.first.country.name
=> "Papua New Guinea"

Authors

Copyright © 2009 Brighter Planet. See LICENSE for details.