theman

The man getting you down?

Theman lets you import lots of data into PostgreSQL very fast.

Installation

gem install 'theman'

Basic usage

Say we have a csv file called sample.csv and we want to count how many rows we created:

conn  = PGconn.open(:dbname => 'test')

agent = Theman::Agency.new(conn, 'sample.csv')
agent.create!

res = conn.exec("SELECT count(*) FROM #{agent.table_name}")
res.getvalue(0,0)

Basic usage with Active Record and a simple object

conn  = ActiveRecord::Base.connection.raw_connection

agent = Theman::Agency.new(conn, 'sample.csv')
agent.create!

model = Theman::Object.new(agent.table_name, ActiveRecord::Base)
model.count

Advanced usage with Active Record and an existing model

Theman will call the create! method if you pass in a block.

conn  = ActiveRecord::Base.connection.raw_connection

agent = Theman::Agency.new conn, 'ugly.csv' do |smith|
  smith.nulls /"N"/, /"UNKNOWN"/, /""/
  smith.chop 15
  smith.delimiter "|"
  smith.table do |t|
    t.string :name, :limit => 50
    t.date :date
    t.integer :ext_id
    t.float :amount
    t.boolean :exited
  end
end

MyModel.table_name = agent.table_name
MyModel.where(:exited => true).count

In the above example we omitted the last 15 rows, made some things null and specified some column data types.

If you do not provide a table block your columns will be VARCHAR(255); you can cherry pick the columns that you want to change the data types for.

The temp table has no id column by default, but you can add one by calling add_primary_key!, this will add the agents_pkey column.

Drop on commit

If you want to use ON COMMIT DROP you will need to pass in :on_commit => :drop into options and do everthing inside a transacton.

agent = Theman::Agency.new conn, 'sample.csv', :on_commit => :drop

agent.transaction do
  agent.create!
  # do stuff
end

No headers

If you data does not have headers pass into options :headers => false, but each column must be specified or the import will fail.

Dates

Ah dates, the joy! Use datestyle to tell Theman to then tell PostgreSQL:

agent = Theman::Agency.new conn, 'uber_foie_gras.csv' do |schmit|
  schmit.datestyle 'European'
  schmit.table do |t|
    t.date :start_date
    t.date :end_date
  end
end

Refer to PostgreSQL docs for more info in the mean time here is some copy and paste action:

ISO

Use ISO 8601-style dates and times (YYYY-MM-DD HH:MM:SS). This is the default.

SQL

Use Oracle/Ingres-style dates and times.

PostgreSQL

Use traditional PostgreSQL format.

German

dd.mm.yyyy

European

dd/mm/yyyy

US

mm/dd/yyyy

Errors

PostgreSQL COPY requires that the data be well formed, any rows that are different to what is expected by the table the import will raise a Theman::Agency::Error.

If you are importing very large files and the import fails space on disc will still be used untill VACUUM.

(The MIT License)

Copyright © 2010 mynameisrufus (Rufus Post)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.