data_transit

data_transit is a Ruby gem/app used to migrate data between different relational databases, supporting customized migration procedures.

#1 Introduction

data_transit relies on activerecord to generate database models on the fly. It is executed within a database transaction, so should any error occur during data transit, the transaction is rolled back. Don't worry about introducing dirty data into your target database.
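As an illustration of that rollback behavior (this is not the gem's internal code; it assumes the sqlite3 gem just to get a throwaway in-memory database):

require "active_record"

ActiveRecord::Base.establish_connection(:adapter => "sqlite3", :database => ":memory:")
ActiveRecord::Migration.verbose = false
ActiveRecord::Schema.define(:version => 0) { create_table(:items) { |t| t.string :name } }

class Item < ActiveRecord::Base; end

begin
  ActiveRecord::Base.transaction do
    Item.create!(:name => "copied row")
    raise "simulated copy failure"   # any error inside the transaction...
  end
rescue RuntimeError
end

puts Item.count   # => 0, the insert above was rolled back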

#2 Install

data_transit can be installed using gem:

gem install data_transit

or

download data_transit.gem
gem install --local /where/you/put/data_transit.gem

On your command line, run data_transit. You can proceed to "How to use" if you see a prompt like the one below.

data_transit usage: data_transit command args. 4 commands are listed below…

#3 How to use

#3.1 Config DataBase Connection

data_transit setup_db_conn /your/yaml/database/config/file

Your db config file should be configured properly and be compatible with the activerecord adapter you are using. As the names suggest, source designates the database you plan to copy data from, while target is the copy destination. Note that the keys 'source' and 'target' must be kept unchanged! For example, here is a sample file for an Oracle DBMS.

database.yml

source: #don't change this line
  adapter: oracle_enhanced
  database: dbserver
  username: xxx
  password: secret

target: #don't change this line
  adapter: oracle_enhanced
  database: orcl
  username: copy1
  password: cipher
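If you want to sanity-check the file before handing it to data_transit, the same keys can be fed straight to activerecord yourself. A minimal sketch, assuming the file above is saved as database.yml:

require "yaml"
require "active_record"

config = YAML.load_file("database.yml")

# Point activerecord at the 'source' entry and list its tables;
# a wrong adapter or bad credentials will raise here.
ActiveRecord::Base.establish_connection(config["source"])
puts ActiveRecord::Base.connection.tables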

#3.2 Create Database Schema (optional)

If your target database schema is already created, move on to 3.3 "Copy Data"; otherwise use your database-specific tools to generate an identical schema in your target database. If you don't have a handy tool to generate the schema, for instance when you need to migrate between different database systems, you can use data_transit to dump a schema description file based on your source database schema, and then use this file to create your target schema.

Note this gem is built on activerecord, so it works best with database schemas that follow Rails conventions: for example, a single-column primary key rather than a compound primary key (a primary key spanning more than one column), and an integer primary key instead of other types such as guid or timestamp.
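In other words, the kind of table data_transit copies most comfortably looks like the plain Rails default (a made-up table, shown only to illustrate the convention):

create_table "orders" do |t|   # activerecord adds an auto-incrementing integer "id" primary key
  t.string   "customer_name"
  t.datetime "created_at"
end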

data_transit does handle the case where a non-integer primary key is used, but with a minor drawback: the batch-query feature provided by activerecord cannot be used, because it depends on an integer primary key. In this special case, careless selection of the copy range might overburden the network and the database server, because all data in the specified range are transmitted from the source database and then inserted into the target database at once.
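To make the difference concrete, here is a rough sketch of the two code paths (the model names and condition are made up; this is not the gem's exact code):

# Integer primary key: activerecord can walk the range in fixed-size
# batches, keeping memory and network load bounded.
SourceRow.where(condition).find_each(:batch_size => 1000) do |row|
  TargetRow.create!(row.attributes)
end

# Non-integer (e.g. guid) primary key: find_each is unavailable, so the
# whole selected range is loaded and transmitted in one go.
SourceRow.where(condition).each do |row|
  TargetRow.create!(row.attributes)
end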

#3.2.1 Dump Source Database Schema

data_transit dump_schema [schema_file] [rule_file]

[schema_file] will contain the dumped schema; [rule_file] describes how you want data_transit to copy your data.

Note that if your source schema is somewhat legacy, you might need some manual work to adjust the generated schema_file to better meet your needs.

For example, in my test the source schema uses obj_id as the id column and guid as the primary key type, so I needed to keep activerecord from auto-generating an "id" column by removing :primary_key => "obj_id", adding :id => false, and then appending a primary key constraint at the end of each table definition. See below.

Here is an example dumped schema file:

ActiveRecord::Schema.define(:version => 0) do

  create_table "table_one", :primary_key => "obj_id", :force => true do |t|
    #other fields
  end
  #other tables

end

And I manually changed the schema definition to:

ActiveRecord::Schema.define(:version => 0) do

  create_table "table_one", :id => false, :force => true do |t|
    t.string "obj_id", :limit => 42
    #other fields
  end
  execute "alter table table_one add primary key(obj_id)"

end

#3.2.2 Create Target Database Schema

data_transit create_table [schema_file]

If everything goes well, you will see a stream of DDL execution output.
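If you prefer to apply the dumped schema by hand instead, plain activerecord can do that too. Roughly (the database.yml layout and the schema file name are assumptions carried over from the earlier examples):

require "yaml"
require "active_record"

config = YAML.load_file("database.yml")
ActiveRecord::Base.establish_connection(config["target"])
load "schema_file.rb"   # runs the ActiveRecord::Schema.define block against the target database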

#3.3 Copy Data

data_transit copy_data [rule_file]

[rule_file] contains your copy logic. For security reasons I have changed the table names; the file I used looks as follows.

start_date = "2015-01-01 00:00:00"
end_date = "2015-02-01 00:00:00"

migrate do
  choose_table "APP.TABLE1","APP.TABLE2","APP.TABLE3","APP.TABLE4","APP.TABLE5","APP.TABLE6"
  batch_by "ACQUISITION_TIME BETWEEN TO_DATE('#{start_date}','yyyy-mm-dd hh24:mi:ss') AND TO_DATE('#{end_date}', 'yyyy-mm-dd hh24:mi:ss')"
  register_primary_key 'OBJ_ID'
end

migrate do
  choose_table "APP.TABLE7","APP.TABLE8","APP.TABLE9","APP.TABLE10","APP.TABLE11","APP.TABLE12"
  batch_by "ACKTIME BETWEEN TO_DATE('#{start_date}','yyyy-mm-dd hh24:mi:ss') AND TO_DATE('#{end_date}', 'yyyy-mm-dd hh24:mi:ss')"
  register_primary_key 'OBJ_ID'
end

migrate do
  choose_table "APP.TABLE13","APP.TABLE14","APP.TABLE15","APP.TABLE16","APP.TABLE17","APP.TABLE18"
  batch_by "1>0" #query all data because these tables don't have a reasonable range
  register_primary_key 'OBJ_ID'
  pre_work do |targetCls| targetCls.delete_all("1>0") end #delete all in target
  #post_work do |targetCls|  end
end

Each migrate block contains a data_transit task.

"choose_table" lists which tables are included in this task. These tables share a common nature and can be processed with the same rule.

"batch_by" is the query condition that selects which rows to copy.

"register_primary_key" declares the primary key column of the tables.

"pre_work" is a block executed before each table is processed.

"post_work" is a block executed after each table is processed.
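Judging from the example above, the block parameter is the activerecord model that data_transit generates for the target table, so anything that model responds to can be used there. For instance (a sketch with a made-up table name; the count report is only an illustration):

migrate do
  choose_table "APP.TABLE19"
  batch_by "1>0"
  register_primary_key 'OBJ_ID'
  pre_work  do |targetCls| targetCls.delete_all("1>0") end  # start from an empty target table
  post_work do |targetCls| puts "#{targetCls.table_name}: #{targetCls.count} rows copied" end
end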