Loadmop
Helps load up the OMOP Vocabulary/CDM into the database of your choice.
Requirements
- Ruby 2.0+
- Bundler
- Some sort of RDBMS to store OMOP Vocabulary files in
Installation
Add this line to your application's Gemfile:
gem 'loadmop'
And then execute:
$ bundle
Or install it yourself as:
$ gem install loadmop
Usage
Run bundle exec loadmop
with no arguments to get a help screen and figure out which command you want to run
Preparation
Create a .env file in the root directory of the clone, specifying:
- SEQUELIZER_DATABASE
- The name of the database you want to install to
- Required
- SEQUELIZER_ADAPTER
- The name of the adapter to use (e.g. postgres, sqlite, mysql, oracle)
- Required
- SEQUELIZER_HOST
- The host on which the database lives
- Optional
- SEQUELIZER_USERNAME
- The username to connect to the database with
- Optional
- SEQUELIZER_PASSWORD
- The password to connect to the database with
- Optional
- SEQUELIZER_SEARCH_PATH
- At least for PostgreSQL, specifies a schema to install into
- Optional
See the Sequelizer Gem for some .env examples
Then:
- Create the database you just specified in your .env file
- loadmop isn't (yet) cool enough to actually create the database for you
- If you're using SQLite, you don't have to create the database file
- If your database defaults to using case-insensitive storage of text, (I'm looking at you MySQL and SQL Server), make sure to set a case-sensitive collation on your database
- cd into a directory where you've defined a config/database.yml or .env file that is compatible with Sequelizer
- run
bundle install
to make sure you have all the needed dependencies installed - run
bundle exec sequelizer config
to ensure your connection parameters are correctly set - run
bundle exec sequelizer update_gemfile
to ensure your Gemfile has the right database gem
Loading Vocabulary Files
- Download the OMOP Vocabulary Files and unzip them to some directory.
- I recommend you use the restricted files since they include CPT codes and other useful vocabularies.
- Run
bundle exec loadmop create_vocab_database /path/to/directory/holding/unzipped/vocabulary/files
- This runs all the steps for setting up the vocabulary database, namely
- Creating the proper tables
- Prepping the CSV files to load into the database
- Loading the CSV files into the database
- Adding some useful indexes to the vocabulary tables
- This runs all the steps for setting up the vocabulary database, namely
Loading CDM Data
- Run
bundle exec loadmop create_cdmv4_data /path/to/directory/holding/cdm/data/files
- This runs all the steps for loading CDM data into a database, namely
- Creating the proper CDM tables
- Prepping the CSV files to load into the database
- Loading the CSV files into the database
- Adding some useful indexes to the CDM tables
- This runs all the steps for loading CDM data into a database, namely
Pleas for Help
I've written methods to quickly load the data into PostgreSQL and SQLite, but I don't regularly use many other RDBMSs. Right now they use a method that should work for all RDBMSs, but is pretty slow.
Some of the faster methods are Unix-only as well. If there are fast, platform-independent ways to load the data, I'm interested.
Please submit suggestions or pull requests to speed up loading under other RDBMSs and I'll incorporate them. Thanks!
Contributing
- Fork it ( https://github.com/outcomesinsights/loadmop/fork )
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create a new Pull Request