Mode
This package provides command-line tools for managing datasets and connecting data to Mode, including:
- Cloud data warehouse management
- Personal data warehouse connectivity
- Dataset formatting and importing (CSV)
Prerequisites
This package requires Ruby 1.9 or later; Ruby 2.0 is recommended.
If you don't have an up-to-date version of Ruby, or you're not sure, follow the instructions below to get going.
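To see which version you have, run `ruby -v`. The sketch below applies the same thresholds programmatically using RubyGems' `Gem::Version`; `ruby_support_level` is a hypothetical helper for illustration, not part of the mode gem.

```ruby
require 'rubygems'

# Thresholds taken from the prerequisites above.
MINIMUM_RUBY     = Gem::Version.new('1.9')
RECOMMENDED_RUBY = Gem::Version.new('2.0')

# Returns :unsupported, :supported or :recommended for a version string
# such as RUBY_VERSION (e.g. "2.0.0").
def ruby_support_level(version_string)
  version = Gem::Version.new(version_string)
  if version < MINIMUM_RUBY
    :unsupported
  elsif version < RECOMMENDED_RUBY
    :supported
  else
    :recommended
  end
end

puts "Ruby #{RUBY_VERSION}: #{ruby_support_level(RUBY_VERSION)}"
```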
Mac OSX
To install Ruby on OSX, complete the four steps below.
1. Install Homebrew
ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
2. Install rbenv
brew update
brew install rbenv ruby-build
echo 'eval "$(rbenv init -)"' >> ~/.bash_profile
source ~/.bash_profile
3. Install Ruby 2.0
Note: This usually takes several minutes.
rbenv install 2.0.0-p353
rbenv global 2.0.0-p353
rbenv rehash
4. Install the mode gem
gem install mode
rbenv rehash
OSX Combined
For convenience, you can copy and paste all of these lines into your terminal at once:
ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
brew update
brew install rbenv ruby-build
echo 'eval "$(rbenv init -)"' >> ~/.bash_profile
source ~/.bash_profile
rbenv install 2.0.0-p353
rbenv global 2.0.0-p353
gem install mode
rbenv rehash
Windows
To install a current version of Ruby on Windows, complete the two steps below.
1. Install Ruby 2.0
You can download the latest Ruby version from RubyInstaller.
2. Install the mode gem
$ gem install mode
Setup
Init
Initializes a new configuration file that holds your API credentials and other information. The command prints the path it writes to (~/.mode.yml in the example below).
$ mode setup
Initializing configuration at /Users/josh/.mode.yml
Mode username: besquared
Your can view your access tokens at http://www.modeanalytics.com/accounts/besquared/access_tokens
Access token for besquared: ...
Wrote configuration to /Users/josh/.mode.yml
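If you want to read this configuration from your own Ruby scripts, a minimal sketch might look like the following. It assumes only what the example output shows (a YAML file at ~/.mode.yml); the keys inside the file are not documented here, so `load_mode_config` (a hypothetical helper, not part of the mode gem) just checks that the file parses to a mapping and returns it.

```ruby
require 'yaml'
require 'tmpdir'

# Loads the configuration written by `mode setup`. The default path follows the
# example output above; the file's keys are not documented in this README, so
# this only verifies that the file exists and parses to a YAML mapping.
def load_mode_config(path = File.expand_path('~/.mode.yml'))
  raise "No configuration at #{path}; run `mode setup` first" unless File.exist?(path)

  data = YAML.safe_load(File.read(path))
  raise "Expected a YAML mapping in #{path}" unless data.is_a?(Hash)

  data
end

# Example with a throwaway file; the "username" key is hypothetical.
Dir.mktmpdir do |dir|
  path = File.join(dir, '.mode.yml')
  File.write(path, "username: besquared\n")
  puts load_mode_config(path).keys.inspect # prints ["username"]
end
```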
Working with CSV Data
Analyze
The analyze command performs two useful functions:
- Ensures that the CSV is well formed, with no syntax errors
- Tells you how Mode will recognize data types and format data
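The kind of syntax error the first check looks for can be reproduced with Ruby's standard CSV library, which raises `CSV::MalformedCSVError` on input such as an unclosed quote. This is an illustration only, not the mode gem's implementation; `csv_syntax_error` is a hypothetical helper.

```ruby
require 'csv'

# Returns nil if the text parses as CSV, or the parser's error message otherwise.
def csv_syntax_error(text)
  CSV.parse(text)
  nil
rescue CSV::MalformedCSVError => e
  e.message
end

# Well-formed input parses cleanly.
puts csv_syntax_error("start_time,type\n2013-01-01 00:00,Casual\n").inspect # prints nil
# An unclosed quote on the second line is reported.
puts csv_syntax_error("start_time,type\n\"2013-01-01,Casual\n")
```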
For better performance, analyze inspects only a sampled subset of the file.
You can set the sampling rate with the --sample option, passing a number between 0 and 1.
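One way to read a sampling rate between 0 and 1 is as the probability of keeping each row. The sketch below assumes independent (Bernoulli) sampling of data rows; the gem's actual sampling strategy is not documented here, and `sample_rows` is a hypothetical helper.

```ruby
require 'csv'

# Keeps each data row independently with probability `rate`.
# Returns the sampled rows and the total number of data rows seen.
def sample_rows(source, rate, rng: Random.new)
  raise ArgumentError, 'rate must be greater than 0 and at most 1' unless rate > 0 && rate <= 1

  total = 0
  sampled = []
  # headers: true treats the first line as column names rather than data.
  CSV.new(source, headers: true).each do |row|
    total += 1
    sampled << row if rng.rand < rate
  end
  [sampled, total]
end

# Example: 1000 data rows sampled at roughly 10% with a fixed seed.
data = "n\n" + (1..1000).map(&:to_s).join("\n") + "\n"
rows, total = sample_rows(data, 0.1, rng: Random.new(42))
puts "Analyzed #{rows.size} of #{total} rows"
```

Because each row is an independent coin flip, the number of rows analyzed varies around rate × total rather than matching it exactly.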
$ mode analyze bikeshare_small.csv
# Analyzing bikeshare_small.csv (Sampling 12.70%)...
# Analyzed 12040 of 99999 rows
+------------+-----+----------+------------+-------------+------------+---------------+-------------+-----------+
| Field | Key | Type | String (%) | Integer (%) | Number (%) | Date/Time (%) | Boolean (%) | Empty (%) |
+------------+-----+----------+------------+-------------+------------+---------------+-------------+-----------+
| start_time | No | datetime | | | | 100.00% | | |
| type | No | string | 100.00% | | | | | |
+------------+-----+----------+------------+-------------+------------+---------------+-------------+-----------+
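The percentage columns above can be understood as a per-value classification tallied over each column. The sketch below shows that idea; the patterns, the accepted boolean words and the "most common non-empty type wins" rule are assumptions for illustration, not Mode's actual detection rules.

```ruby
# Illustrative patterns; Mode's real rules may differ.
BOOLEAN_WORDS = %w[true false t f yes no].freeze
INTEGER_RE    = /\A[-+]?\d+\z/
NUMBER_RE     = /\A[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?\z/
DATETIME_RE   = /\A\d{4}-\d{2}-\d{2}(?:[ T]\d{2}:\d{2}(?::\d{2})?)?\z/

# Classifies a single raw CSV value.
def classify(value)
  v = value.to_s.strip
  return :empty if v.empty?
  return :boolean if BOOLEAN_WORDS.include?(v.downcase)
  return :integer if v =~ INTEGER_RE
  return :number if v =~ NUMBER_RE
  return :datetime if v =~ DATETIME_RE

  :string
end

# Percentage of a column's values falling into each type, rounded to 2 places.
def type_profile(values)
  return {} if values.empty?

  counts = Hash.new(0)
  values.each { |v| counts[classify(v)] += 1 }
  counts.map { |type, n| [type, (100.0 * n / values.size).round(2)] }.to_h
end

# Picks the most common non-empty type as the column's overall type.
def dominant_type(profile)
  non_empty = profile.reject { |type, _| type == :empty }
  return :empty if non_empty.empty?

  non_empty.max_by { |_, pct| pct }.first
end

profile = type_profile(['2013-01-01 00:00', '2013-01-01 00:05', ''])
puts dominant_type(profile) # prints datetime
```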
Import
More to come!
Package
The package command will analyze your dataset and create a data package based on the field types that are recognized.
The data package can then be imported into the data warehouse.
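To illustrate how recognized field types could feed into a package description, here is a minimal sketch that builds a datapackage.json document shaped like the one documented under Package Format. `build_datapackage`, its arguments and the default dialect values are assumptions for illustration; the gem's own package command may produce something different.

```ruby
require 'json'

# Builds a datapackage.json-style Hash from [field name, type] pairs.
def build_datapackage(name, title, fields, version: '0.0.1')
  {
    'name' => name,
    'version' => version,
    'title' => title,
    'resources' => [
      {
        'name' => 'data',
        'format' => 'csv',
        'path' => 'data.csv',
        'dialect' => {
          'delimiter' => ',',
          'quoteChar' => '"',
          'doubleQuote' => false,
          'lineTerminator' => "\r\n",
          'skipInitialSpace' => false
        },
        'schema' => {
          'fields' => fields.map { |field_name, type| { 'name' => field_name, 'type' => type } }
        }
      }
    ]
  }
end

# Field types taken from the analyze example above.
package = build_datapackage('bikeshare', 'Bikeshare trips',
                            [%w[start_time datetime], %w[type string]])
puts JSON.pretty_generate(package)
```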
Package Format
Mode packages are inspired by and compatible with Open Knowledge Foundation standards.
The most basic Mode data package contains a normalized CSV data file and a file describing the data:
- data.csv
- datapackage.json
Packages may additionally contain a README.md, scripts, and other resources.
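A quick way to confirm that a directory has the two required files is sketched below; extra files such as README.md are simply ignored. `missing_package_files` is a hypothetical helper, not part of the mode gem.

```ruby
require 'tmpdir'

# The two files every basic Mode data package contains.
REQUIRED_PACKAGE_FILES = %w[data.csv datapackage.json].freeze

# Returns the required files that are absent from `dir`.
def missing_package_files(dir)
  REQUIRED_PACKAGE_FILES.reject { |name| File.file?(File.join(dir, name)) }
end

# Example against a throwaway directory that only has data.csv.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'data.csv'), "a,b\r\n1,2\r\n")
  puts missing_package_files(dir).inspect # prints ["datapackage.json"]
end
```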
datapackage.json
The datapackage.json file describes the data available in the package, including naming, formatting, and schema information.
{
  "name": "my-dataset",
  "version": "0.0.1",
  "title": "A human-friendly title",
  "resources": [
    {
      "name": "data",
      "format": "csv",
      "path": "data.csv",
      "dialect": {
        "delimiter": ",",
        "quoteChar": "\"",
        "doubleQuote": false,
        "lineTerminator": "\r\n",
        "skipInitialSpace": false
      },
      "schema": {
        "fields": [
          {
            "name": "name of field (e.g. column name)",
            "title": "A nicer human-readable label or title for the field",
            "type": "A string specifying the type",
            "format": "A string specifying a format",
            "description": "A description for the field"
          }
        ],
        "primaryKey": ["field1", "field2"]
      }
    }
  ]
}
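As a sketch of how a consumer might use this file, the Ruby below parses a datapackage.json, checks that every primaryKey entry names a declared field, and reads CSV text using the resource's dialect. The helper names, the checks and the dialect-to-CSV mapping are illustrative assumptions, not part of the mode gem; doubleQuote and skipInitialSpace are not mapped here.

```ruby
require 'json'
require 'csv'

# Translates a resource's "dialect" object into keyword options for Ruby's CSV.
def csv_options_for(dialect)
  dialect ||= {}
  {
    col_sep: dialect.fetch('delimiter', ','),
    quote_char: dialect.fetch('quoteChar', '"'),
    row_sep: dialect.fetch('lineTerminator', :auto)
  }
end

# Returns a list of human-readable problems found in a parsed package.
def datapackage_problems(package)
  problems = []
  problems << 'missing "name"' unless package['name'].is_a?(String)
  resources = package['resources']
  unless resources.is_a?(Array)
    problems << 'missing "resources" array'
    return problems
  end

  resources.each_with_index do |resource, i|
    schema = resource['schema'] || {}
    fields = schema['fields']
    unless fields.is_a?(Array)
      problems << "resource #{i}: missing schema.fields"
      next
    end
    names = fields.map { |field| field['name'] }
    # primaryKey may be a single field name or a list of names.
    Array(schema['primaryKey']).each do |key|
      problems << "resource #{i}: primaryKey #{key.inspect} is not a field" unless names.include?(key)
    end
  end
  problems
end

# Parses CSV text (for example File.read('data.csv')) using a resource's dialect.
def parse_rows(text, dialect)
  CSV.parse(text, headers: true, **csv_options_for(dialect))
end

# Example with an illustrative package; field names and values are made up.
doc = JSON.parse(<<~'JSON')
  {
    "name": "my-dataset",
    "version": "0.0.1",
    "resources": [{
      "name": "data", "format": "csv", "path": "data.csv",
      "dialect": {"delimiter": ",", "quoteChar": "\"", "lineTerminator": "\r\n"},
      "schema": {
        "fields": [{"name": "start_time", "type": "datetime"},
                   {"name": "type", "type": "string"}],
        "primaryKey": ["start_time"]
      }
    }]
  }
JSON
puts datapackage_problems(doc).inspect # prints []
rows = parse_rows("start_time,type\r\n2013-01-01 00:00,Casual\r\n", doc['resources'][0]['dialect'])
puts rows.size # prints 1
```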