Mode
This package provides command-line tools for managing datasets and connecting data to Mode, including:
- Cloud data warehouse management
- Personal data warehouse connectivity
- Dataset formatting and importing (CSV)
Install Mode
Mode requires Ruby 1.9 or newer.
If you don't have Ruby 1.9 or newer, or aren't sure, follow the directions below for installing it before continuing.
gem install mode
Install Ruby 1.9+ if it's not currently installed.
This package requires at least Ruby 1.9; Ruby 2.0 is recommended.
If you don't have an up-to-date version of Ruby, or you're not sure, use the instructions below to get going.
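To check whether the Ruby on your path meets this requirement, a short script like the following can help. It is a minimal sketch: the `ruby_ok?` helper name is made up for illustration, and `Gem::Version` is used only because it compares version strings numerically rather than alphabetically.

```ruby
require "rubygems"

MINIMUM_RUBY = "1.9"      # minimum version stated in this README
RECOMMENDED_RUBY = "2.0"  # recommended version stated in this README

# Hypothetical helper: true when `version` is at least `minimum`.
def ruby_ok?(version, minimum = MINIMUM_RUBY)
  Gem::Version.new(version) >= Gem::Version.new(minimum)
end

if ruby_ok?(RUBY_VERSION, RECOMMENDED_RUBY)
  puts "Ruby #{RUBY_VERSION}: meets the recommended version"
elsif ruby_ok?(RUBY_VERSION)
  puts "Ruby #{RUBY_VERSION}: supported, but #{RECOMMENDED_RUBY}+ is recommended"
else
  puts "Ruby #{RUBY_VERSION}: too old, install a newer Ruby first"
end
```

Running `ruby -v` in a terminal gives the same information at a glance; the script form is only useful if you want to automate the check.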
Mac OS X
To install Ruby on OS X, complete the 3 steps below.
1. Install Homebrew
ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
2. Install rbenv
brew update
brew install rbenv ruby-build rbenv-gem-rehash
echo 'eval "$(rbenv init -)"' >> ~/.bash_profile
source ~/.bash_profile
3. Install Ruby
Note: This usually takes several minutes
rbenv install 2.0.0-p353
rbenv global 2.0.0-p353
OSX Combined
For convenience, you can copy and paste all the lines at once into your terminal:
ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
brew update
brew install rbenv ruby-build rbenv-gem-rehash
echo 'eval "$(rbenv init -)"' >> ~/.bash_profile
source ~/.bash_profile
rbenv install 2.0.0-p353
rbenv global 2.0.0-p353
Windows
To install a current version of Ruby on Windows, complete the step below.
1. Install Ruby
You can download the latest Ruby version from RubyInstaller.
Setup
Init
Initializes a new configuration file at the specified path. The file holds API credentials and other information.
$ mode login
Initializing configuration at /Users/josh/.mode.yml
Mode username: besquared
You can view your access tokens at http://www.modeanalytics.com/accounts/besquared/access_tokens
Access token for besquared: ...
Wrote configuration to /Users/josh/.mode.yml
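Other scripts can read a YAML configuration file like this one with Ruby's standard library. The sketch below is illustrative only: the key names `username` and `access_token` are assumptions, since the actual layout of `.mode.yml` is not documented here, and the sample is written to a temporary file rather than touching a real configuration. The `load_config` and `mask` helpers are hypothetical.

```ruby
require "yaml"
require "tempfile"

# Hypothetical helper: load a YAML configuration file into a Hash.
def load_config(path)
  YAML.load_file(path) || {}
end

# Hypothetical helper: hide all but the last 4 characters of a secret.
def mask(secret)
  secret = secret.to_s
  return "*" * secret.length if secret.length <= 4
  "*" * (secret.length - 4) + secret[-4..-1]
end

# Assumed example contents; the real .mode.yml may use different keys.
sample = Tempfile.new(["mode", ".yml"])
sample.write({ "username" => "besquared", "access_token" => "abcd1234efgh" }.to_yaml)
sample.close

config = load_config(sample.path)
puts "username: #{config["username"]}"
puts "access_token: #{mask(config["access_token"])}"
```

Masking the token before printing keeps credentials out of terminal scrollback and logs.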
Working with CSV Data
Analyze
The analyze command performs two useful functions:
- Ensures that the CSV is well formed, with no syntax errors
- Tells you how Mode will recognize data types and format data
For better performance, analyze inspects only a sampled subset of the file.
You can optionally set the sampling rate by passing the --sample option with a number between 0 and 1.
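One way to picture row sampling is to keep each row independently with probability equal to the sampling rate. The sketch below illustrates that idea with Ruby's standard `csv` library. It is not Mode's implementation: the `sample_rows` helper, its `seed` parameter, and the sample rows are all made up for illustration.

```ruby
require "csv"

# Hypothetical helper: keep each data row with probability `rate` (0..1).
# The header row is always kept so columns can still be identified.
def sample_rows(csv_text, rate, seed = nil)
  raise ArgumentError, "rate must be between 0 and 1" unless (0.0..1.0).cover?(rate)
  rng = seed ? Random.new(seed) : Random.new
  header, *rows = CSV.parse(csv_text)
  [header, rows.select { rng.rand < rate }]
end

# Made-up sample data using the field names from the analyze example.
csv_text = "start_time,type\n" \
           "2013-01-01 00:04:00,Registered\n" \
           "2013-01-01 00:10:00,Casual\n" \
           "2013-01-01 00:12:00,Registered\n"

header, sampled = sample_rows(csv_text, 0.5, 42)
puts "Fields: #{header.join(", ")}"
puts "Sampled #{sampled.length} of 3 rows"
```

Because rows are kept at random, the share of rows actually inspected will usually differ slightly from the requested rate.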
$ mode analyze bikeshare_small.csv
# Analyzing bikeshare_small.csv (Sampling 12.70%)...
# Analyzed 12040 of 99999 rows
+------------+-----+----------+------------+-------------+------------+---------------+-------------+-----------+
| Field      | Key | Type     | String (%) | Integer (%) | Number (%) | Date/Time (%) | Boolean (%) | Empty (%) |
+------------+-----+----------+------------+-------------+------------+---------------+-------------+-----------+
| start_time | No  | datetime |            |             |            | 100.00%       |             |           |
| type       | No  | string   | 100.00%    |             |            |               |             |           |
+------------+-----+----------+------------+-------------+------------+---------------+-------------+-----------+
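To make the percentage columns concrete, the sketch below classifies each value in a column with simple pattern checks and reports the share of each type. The patterns, the `classify` and `type_shares` helpers, and the sample values are all assumptions for illustration; Mode's actual detection rules may differ.

```ruby
# Assumed patterns, checked in order; anything unmatched counts as a string.
PATTERNS = {
  "empty"    => /\A\s*\z/,
  "boolean"  => /\A(true|false)\z/i,
  "integer"  => /\A[-+]?\d+\z/,
  "number"   => /\A[-+]?\d*\.\d+\z/,
  "datetime" => /\A\d{4}-\d{2}-\d{2}([ T]\d{2}:\d{2}(:\d{2})?)?\z/
}

# Hypothetical helper: name the first pattern a value matches.
def classify(value)
  PATTERNS.each { |type, pattern| return type if value.to_s =~ pattern }
  "string"
end

# Hypothetical helper: percentage of values per detected type.
def type_shares(values)
  counts = Hash.new(0)
  values.each { |value| counts[classify(value)] += 1 }
  shares = {}
  counts.each { |type, count| shares[type] = 100.0 * count / values.length }
  shares
end

# Made-up values for a column like start_time.
start_times = ["2013-01-01 00:04:00", "2013-01-01 00:10:00"]
type_shares(start_times).each do |type, share|
  puts format("%-8s %.2f%%", type, share)
end
```

A column whose values all match one pattern reports 100% for that type, as in the table above; mixed columns spread their share across several types.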
Import
More to come!
Package
The package command will analyze your dataset and create a data package based on the field types that are recognized.
The data package can then be imported into the data warehouse.
Package Format
Mode packages are inspired by and compatible with Open Knowledge Foundation standards.
The most basic Mode data package contains a normalized CSV data file and a file describing the data:
- data.csv
- datapackage.json
Additionally they may contain a README.md, scripts and other resources.
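A quick way to confirm that a directory has the minimum package layout is to check for the two required files. This is a sketch only: the `missing_package_files` helper is hypothetical, and it checks file presence, not file contents.

```ruby
require "tmpdir"

REQUIRED_FILES = ["data.csv", "datapackage.json"]  # the two files listed above

# Hypothetical helper: required files that are absent from `dir`.
def missing_package_files(dir)
  REQUIRED_FILES.reject { |name| File.file?(File.join(dir, name)) }
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "data.csv"), "start_time,type\n")
  puts "Missing: #{missing_package_files(dir).inspect}"  # datapackage.json is absent

  File.write(File.join(dir, "datapackage.json"), "{}")
  puts "Missing: #{missing_package_files(dir).inspect}"  # nothing is absent
end
```

Optional files such as README.md are deliberately ignored, since a package is complete without them.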
datapackage.json
The datapackage.json file contains descriptive information about the data in the package, including naming, formatting, and schema information.
{
  "name": "my-dataset",
  "version": "0.0.1",
  "title": "a human friendly title",
  "resources": [
    {
      "name": "data",
      "format": "csv",
      "path": "data.csv",
      "dialect": {
        "delimiter": ",",
        "quoteChar": "\"",
        "doubleQuote": false,
        "lineTerminator": "\r\n",
        "skipInitialSpace": false
      },
      "schema": {
        "fields": [
          {
            "name": "name of field (e.g. column name)",
            "title": "A nicer human readable label or title for the field",
            "type": "A string specifying the type",
            "format": "A string specifying a format",
            "description": "A description for the field"
            ...
          }
        ],
        "primaryKey": ["field1", "field2", ...]
      }
    }
  ]
}
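Because the descriptor is plain JSON, it can be generated or checked with any JSON library. The sketch below builds a small descriptor for the two fields from the analyze example and round-trips it through Ruby's standard `json` library. The field values, the `build_descriptor` helper, and the choice of keys to include are assumptions for illustration; this is not the output of the package command.

```ruby
require "json"

# Hypothetical helper: descriptor Hash for a single CSV resource.
def build_descriptor(name, title, fields)
  {
    "name" => name,
    "version" => "0.0.1",  # a string; a bare 0.0.1 is not valid JSON
    "title" => title,
    "resources" => [
      {
        "name" => "data",
        "format" => "csv",
        "path" => "data.csv",
        "schema" => { "fields" => fields }
      }
    ]
  }
end

# Field names and types taken from the analyze example.
fields = [
  { "name" => "start_time", "type" => "datetime" },
  { "name" => "type", "type" => "string" }
]

json = JSON.pretty_generate(build_descriptor("my-dataset", "a human friendly title", fields))
puts json

# Parsing the text back confirms it is well-formed JSON.
parsed = JSON.parse(json)
puts parsed["resources"][0]["schema"]["fields"].map { |f| f["name"] }.join(", ")
```

Generating the file from a Hash avoids hand-editing mistakes such as missing commas or single-quoted strings, which a JSON parser would reject.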