Mode

This package provides command line tools for managing datasets and connecting data to Mode, including:

  • Cloud data warehouse management
  • Personal data warehouse connectivity
  • Dataset formatting and importing (CSV)

Install Mode

Mode requires Ruby 1.9 or newer.

If you don't have Ruby 1.9 or newer, or you aren't sure, follow the directions below for installing it before continuing.

gem install mode

Install Ruby 1.9+ if it's not currently installed.

This package requires at least Ruby 1.9; Ruby 2.0 is recommended.
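To see which version you have, run ruby -v. The Ruby sketch below makes the same comparison programmatically, classifying a version string against the 1.9 minimum and the 2.0 recommendation with RubyGems' Gem::Version. The ruby_status method is a hypothetical helper for illustration, not part of the mode gem.

```ruby
require "rubygems"

MINIMUM_RUBY     = Gem::Version.new("1.9")
RECOMMENDED_RUBY = Gem::Version.new("2.0")

# Hypothetical helper: classifies a Ruby version string against the
# minimum (1.9) and recommended (2.0) versions stated above.
def ruby_status(version_string = RUBY_VERSION)
  version = Gem::Version.new(version_string)
  if version < MINIMUM_RUBY
    :too_old
  elsif version < RECOMMENDED_RUBY
    :supported
  else
    :recommended
  end
end

puts "Ruby #{RUBY_VERSION}: #{ruby_status}"
```

Gem::Version compares segment by segment, so "2.0.0" counts as equal to "2.0" and is reported as recommended.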

If you don't have an up-to-date version of Ruby, or you're not sure, use the instructions below to get going.

Mac OSX

To install Ruby on OSX, complete the three steps below.

1. Install Homebrew

ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"

2. Install rbenv

brew update
brew install rbenv ruby-build rbenv-gem-rehash
echo 'eval "$(rbenv init -)"' >> ~/.bash_profile
source ~/.bash_profile

3. Install Ruby

Note: This usually takes several minutes.

rbenv install 2.0.0-p353
rbenv global 2.0.0-p353

OSX Combined

For convenience, you can copy and paste all of these lines into your terminal at once:

ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
brew update
brew install rbenv ruby-build rbenv-gem-rehash
echo 'eval "$(rbenv init -)"' >> ~/.bash_profile
source ~/.bash_profile
rbenv install 2.0.0-p353
rbenv global 2.0.0-p353

Windows

To install a current version of Ruby on Windows, complete the step below.

1. Install Ruby

You can download the latest Ruby version from RubyInstaller.

Setup

Init

The login command initializes a new configuration file that holds your API credentials and other information (in the example below, /Users/josh/.mode.yml).

$ mode login

Initializing configuration at /Users/josh/.mode.yml
Mode username: besquared
Your can view your access tokens at http://www.modeanalytics.com/accounts/besquared/access_tokens
Access token for besquared: ...
Wrote configuration to /Users/josh/.mode.yml
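The configuration file is YAML, judging by its .mode.yml extension. As a minimal sketch under that assumption, the snippet below loads such a file with Ruby's standard yaml library. The key names "username" and "token" and the load_mode_config helper are hypothetical; the file that mode login actually writes may use different keys.

```ruby
require "yaml"

# Hypothetical layout: the "username" and "token" keys are assumptions and
# may not match what `mode login` actually writes.
def load_mode_config(path = File.expand_path("~/.mode.yml"))
  return nil unless File.exist?(path)

  config = YAML.load_file(path) || {}
  username = config["username"]
  token = config["token"]
  if username.nil? || token.nil?
    raise ArgumentError, "#{path} is missing a username or token"
  end

  { username: username, token: token }
end
```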

Working with CSV Data

Analyze

The analyze command performs two useful functions:

  • Ensures that the CSV is well-formed, with no syntax errors
  • Tells you how Mode will recognize data types and format data
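Ruby's standard csv library raises CSV::MalformedCSVError on syntax errors such as an unclosed quote. The sketch below uses it for a rough well-formedness check in the spirit of what analyze reports. The csv_problems helper is hypothetical, and flagging rows whose field count differs from the header is an assumption, not the gem's documented behavior.

```ruby
require "csv"

# Returns a list of problems found in CSV text: the parser's error message if
# the text is syntactically malformed, otherwise any rows whose field count
# differs from the header row. An empty list means the text looks well formed.
def csv_problems(text)
  rows = CSV.parse(text)
  return [] if rows.empty?

  width = rows.first.length
  rows.each_with_index
      .reject { |row, _i| row.length == width }
      .map { |row, i| "row #{i + 1} has #{row.length} fields, expected #{width}" }
rescue CSV::MalformedCSVError => e
  [e.message]
end
```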

For better performance, analyze inspects only a sampled subset of the file.

You can optionally set the sampling rate by passing the --sample option with a number between 0 and 1.
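One simple way to sample at a given rate is to keep each row independently with that probability. The Ruby sketch below does this; it is an illustration only, since the gem's actual sampling strategy is not described here. The sample_rows helper is hypothetical, and rejecting a rate of exactly 0 is an assumption.

```ruby
require "csv"

# Keeps each data row independently with probability `rate`, mirroring the
# 0-to-1 range accepted by --sample. Returns the sampled rows and the total
# number of data rows seen. Illustrative only.
def sample_rows(io, rate, rng = Random.new)
  unless rate.is_a?(Numeric) && rate > 0 && rate <= 1
    raise ArgumentError, "sample rate must be in (0, 1], got #{rate.inspect}"
  end

  sampled = []
  total = 0
  CSV.new(io, headers: true).each do |row|
    total += 1
    sampled << row if rng.rand < rate
  end
  [sampled, total]
end
```

Because each row is kept independently, the number of sampled rows varies from run to run unless you pass a seeded Random.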

$ mode analyze bikeshare_small.csv

#  Analyzing bikeshare_small.csv (Sampling 12.70%)...
#  Analyzed 12040 of 99999 rows
+------------+-----+----------+------------+-------------+------------+---------------+-------------+-----------+
| Field      | Key | Type     | String (%) | Integer (%) | Number (%) | Date/Time (%) | Boolean (%) | Empty (%) |
+------------+-----+----------+------------+-------------+------------+---------------+-------------+-----------+
| start_time | No  | datetime |            |             |            |       100.00% |             |           |
| type       | No  | string   |    100.00% |             |            |               |             |           |
+------------+-----+----------+------------+-------------+------------+---------------+-------------+-----------+
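The Type column and the percentage columns above can be approximated by classifying every sampled value and counting. The heuristics in this Ruby sketch (the regular expressions, treating only "true"/"false" as booleans, and picking the most common non-empty category as the field type) are hypothetical; Mode's real detection rules may differ, and the Key column is not modeled.

```ruby
# Hypothetical detection heuristics; Mode's real rules may differ.
INTEGER_RE  = /\A[-+]?\d+\z/
NUMBER_RE   = /\A[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?\z/
DATETIME_RE = /\A\d{4}-\d{2}-\d{2}(?:[ T]\d{2}:\d{2}(?::\d{2})?)?\z/
BOOLEANS    = %w[true false].freeze

# Classifies one raw CSV value into one of the table's categories.
def classify(value)
  v = value.to_s.strip
  return :empty    if v.empty?
  return :boolean  if BOOLEANS.include?(v.downcase)
  return :integer  if v =~ INTEGER_RE
  return :number   if v =~ NUMBER_RE
  return :datetime if v =~ DATETIME_RE
  :string
end

# Percentage of values in each category, rounded to two decimals.
def type_percentages(values)
  return {} if values.empty?

  counts = Hash.new(0)
  values.each { |v| counts[classify(v)] += 1 }
  counts.each_with_object({}) do |(type, n), pct|
    pct[type] = (100.0 * n / values.length).round(2)
  end
end

# Picks the most common non-empty category as the field's type.
def inferred_type(values)
  pct = type_percentages(values).reject { |type, _| type == :empty }
  pct.empty? ? :empty : pct.max_by { |_, share| share }.first
end
```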

Import

More to come!

Package

The package command analyzes your dataset and creates a data package based on the field types it recognizes.

The data package can then be imported into the data warehouse.
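As a rough sketch of what such a package contains, the Ruby helper below writes a data.csv file and a minimal datapackage.json describing it, following the package format described in this README. The write_package helper and its arguments are hypothetical, and the descriptor deliberately omits the dialect, title, and version; the real package command may produce more.

```ruby
require "csv"
require "json"
require "fileutils"

# Hypothetical helper: writes data.csv plus a minimal datapackage.json whose
# schema lists each header with the type given in field_types.
def write_package(dir, name, headers, rows, field_types)
  FileUtils.mkdir_p(dir)
  CSV.open(File.join(dir, "data.csv"), "w") do |csv|
    csv << headers
    rows.each { |row| csv << row }
  end

  descriptor = {
    "name" => name,
    "resources" => [{
      "name" => "data",
      "format" => "csv",
      "path" => "data.csv",
      "schema" => {
        "fields" => headers.map { |h| { "name" => h, "type" => field_types.fetch(h) } }
      }
    }]
  }
  File.write(File.join(dir, "datapackage.json"), JSON.pretty_generate(descriptor))
  descriptor
end
```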


Package Format

Mode packages are inspired by, and compatible with, the Open Knowledge Foundation's data package standards.

The most basic Mode data package contains a normalized CSV data file and a file describing the data:

  • data.csv
  • datapackage.json

Packages may additionally contain a README.md, scripts, and other resources.
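Since only data.csv and datapackage.json are required, a basic structural check of a package directory just looks for those two files. The Ruby sketch below is a hypothetical helper for illustration; it ignores optional extras such as README.md.

```ruby
# Files every basic package must contain; README.md, scripts, and other
# resources are optional, so they are not checked.
REQUIRED_PACKAGE_FILES = %w[data.csv datapackage.json].freeze

# Returns the required files missing from a package directory.
def missing_package_files(dir)
  REQUIRED_PACKAGE_FILES.reject { |name| File.file?(File.join(dir, name)) }
end
```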

datapackage.json

The datapackage.json file contains descriptive information about the data in the package, including naming, formatting, and schema information.

{
  "name": "my-dataset",
  "version": "0.0.1",
  "title": "a human-friendly title",

  "resources": [
    {
      "name": "data",
      "format": "csv",
      "path": "data.csv",

      "dialect": {
        "delimiter": ",",
        "quoteChar": "\"",
        "doubleQuote": false,
        "lineTerminator": "\r\n",
        "skipInitialSpace": false
      },

      "schema": {
        "fields": [
          {
            "name": "name of field (e.g. column name)",
            "title": "A nicer, human-readable label or title for the field",
            "type": "A string specifying the type",
            "format": "A string specifying a format",
            "description": "A description for the field",
            ...
          }
        ],

        "primaryKey": ["field1", "field2", ...]
      }
    }
  ]
}
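To show how the dialect and schema sections fit together, the Ruby sketch below reads the first resource of a package: it maps the dialect's delimiter, quoteChar, and lineTerminator onto Ruby CSV's col_sep, quote_char, and row_sep options and checks that every primaryKey field is declared in the schema. The helper names are hypothetical; doubleQuote and skipInitialSpace are not mapped, and falling back to Ruby's automatic line-ending detection when lineTerminator is absent is an assumption of this sketch.

```ruby
require "csv"
require "json"

# Maps dialect keys from datapackage.json to Ruby CSV options.
# doubleQuote and skipInitialSpace are not mapped in this sketch.
def csv_options_for(dialect)
  {
    col_sep: dialect.fetch("delimiter", ","),
    quote_char: dialect.fetch("quoteChar", "\""),
    # Let Ruby detect line endings when the dialect omits lineTerminator.
    row_sep: dialect.fetch("lineTerminator", :auto)
  }
end

# Loads the first resource described by datapackage.json, after checking
# that every primaryKey field is declared in the schema.
def read_first_resource(package_dir)
  descriptor = JSON.parse(File.read(File.join(package_dir, "datapackage.json")))
  resource = descriptor.fetch("resources").first
  schema = resource.fetch("schema")
  declared = schema.fetch("fields").map { |field| field.fetch("name") }
  unknown = Array(schema["primaryKey"]) - declared
  unless unknown.empty?
    raise ArgumentError, "primaryKey names undeclared fields: #{unknown.join(', ')}"
  end

  options = csv_options_for(resource.fetch("dialect", {})).merge(headers: true)
  CSV.read(File.join(package_dir, resource.fetch("path")), **options)
end
```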