Kaggle

A Ruby client for the Kaggle API with support for datasets, competitions, models, and more. See: https://www.kaggle.com/docs/api

This is an unofficial project and still a work in progress; more features are planned.

Features

  • 📊 Download Kaggle datasets programmatically
  • 📄 Parse CSV datasets to JSON format
  • 💾 Configurable caching to avoid re-downloading
  • 🔧 Flexible download and cache paths
  • ⚡ Built-in error handling and validation
  • 🛠️ Command-line interface for quick operations

Installation

Add this line to your application's Gemfile:

gem 'kaggle'

And then execute:

$ bundle

Or install it yourself as:

$ gem install kaggle

Setup

You'll need Kaggle API credentials to use this gem. There are three ways to authenticate:

Option 1: kaggle.json File

  1. Go to your Kaggle account page
  2. Click "Create New API Token" to download kaggle.json
  3. Place the file in your project directory (the default location is ./kaggle.json) or pass its path with the credentials_file option

Option 2: Environment Variables

export KAGGLE_USERNAME="yourusername"
export KAGGLE_KEY="your_api_key"
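
Before constructing a client it can help to confirm that both variables are actually set. The following is a minimal sketch under that assumption; missing_kaggle_env is a hypothetical helper written for this example and is not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): return the names of the
# credential variables that are unset or blank in the given environment.
def missing_kaggle_env(env = ENV)
  %w[KAGGLE_USERNAME KAGGLE_KEY].select { |name| env[name].to_s.strip.empty? }
end

missing = missing_kaggle_env
warn "Missing environment variables: #{missing.join(', ')}" unless missing.empty?
```

Passing a plain Hash instead of ENV makes the check easy to exercise in tests.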

Option 3: Direct Credentials

Pass credentials directly when initializing the client.

Kaggle JSON File Format

The kaggle.json file downloaded from Kaggle should have this format:

{
  "username": "yourusername",
  "key": "your_api_key"
}
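
If you want to check a credentials file yourself, the format above can be read with Ruby's standard JSON library. This is only a sketch: load_kaggle_credentials is a hypothetical helper for illustration, and the gem's own loading logic may behave differently.

```ruby
require 'json'

# Hypothetical helper (not part of the gem): read a kaggle.json file and
# return its credentials, raising if either field is missing or empty.
def load_kaggle_credentials(path)
  data = JSON.parse(File.read(path))
  username = data['username'].to_s
  key = data['key'].to_s
  if username.empty? || key.empty?
    raise ArgumentError, "#{path} must contain non-empty \"username\" and \"key\" fields"
  end

  # Keys are named after the username: and api_key: keywords that the
  # client examples in this README use.
  { username: username, api_key: key }
end
```

The returned hash uses the same keyword names as the explicit-credentials example in this README, so it could presumably be passed on as Kaggle::Client.new(**credentials).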

Usage

Basic Usage

require 'kaggle'

# Option 1: Use kaggle.json file (automatically detected)
client = Kaggle::Client.new

# Option 1b: Use custom JSON file path
client = Kaggle::Client.new(credentials_file: '/path/to/kaggle.json')

# Option 2: Use environment variables (KAGGLE_USERNAME and KAGGLE_KEY must be set)
client = Kaggle::Client.new

# Option 3: Use explicit credentials
client = Kaggle::Client.new(
  username: 'your_username',
  api_key: 'your_api_key'
)

Download Datasets

# Basic download
file_path = client.download_dataset('zillow', 'zecon')

# Download and parse CSV to JSON
data = client.download_dataset('zillow', 'zecon', parse_csv: true)

# Use caching to avoid re-downloading
data = client.download_dataset('zillow', 'zecon',
                               parse_csv: true,
                               use_cache: true)
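
To illustrate the general idea behind use_cache, here is a minimal file-based cache sketch. It is not the gem's implementation: fetch_with_cache, the cache file name, and the sample row are all assumptions made for this example.

```ruby
require 'json'
require 'fileutils'
require 'tmpdir'

# Hypothetical illustration (not the gem's implementation): run the block
# only when the cache file is missing, store its result as JSON, and
# always return the data as read back from the cache file.
def fetch_with_cache(cache_file)
  unless File.exist?(cache_file)
    FileUtils.mkdir_p(File.dirname(cache_file))
    File.write(cache_file, JSON.generate(yield))
  end
  JSON.parse(File.read(cache_file))
end

Dir.mktmpdir do |dir|
  cache_file = File.join(dir, 'cache', 'zillow-zecon.json')
  2.times do
    # The block stands in for an expensive download-and-parse step;
    # it only runs on the first iteration.
    rows = fetch_with_cache(cache_file) { [{ 'region' => 'Seattle' }] }
    puts rows.inspect
  end
end
```

Reading the result back from disk on every call keeps the return value identical whether or not the cache was already populated.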

Custom Paths

client = Kaggle::Client.new(
  credentials_file: '/path/to/kaggle.json',
  download_path: '/custom/downloads',
  cache_path: '/custom/cache'
)

Dataset Information

# Get dataset files list
files = client.dataset_files('zillow', 'zecon')

# Parse existing CSV file
data = client.parse_csv_to_json('/path/to/file.csv')
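
For a sense of what a CSV-to-JSON conversion involves, the sketch below uses only Ruby's standard csv and json libraries. csv_text_to_records is a hypothetical stand-in, and the sample data is made up; the gem's parse_csv_to_json may return a different structure or accept other options.

```ruby
require 'csv'
require 'json'

# Hypothetical stand-in for illustration: turn CSV text with a header row
# into an array of hashes keyed by column name. Values stay as strings.
def csv_text_to_records(csv_text)
  CSV.parse(csv_text, headers: true).map(&:to_h)
end

sample = "region,median_price\nSeattle,750000\nAustin,520000\n"
puts JSON.pretty_generate(csv_text_to_records(sample))
```

Numeric columns come back as strings here; converting them is left to the caller.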

Command Line Interface

The gem includes a command-line interface:

# Download dataset
kaggle download zillow zecon

# Download and parse CSV
kaggle download zillow zecon --parse-csv

# Use custom credentials file
kaggle download zillow zecon --credentials-file /path/to/kaggle.json

# Use custom paths
kaggle download zillow zecon --download-path /custom --cache-path /custom/cache

# Show dataset files
kaggle files zillow zecon

# Show version
kaggle --version

Configuration Options

Option            Default         Description
credentials_file  ./kaggle.json   Path to Kaggle credentials JSON file
download_path     ./downloads     Where to save downloaded files
cache_path        ./cache         Where to cache parsed data
timeout           30              HTTP request timeout in seconds
use_cache         false           Use cached parsed data when available
parse_csv         false           Automatically parse CSV files to JSON

Error Handling

The gem raises specific error types that you can rescue individually:

begin
  client.download_dataset('invalid', 'dataset')
rescue Kaggle::AuthenticationError
  puts "Invalid credentials"
rescue Kaggle::DatasetNotFoundError
  puts "Dataset not found"
rescue Kaggle::DownloadError
  puts "Download failed"
rescue Kaggle::ParseError
  puts "Failed to parse data"
end
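
Transient failures such as a failed download are often worth retrying. The sketch below shows one way to do that, assuming the error you pass in is safe to retry; with_retries is a hypothetical helper written for this example and is not provided by the gem.

```ruby
# Hypothetical helper (not part of the gem): run the block, retrying when
# it raises one of the given error classes, and re-raise the error once
# the attempts are used up.
def with_retries(*error_classes, attempts: 3, base_delay: 1.0)
  tries = 0
  begin
    tries += 1
    yield
  rescue *error_classes
    raise if tries >= attempts

    # Exponential backoff: base_delay, then 2x, then 4x, and so on.
    sleep(base_delay * (2**(tries - 1)))
    retry
  end
end

# Possible use with the error class named above (requires the gem):
#   with_retries(Kaggle::DownloadError) do
#     client.download_dataset('zillow', 'zecon')
#   end
```

Errors that are not listed, such as Kaggle::AuthenticationError in the usage comment, propagate immediately because retrying would not help.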

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Tests

To run tests execute:

$ rake test

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/yourusername/kaggle. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the Kaggle project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.