nanoGPT


A Ruby port of Karpathy's nanoGPT. Train GPT-2 style language models from scratch using torch.rb.

Built for Ruby developers who want to understand how LLMs work by building one.

Quick Start

gem install nanogpt

# Prepare Shakespeare dataset with character-level tokenizer
nanogpt prepare shakespeare_char

# Train (use MPS on Apple Silicon for 17x speedup)
nanogpt train --dataset=shakespeare_char --device=mps --max_iters=2000

# Generate text
nanogpt sample --dataset=shakespeare_char

Or from source:

git clone https://github.com/khasinski/nanogpt-rb
cd nanogpt-rb
bundle install

# Prepare data
bundle exec ruby data/shakespeare_char/prepare.rb

# Train
bundle exec exe/nanogpt train --dataset=shakespeare_char --device=mps --max_iters=2000

# Sample
bundle exec exe/nanogpt sample --dataset=shakespeare_char

Performance (M1 Max)

Training the default 10.65M parameter model on Shakespeare:

Device   Time/iter   Notes
MPS      ~500ms      Recommended for Apple Silicon
CPU      ~8,500ms    17x slower

After ~2000 iterations (~20 min on MPS), the model generates coherent Shakespeare-like text.
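
For intuition, here is roughly where those numbers come from. The sketch below is a back-of-the-envelope estimate, not a dump of the gem's internals: the 65-character vocabulary, bias-free linear layers, weight tying between the token embedding and the output head, and the convention of excluding position embeddings from the reported count are assumptions carried over from upstream nanoGPT.

# Rough parameter count for the default config (assumptions noted above).
n_layer, n_embd, vocab_size = 6, 384, 65

per_block  = 12 * n_embd**2 + 2 * n_embd       # attention + MLP + two layer norms
embeddings = vocab_size * n_embd               # token embeddings (output head is tied)
total      = n_layer * per_block + embeddings + n_embd  # + final layer norm

puts "≈ #{(total / 1e6).round(2)}M parameters"  # ≈ 10.65M

# Wall-clock estimate on MPS: 2000 iterations × ~0.5 s ≈ 1000 s ≈ 17 minutes,
# which lands in the ~20 minute ballpark once evaluation and checkpointing
# overhead are added.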

Commands

nanogpt prepare NAME       # Prepare a dataset (tokenize and split into train/val)
nanogpt train [options]    # Train a model
nanogpt sample [options]   # Generate text from a trained model
nanogpt bench [options]    # Run performance benchmarks

Training Options

--dataset=NAME        # Dataset to use (default: shakespeare_char)
--device=DEVICE       # cpu or mps; cuda might work too 🤞 (default: auto)
--max_iters=N         # Training iterations (default: 5000)
--batch_size=N        # Batch size (default: 64)
--block_size=N        # Context length (default: 256)
--n_layer=N           # Transformer layers (default: 6)
--n_head=N            # Attention heads (default: 6)
--n_embd=N            # Embedding dimension (default: 384)
--learning_rate=F     # Learning rate (default: 1e-3)
--config=FILE         # Load settings from JSON file
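
The --config option keeps longer experiments reproducible without giant command lines. Below is a minimal sketch of generating such a file from Ruby; the key names are an assumption that they simply mirror the CLI flags above, so check them against your version of the gem:

require "json"

# Hypothetical config file; keys are assumed to mirror the CLI flags and
# may differ from what the gem actually expects.
config = {
  dataset: "shakespeare_char",
  device: "mps",
  max_iters: 2000,
  batch_size: 64,
  block_size: 256,
  n_layer: 6,
  n_head: 6,
  n_embd: 384,
  learning_rate: 1e-3
}

File.write("shakespeare_mps.json", JSON.pretty_generate(config))
# Then: nanogpt train --config=shakespeare_mps.json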

Sampling Options

--dataset=NAME        # Dataset (for tokenizer)
--out_dir=DIR         # Checkpoint directory
--num_samples=N       # Number of samples to generate
--max_new_tokens=N    # Tokens per sample (default: 500)
--temperature=F       # Sampling temperature (default: 0.8)
--top_k=N             # Top-k sampling (default: 200)
--start=TEXT          # Optional prompt to start generation from
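
Temperature rescales the logits before sampling (values below 1.0 sharpen the distribution, values above flatten it), and top-k restricts sampling to the k most likely tokens. Here is a plain-Ruby illustration of that single sampling step; the gem does the equivalent on torch.rb tensors, so this is the math rather than its implementation:

# One sampling step with temperature scaling and top-k filtering.
def sample_next(logits, temperature: 0.8, top_k: 3)
  scaled = logits.map { |l| l / temperature }           # <1.0 sharpens, >1.0 flattens

  # Keep only the top_k highest logits; discard the rest entirely.
  cutoff = scaled.sort.last(top_k).min
  kept = scaled.map { |l| l >= cutoff ? l : -Float::INFINITY }

  # Softmax over the survivors, then draw one index at random.
  max = kept.max
  exps = kept.map { |l| Math.exp(l - max) }
  total = exps.sum
  probs = exps.map { |e| e / total }

  r = rand
  cumulative = 0.0
  probs.each_with_index do |prob, i|
    cumulative += prob
    return i if r < cumulative
  end
  probs.size - 1
end

logits = [2.0, 1.0, 0.5, -1.0, -3.0]   # scores for five hypothetical tokens
p sample_next(logits, temperature: 0.8, top_k: 3)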

Training on Your Own Text

You can train on any text file using the textfile command:

# Prepare your text file (creates char-level tokenizer)
nanogpt prepare textfile /path/to/mybook.txt --output=mybook

# Train a model
nanogpt train --dataset=mybook --device=mps --max_iters=2000

# Generate text
nanogpt sample --dataset=mybook --start="Once upon a time"

Options

--output=NAME       # Output directory name (default: derived from filename)
--val_ratio=F       # Validation split ratio (default: 0.1)

Example: Training on a Novel

# Download a book
curl -o lotr.txt "https://example.com/fellowship.txt"

# Prepare (handles UTF-8 and Windows-1252 encodings)
nanogpt prepare textfile lotr.txt --output=lotr

# Train (scale up n_layer/n_embd for better results if your hardware allows)
nanogpt train --dataset=lotr --device=mps \
  --max_iters=2000 \
  --n_layer=6 --n_head=6 --n_embd=384 \
  --block_size=256 --batch_size=32

# Sample with a prompt
nanogpt sample --dataset=lotr --start="Frodo" --max_new_tokens=500

The textfile command:

  • Streams through large files without loading everything into memory
  • Auto-detects encoding (UTF-8 or Windows-1252)
  • Creates a character-level vocabulary from your text
  • Splits into train/validation sets
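
Conceptually, the character-level preparation boils down to collecting the unique characters, mapping the text to integer ids, and slicing off a validation set. Here is a simplified in-memory sketch (the real command streams the file instead of reading it all at once, and mybook.txt is just the hypothetical file from the example above):

# Illustrative only; not the gem's implementation.
text = File.read("mybook.txt", encoding: "UTF-8")

chars = text.chars.uniq.sort
stoi  = chars.each_with_index.to_h     # character  => integer id
itos  = stoi.invert                    # integer id => character

ids = text.chars.map { |c| stoi[c] }

# 90/10 train/validation split (--val_ratio defaults to 0.1).
split     = (ids.size * 0.9).to_i
train_ids = ids[0...split]
val_ids   = ids[split..]

puts "vocab: #{chars.size} chars, train: #{train_ids.size} tokens, val: #{val_ids.size} tokens"
puts itos.values_at(*train_ids.first(40)).join   # decode a snippet back to text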

Features

  • Full GPT-2 architecture (attention, MLP, layer norm, embeddings)
  • MPS (Metal) and CUDA GPU acceleration via torch.rb
  • Flash attention when dropout=0 (5x faster attention)
  • Cosine learning rate schedule with warmup (see the sketch below)
  • Gradient accumulation for larger effective batch sizes
  • Checkpointing and resumption
  • Character-level and GPT-2 BPE tokenizers
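
The cosine schedule with warmup has a simple shape: ramp the learning rate up linearly during a short warmup, then decay it along a cosine curve down to a floor. A minimal sketch with illustrative numbers (warmup_iters, lr_decay_iters, and min_lr here are not the gem's defaults):

# Learning rate at a given iteration: cosine decay with linear warmup.
def lr_at(iter, max_lr: 1e-3, min_lr: 1e-4, warmup_iters: 100, lr_decay_iters: 5000)
  # 1) Linear warmup from 0 up to max_lr.
  return max_lr * iter / warmup_iters.to_f if iter < warmup_iters
  # 2) Past the decay horizon, hold at min_lr.
  return min_lr if iter > lr_decay_iters
  # 3) Cosine decay from max_lr down to min_lr in between.
  progress = (iter - warmup_iters).to_f / (lr_decay_iters - warmup_iters)
  coeff = 0.5 * (1.0 + Math.cos(Math::PI * progress))
  min_lr + coeff * (max_lr - min_lr)
end

[0, 50, 100, 2500, 5000].each { |i| puts format("iter %5d  lr %.6f", i, lr_at(i)) }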

Requirements

  • Ruby >= 3.1
  • LibTorch (installed automatically with torch-rb)
  • For MPS: macOS 12.3+ with Apple Silicon

License

MIT