# nanoGPT

A Ruby port of Karpathy's nanoGPT. Train GPT-2-style language models from scratch using torch.rb.
Built for Ruby developers who want to understand how LLMs work by building one.
## Quick Start
```sh
gem install nanogpt

# Prepare Shakespeare dataset with character-level tokenizer
nanogpt prepare shakespeare_char

# Train (use MPS on Apple Silicon for a ~17x speedup)
nanogpt train --dataset=shakespeare_char --device=mps --max_iters=2000

# Generate text
nanogpt sample --dataset=shakespeare_char
```
Or from source:
```sh
git clone https://github.com/khasinski/nanogpt-rb
cd nanogpt-rb
bundle install

# Prepare data
bundle exec ruby data/shakespeare_char/prepare.rb

# Train
bundle exec exe/nanogpt train --dataset=shakespeare_char --device=mps --max_iters=2000

# Sample
bundle exec exe/nanogpt sample --dataset=shakespeare_char
```
## Performance (M1 Max)

Training the default 10.65M-parameter model on Shakespeare:

| Device | Time/iter | Notes |
|---|---|---|
| MPS | ~500ms | Recommended for Apple Silicon |
| CPU | ~8,500ms | ~17x slower than MPS |

After ~2000 iterations (~20 min on MPS), the model generates coherent Shakespeare-like text.
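The 10.65M figure follows directly from the default shapes (`n_layer=6`, `n_embd=384`, 65-character Shakespeare vocabulary). A rough sanity check in Ruby, assuming no bias terms and excluding position embeddings (the convention nanoGPT uses when reporting parameter counts):

```ruby
# Back-of-the-envelope parameter count for the default model.
# Assumes no bias terms; position embeddings are excluded from the total.
n_layer, n_embd, vocab_size = 6, 384, 65

per_block = 2 * n_embd +              # two layer norms
            3 * n_embd * n_embd +     # q, k, v projections
            n_embd * n_embd +         # attention output projection
            2 * 4 * n_embd * n_embd   # MLP up- and down-projections (4x hidden)

total = n_layer * per_block + n_embd + vocab_size * n_embd  # + final layer norm + token embedding
puts total  # => 10646784, i.e. ~10.65M
```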
## Commands
```sh
nanogpt train [options]    # Train a model
nanogpt sample [options]   # Generate text from a trained model
nanogpt bench [options]    # Run performance benchmarks
```
### Training Options
```
--dataset=NAME        # Dataset to use (default: shakespeare_char)
--device=DEVICE       # cpu or mps; cuda might work too 🤞 (default: auto)
--max_iters=N         # Training iterations (default: 5000)
--batch_size=N        # Batch size (default: 64)
--block_size=N        # Context length (default: 256)
--n_layer=N           # Transformer layers (default: 6)
--n_head=N            # Attention heads (default: 6)
--n_embd=N            # Embedding dimension (default: 384)
--learning_rate=F     # Learning rate (default: 1e-3)
--config=FILE         # Load settings from a JSON file (see example below)
```
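For `--config=FILE`, the flags above can live in a JSON file instead. The exact keys the loader accepts are an assumption here (taken to mirror the flag names); a minimal sketch that writes such a file from Ruby:

```ruby
require "json"

# Hypothetical config file; keys are assumed to mirror the CLI flag names above.
config = {
  "dataset"       => "shakespeare_char",
  "device"        => "mps",
  "max_iters"     => 2000,
  "batch_size"    => 64,
  "block_size"    => 256,
  "n_layer"       => 6,
  "n_head"        => 6,
  "n_embd"        => 384,
  "learning_rate" => 1e-3
}

File.write("train_config.json", JSON.pretty_generate(config))
# Then: nanogpt train --config=train_config.json
```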
### Sampling Options
```
--dataset=NAME        # Dataset (for tokenizer)
--out_dir=DIR         # Checkpoint directory
--num_samples=N       # Number of samples to generate
--max_new_tokens=N    # Tokens per sample (default: 500)
--temperature=F       # Sampling temperature (default: 0.8)
--top_k=N             # Top-k sampling (default: 200)
```
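To make `--temperature` and `--top_k` concrete, here is a minimal torch.rb sketch of the standard sampling step these knobs control; it illustrates the technique, not the gem's internal code:

```ruby
require "torch"

# Pick the next token from raw logits using temperature scaling and top-k filtering.
def sample_next_token(logits, temperature: 0.8, top_k: 200)
  scaled = logits / temperature              # <1.0 sharpens, >1.0 flattens the distribution
  k = [top_k, scaled.shape[0]].min
  values, indices = scaled.topk(k)           # keep only the k most likely tokens
  probs = values.softmax(-1)                 # renormalize over those k
  choice = Torch.multinomial(probs, 1).item  # sample one of them
  indices[choice].item                       # map back to a vocabulary id
end

logits = Torch.randn(65)                     # e.g. a 65-character vocabulary
puts sample_next_token(logits)
```

Lower temperatures and smaller `top_k` values make the output more conservative and repetitive; higher values make it more varied but more error-prone.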
## Training on Your Own Text
You can train on any text file using the `textfile` command:
```sh
# Prepare your text file (creates char-level tokenizer)
nanogpt prepare textfile /path/to/mybook.txt --output=mybook

# Train a model
nanogpt train --dataset=mybook --device=mps --max_iters=2000

# Generate text
nanogpt sample --dataset=mybook --start="Once upon a time"
```
### Options
```
--output=NAME     # Output directory name (default: derived from filename)
--val_ratio=F     # Validation split ratio (default: 0.1)
```
### Example: Training on a Novel
```sh
# Download a book
curl -o lotr.txt "https://example.com/fellowship.txt"

# Prepare (handles UTF-8 and Windows-1252 encodings)
nanogpt prepare textfile lotr.txt --output=lotr

# Train a larger model for better results
nanogpt train --dataset=lotr --device=mps \
  --max_iters=2000 \
  --n_layer=6 --n_head=6 --n_embd=384 \
  --block_size=256 --batch_size=32

# Sample with a prompt
nanogpt sample --dataset=lotr --start="Frodo" --max_new_tokens=500
```
The `textfile` command:

- Streams through large files without loading everything into memory
- Auto-detects encoding (UTF-8 or Windows-1252)
- Creates a character-level vocabulary from your text
- Splits the text into train/validation sets (see the sketch after this list)
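Conceptually these steps are small; a simplified Ruby sketch of the idea (not the gem's actual prepare code, without the streaming or encoding detection, and assuming a local `mybook.txt`):

```ruby
# Simplified character-level preparation: build a vocabulary, encode, split.
text = File.read("mybook.txt", encoding: "UTF-8")

chars = text.chars.uniq.sort            # every distinct character becomes a token
stoi  = chars.each_with_index.to_h      # char -> integer id
ids   = text.chars.map { |c| stoi[c] }  # encode the whole corpus

split     = (ids.length * 0.9).to_i     # 90/10 train/validation split (cf. --val_ratio)
train_ids = ids[0...split]
val_ids   = ids[split..]

puts "vocab: #{chars.length} chars, train: #{train_ids.length}, val: #{val_ids.length} tokens"
```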
## Features
- Full GPT-2 architecture (attention, MLP, layer norm, embeddings)
- MPS (Metal) and CUDA GPU acceleration via torch.rb
- Flash attention when dropout=0 (5x faster attention)
- Cosine learning rate schedule with warmup (see the sketch after this list)
- Gradient accumulation for larger effective batch sizes
- Checkpointing and resumption
- Character-level and GPT-2 BPE tokenizers
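The cosine schedule noted above follows the usual nanoGPT recipe: ramp the learning rate up linearly during a short warmup, then decay it along a cosine curve toward a floor. A minimal sketch; the hyperparameter names (`warmup_iters`, `decay_iters`, `min_lr`) are illustrative, not documented flags of this gem:

```ruby
# Linear warmup followed by cosine decay to a minimum learning rate.
def learning_rate(iter, max_lr: 1e-3, min_lr: 1e-4, warmup_iters: 100, decay_iters: 5000)
  return max_lr * (iter + 1) / warmup_iters if iter < warmup_iters  # linear warmup
  return min_lr if iter > decay_iters                               # floor after decay
  progress = (iter - warmup_iters).to_f / (decay_iters - warmup_iters)
  coeff = 0.5 * (1.0 + Math.cos(Math::PI * progress))               # 1.0 -> 0.0
  min_lr + coeff * (max_lr - min_lr)
end

[0, 50, 100, 2500, 5000].each { |i| puts format("iter %4d  lr %.6f", i, learning_rate(i)) }
```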
## Requirements
- Ruby >= 3.1
- LibTorch (installed automatically with torch-rb)
- For MPS: macOS 12.3+ with Apple Silicon
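If you prefer Bundler to a global `gem install`, the dependency can be declared in a Gemfile (a minimal sketch):

```ruby
# Gemfile
source "https://rubygems.org"

gem "nanogpt"  # pulls in torch-rb, which installs LibTorch automatically
```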
## License
MIT