TOON for Ruby
Token-Oriented Object Notation is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage.
This is a Ruby port of the TOON library originally written in TypeScript.
Why TOON?
AI is becoming cheaper and more accessible, but larger context windows allow for larger data inputs as well. LLM tokens still cost money – and standard JSON is verbose and token-expensive:
{
"users": [
{ "id": 1, "name": "Alice", "role": "admin" },
{ "id": 2, "name": "Bob", "role": "user" }
]
}
TOON conveys the same information with fewer tokens:
users[2]{id,name,role}:
1,Alice,admin
2,Bob,user
Installation
Add this line to your application's Gemfile:
gem 'toon-ruby'
And then execute:
bundle install
Or install it yourself as:
gem install toon-ruby
Quick Start
require 'toon'
data = {
'user' => {
'id' => 123,
'name' => 'Ada',
'tags' => ['reading', 'gaming'],
'active' => true,
'preferences' => []
}
}
puts Toon.encode(data)
Output:
user:
id: 123
name: Ada
[2]: reading,gaming
active: true
preferences[0]:
Key Features
- 💸 Token-efficient: typically 30–60% fewer tokens than JSON
- 🤿 LLM-friendly guardrails: explicit lengths and field lists help models validate output
- 🍱 Minimal syntax: removes redundant punctuation (braces, brackets, most quotes)
- 📐 Indentation-based structure: replaces braces with whitespace for better readability
- 🧺 Tabular arrays: declare keys once, then stream rows without repetition
API
Toon.encode(value, **options)
Converts any value to TOON format.
Parameters:
value– Any value to encode (Hash, Array, primitives, or nested structures)indent– Number of spaces per indentation level (default:2)delimiter– Delimiter for array values and tabular rows:',',"\t", or'|'(default:',')length_marker– Optional marker to prefix array lengths:'#'orfalse(default:false)
Returns:
A TOON-formatted string with no trailing newline or spaces.
Examples:
# Basic usage
Toon.encode({ 'id' => 1, 'name' => 'Ada' })
# => "id: 1\nname: Ada"
# Tabular arrays
items = [
{ 'sku' => 'A1', 'qty' => 2, 'price' => 9.99 },
{ 'sku' => 'B2', 'qty' => 1, 'price' => 14.5 }
]
Toon.encode({ 'items' => items })
# => "items[2]{sku,qty,price}:\n A1,2,9.99\n B2,1,14.5"
# Custom delimiter (tab)
Toon.encode({ 'items' => items }, delimiter: "\t")
# => "items[2\t]{sku\tqty\tprice}:\n A1\t2\t9.99\n B2\t1\t14.5"
# Length marker
Toon.encode({ 'tags' => ['a', 'b', 'c'] }, length_marker: '#')
# => "tags[#3]: a,b,c"
Format Overview
Objects
Simple objects with primitive values:
Toon.encode({
'id' => 123,
'name' => 'Ada',
'active' => true
})
id: 123
name: Ada
active: true
Nested objects:
Toon.encode({
'user' => {
'id' => 123,
'name' => 'Ada'
}
})
user:
id: 123
name: Ada
Arrays
Primitive Arrays (Inline)
Toon.encode({ 'tags' => ['admin', 'ops', 'dev'] })
[3]: admin,ops,dev
Arrays of Objects (Tabular)
When all objects share the same primitive fields, TOON uses an efficient tabular format:
Toon.encode({
'items' => [
{ 'sku' => 'A1', 'qty' => 2, 'price' => 9.99 },
{ 'sku' => 'B2', 'qty' => 1, 'price' => 14.5 }
]
})
items[2]{sku,qty,price}:
A1,2,9.99
B2,1,14.5
Mixed and Non-Uniform Arrays
Arrays that don't meet the tabular requirements use list format:
Toon.encode({
'items' => [1, { 'a' => 1 }, 'text']
})
items[3]:
- 1
- a: 1
- text
Delimiter Options
The delimiter option allows you to choose between comma (default), tab, or pipe delimiters:
# Tab delimiter (can save additional tokens)
data = {
'items' => [
{ 'sku' => 'A1', 'name' => 'Widget', 'qty' => 2 },
{ 'sku' => 'B2', 'name' => 'Gadget', 'qty' => 1 }
]
}
Toon.encode(data, delimiter: "\t")
Output:
items[2 ]{sku name qty}:
A1 Widget 2
B2 Gadget 1
Length Marker Option
The length_marker option adds a hash (#) prefix to array lengths:
data = {
'tags' => ['reading', 'gaming', 'coding'],
'items' => [
{ 'sku' => 'A1', 'qty' => 2 },
{ 'sku' => 'B2', 'qty' => 1 }
]
}
Toon.encode(data, length_marker: '#')
Output:
[#3]: reading,gaming,coding
items[#2]{sku,qty}:
A1,2
B2,1
Type Conversions
Some Ruby types are automatically normalized:
| Input | Output |
|---|---|
Symbol |
String (:hello → "hello") |
Time, DateTime |
ISO8601 string |
Date |
ISO8601 string |
Float::INFINITY, Float::NAN |
null |
Set |
Array |
Quoting Rules
TOON quotes strings only when necessary to maximize token efficiency:
- Empty strings are quoted:
"" - Strings with leading/trailing spaces:
" padded " - Strings that look like booleans/numbers:
"true","42" - Strings with structural characters:
"a,b","a:b","[5]" - The active delimiter triggers quoting
Keys follow similar rules and are quoted when needed.
Using TOON in LLM Prompts
When incorporating TOON into your LLM workflows:
- Wrap TOON data in a fenced code block in your prompt
- Tell the model: "Do not add extra punctuation or spaces; follow the exact TOON format."
- When asking the model to generate TOON, specify the same rules (2-space indentation, no trailing spaces, quoting rules)
Notes and Limitations
- Token counts vary by tokenizer and model. Benchmarks use a GPT-style tokenizer; actual savings will differ with other models.
- TOON is designed for LLM contexts where human readability and token efficiency matter. It's not a drop-in replacement for JSON in APIs or storage.
- Tabular arrays require all objects to have exactly the same keys with primitive values only.
- Object key order is preserved from the input. In tabular arrays, header order follows the first object's keys.
Development
After checking out the repo, run:
bundle install
Run the test suite:
bundle exec rspec
Or simply:
rake
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/andrepcg/toon-ruby.
License
The gem is available as open source under the terms of the MIT License.
Credits
This is a Ruby port of the original TOON library by Johann Schopplich.