TOON vs JSON: The New Format Designed for AI

TL;DR

TOON is a new data format that reduces LLM token costs by 30-60% compared to JSON. It uses tabular arrays and minimal quoting for efficiency, while improving AI accuracy.

Key Takeaways

  • TOON saves 30-60% on tokens by declaring schemas once and using CSV-style values for uniform data.
  • It enhances LLM comprehension with explicit array lengths and smart quoting, leading to better accuracy.
  • TOON is ideal for sending large, structured datasets to LLMs but not for general APIs or non-uniform data.

Tags

ai, llm, json, programming

How a novel data format is saving developers 30-60% on LLM token costs

If you've been working with Large Language Models, you've probably noticed something: feeding data to AI isn't free. Every JSON object you pass through an API costs tokens, and those tokens add up fast. Enter TOON (Token-Oriented Object Notation), a new serialization format designed specifically to solve this problem.

The Token Tax Problem

Let's start with a real example. Imagine you're building an app that sends employee data to an LLM for analysis:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin", "salary": 75000 },
    { "id": 2, "name": "Bob", "role": "user", "salary": 65000 },
    { "id": 3, "name": "Charlie", "role": "user", "salary": 70000 }
  ]
}

This JSON snippet consumes 257 tokens. Now look at the same data in TOON:

users[3]{id,name,role,salary}:
1,Alice,admin,75000
2,Bob,user,65000
3,Charlie,user,70000

Just 166 tokens — a 35% reduction. For this small example, the savings might seem trivial. But scale that to hundreds of API calls with thousands of records, and suddenly you're looking at real cost reductions.
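
Want to verify numbers like these on your own payloads? Token counts are easy to measure directly. Here's a minimal sketch, assuming the gpt-tokenizer npm package for counting (any tokenizer will do, and exact counts vary by model) alongside the official @toon-format/toon encoder:

import { encode as encodeToon } from '@toon-format/toon'
import { encode as tokenize } from 'gpt-tokenizer'

const data = {
  users: [
    { id: 1, name: 'Alice', role: 'admin', salary: 75000 },
    { id: 2, name: 'Bob', role: 'user', salary: 65000 },
    { id: 3, name: 'Charlie', role: 'user', salary: 70000 }
  ]
}

// Count tokens for the pretty-printed JSON vs. the TOON encoding of the same data.
const jsonTokens = tokenize(JSON.stringify(data, null, 2)).length
const toonTokens = tokenize(encodeToon(data)).length

console.log(`JSON: ${jsonTokens} tokens, TOON: ${toonTokens} tokens`)
console.log(`Savings: ${(100 * (1 - toonTokens / jsonTokens)).toFixed(1)}%`)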

What Makes TOON Different?

TOON borrows the best ideas from existing formats and optimizes them for LLM consumption:

1. Tabular Arrays: Declare Once, Use Many

The core insight behind TOON is simple: when you have uniform arrays of objects (same fields, same types), why repeat the keys for every single object?

JSON's Approach (repetitive):

[
  { "sku": "A1", "qty": 2, "price": 9.99 },
  { "sku": "B2", "qty": 1, "price": 14.50 }
]

TOON's Approach (efficient):

[2]{sku,qty,price}:
A1,2,9.99
B2,1,14.5

The schema is declared once in the header {sku,qty,price}, then each row is just CSV-style values. This is where TOON shines brightest.
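
To make the "declare once" idea concrete, here's an illustrative sketch of how you could build a tabular block by hand. This is not the library's actual implementation (it skips quoting, nesting, and type handling); it's just the core shape of the format:

// Illustrative only: real encoders also handle quoting, nesting, and validation.
function toTabular(name, rows) {
  const keys = Object.keys(rows[0])
  const header = `${name}[${rows.length}]{${keys.join(',')}}:`
  const lines = rows.map(row => keys.map(key => row[key]).join(','))
  return [header, ...lines].join('\n')
}

console.log(toTabular('items', [
  { sku: 'A1', qty: 2, price: 9.99 },
  { sku: 'B2', qty: 1, price: 14.5 }
]))
// items[2]{sku,qty,price}:
// A1,2,9.99
// B2,1,14.5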

2. Smart Quoting

TOON only quotes strings when absolutely necessary:

  • hello world → No quotes needed (inner spaces are fine)
  • hello 👋 world → No quotes (Unicode is safe)
  • "hello, world" → Quotes required (contains comma delimiter)
  • " padded " → Quotes required (leading/trailing spaces)

This minimal-quoting approach saves tokens while keeping the data unambiguous.
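
As a rough mental model, the decision boils down to "quote only when the raw value would otherwise be ambiguous". The sketch below approximates those rules for the examples above; it is not the spec's exact logic:

// Approximate quoting check: quote only when the value could be misread.
function needsQuotes(value, delimiter = ',') {
  return (
    value !== value.trim() ||        // leading/trailing whitespace would be lost
    value.includes(delimiter) ||     // would be confused with a field separator
    value.includes('"') ||           // embedded quotes need escaping
    value.includes('\n')             // a newline would break the row
  )
}

needsQuotes('hello world')   // false: inner spaces are fine
needsQuotes('hello, world')  // true:  contains the delimiter
needsQuotes(' padded ')      // true:  leading/trailing spaces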

3. Indentation Over Brackets

Like YAML, TOON uses indentation instead of curly braces for nested structures:

JSON:

{
  "user": {
    "id": 123,
    "profile": {
      "name": "Ada"
    }
  }
}

TOON:

user:
  id: 123
  profile:
    name: Ada

Cleaner, more readable, and cheaper in tokens.

4. Explicit Array Lengths

TOON includes the array length in brackets ([N]), which actually helps LLMs understand and validate the structure:

tags[3]: admin,ops,dev

This explicit metadata reduces parsing errors when LLMs are generating or interpreting structured data.
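
That declared length also gives you a cheap integrity check when a model generates TOON back at you. Here's a small, illustrative validator for tabular blocks (the function and regex are mine, not part of the TOON API):

// Does the declared [N] in a tabular header match the number of rows that follow?
function rowCountMatches(toonBlock) {
  const [header, ...rows] = toonBlock.trim().split('\n')
  const match = header.match(/\[(\d+)\]\{/)   // e.g. "users[3]{id,name,role,salary}:"
  if (!match) return true                     // not a tabular header, nothing to check
  return rows.length === Number(match[1])
}

rowCountMatches('users[3]{id,name}:\n1,Alice\n2,Bob\n3,Charlie')  // true
rowCountMatches('users[3]{id,name}:\n1,Alice\n2,Bob')             // false: a row went missing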

Real-World Benchmarks

The TOON project ran comprehensive benchmarks across different data types and LLM models. Here's what they found:

Token Savings by Dataset

Dataset                       JSON Tokens   TOON Tokens   Savings
GitHub Repos (100 records)    15,145        8,745         42.3%
Analytics (180 days)          10,977        4,507         58.9%
E-commerce Orders             257           166           35.4%

The sweet spot? Uniform tabular data — records with consistent schemas across many rows. The more repetitive your JSON keys, the more TOON can optimize.

LLM Comprehension

But token efficiency doesn't matter if the LLM can't understand the format. The benchmarks tested 4 different models (GPT-5 Nano, Claude Haiku, Gemini Flash, Grok) on 154 data retrieval questions:

  • TOON accuracy: 70.1%
  • JSON accuracy: 65.4%
  • Token reduction: 46.3%

TOON not only saves tokens but actually improves LLM accuracy. The explicit structure (array lengths, field declarations) helps models parse and validate data more reliably.

When Should You Use TOON?

TOON isn't meant to replace JSON everywhere. Think of it as a specialized tool for a specific job.

Use TOON When:

  • Sending large datasets to LLMs (hundreds or thousands of records)
  • Working with uniform data structures (database query results, CSV exports, analytics)
  • Token costs are a significant concern
  • You're making frequent LLM API calls with structured data

Stick With JSON When:

  • Building traditional REST APIs
  • Storing data in databases
  • Working with deeply nested or non-uniform data
  • You need universal compatibility with existing tools

As the TOON documentation puts it: "Use JSON programmatically, convert to TOON for LLM input."

How to Get Started

TOON is available as an npm package with a simple API:

import { encode, decode } from '@toon-format/toon'

const data = {
  items: [
    { sku: 'A1', qty: 2, price: 9.99 },
    { sku: 'B2', qty: 1, price: 14.5 }
  ]
}

// Convert to TOON
const toon = encode(data)
console.log(toon)
// items[2]{sku,qty,price}:
// A1,2,9.99
// B2,1,14.5

// Convert back to JSON
const restored = decode(toon)
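
That round trip is the whole integration story: your application keeps working with ordinary JSON, and you convert right at the prompt boundary. Here's a rough sketch of that boundary (the orders data and prompt wording are placeholders; only encode comes from the package):

import { encode } from '@toon-format/toon'

// Your app keeps producing ordinary JSON-shaped objects...
const orders = [
  { sku: 'A1', qty: 2, price: 9.99 },
  { sku: 'B2', qty: 1, price: 14.5 }
]   // in a real app this would come from your database

// ...and converts only when assembling the prompt for the LLM.
const prompt = [
  'Here are recent orders in TOON format (row count and fields are declared in the header):',
  encode({ orders }),
  'Question: which SKU generated the most revenue?'
].join('\n\n')

// Send `prompt` with whichever LLM client you already use.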

There's also a CLI tool for quick conversions:

# Encode JSON to TOON
npx @toon-format/cli data.json -o data.toon

# Decode TOON to JSON
npx @toon-format/cli data.toon -o data.json

# Show token savings
npx @toon-format/cli data.json --stats

Alternative Delimiters

For even more token efficiency, you can use tab or pipe delimiters instead of commas:

// Tab-separated (often more token-efficient)
encode(data, { delimiter: '\t' })

// Pipe-separated
encode(data, { delimiter: '|' })
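
Which delimiter actually wins depends on your data and on the model's tokenizer, so it's worth measuring rather than guessing. A quick sketch, again assuming the gpt-tokenizer package for counting:

import { encode as encodeToon } from '@toon-format/toon'
import { encode as tokenize } from 'gpt-tokenizer'

const data = {
  items: [
    { sku: 'A1', qty: 2, price: 9.99 },
    { sku: 'B2', qty: 1, price: 14.5 }
  ]
}

// Compare the default (comma) against tab and pipe for this particular dataset.
const variants = {
  comma: encodeToon(data),
  tab: encodeToon(data, { delimiter: '\t' }),
  pipe: encodeToon(data, { delimiter: '|' })
}

for (const [name, text] of Object.entries(variants)) {
  console.log(name, tokenize(text).length, 'tokens')
}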

The Growing Ecosystem

While TOON is relatively new, the community is already building implementations across multiple languages:

  • Official: JavaScript/TypeScript, Python (in dev), Rust (in dev)
  • Community: PHP, Ruby, Go, Swift, Elixir, C++, Java, and more

The project maintains a comprehensive specification and conformance test suite to ensure compatibility across implementations.

The Bottom Line

TOON represents a shift in thinking about data formats. For decades, we've optimized for human readability and machine interoperability. Now, with LLMs consuming massive amounts of structured data, we need formats optimized for token efficiency and AI comprehension.

Is TOON going to replace JSON? No. But for the specific use case of feeding structured data to LLMs, it offers compelling advantages:

  • 30-60% token savings on uniform tabular data
  • Better LLM accuracy thanks to explicit structure
  • Drop-in conversion from existing JSON workflows
  • Growing ecosystem with multi-language support

If you're building AI-powered applications that consume significant amounts of structured data, TOON is worth exploring. Your token budget will thank you.



Have you tried TOON in your projects? What kind of token savings are you seeing? Share your experience in the comments.
