Intent-compressed intelligence orchestration.

88% cheaper at turn 10. A maestro for any LLM. O(N²) → O(N) by math.

Pick any model as the Maestro: Claude, GPT, Gemini, OpenClaw, or a local Ollama model. It orchestrates gold/silver/bronze Workers from any vendor and shares a cached protocol prefix. 88% cheaper at turn 10, not by a trick but by arithmetic on the published pricing pages.

Burnless turns long LLM conversations into compact capsule memory. Under the hood, it is evolving into a candidate protocol layer for human-LLM and LLM-LLM communication: a living, compressed, privacy-aware language between humans, maestros, and workers.

We compress pages of conversation history into a single 80-character line. Your AI keeps the memory. You stop paying for the excess.

Standalone O(N²) vs Burnless O(N) cost curve

Calibrated from real Anthropic API runs. Reproduce: python bench/v2.py --simulate

Calibrated against claude-opus-4-7 — verifiable
 Turns   Standalone   Burnless   Savings
   2        $0.80       $0.14    82.7%
   5        $2.06       $0.29    86.1%
  10        $4.34       $0.54    87.6%
  20        $9.59       $1.07    88.9%
  50       $30.72       $2.83    90.8%
Standalone cost grows O(N²); Burnless grows O(N). Math, not heuristic.
Real case · Customer support agent · 50 messages exchanged
 Without Burnless   $2.45
 With Burnless      $0.28
 Real saving        88%
Same conversation. Same model. Different bill.
MIT licensed · Self-hosted · No backend required · pip install burnless

Beyond cost. The protocol layer.

A token is not an abstraction. It is compute. Compute is electricity. Electricity is water and infrastructure. O(N²) growth at the scale LLM inference is heading toward, 1–5% of global electricity within a decade, is not a pricing quirk. It is a trajectory.

Burnless is the transparent, provider-agnostic layer that sits between any user and any LLM, converting O(N²) context growth into O(N) by design. Not a feature. Not an SDK wrapper. A candidate protocol, like TCP/IP was to packets. The user sends a message. The model receives a capsule. The orchestration layer speaks compact state.

60–90 TWh/year

Estimated energy saved at 1% of global LLM inference. Denmark's entire electricity consumption is 35 TWh/year. This is not a rounding error.

Structurally unblockable

No provider can prohibit sending a summary of a conversation; it is indistinguishable from normal usage. The protocol is invisible by design.

Inevitable. Open.

This layer will exist. The question is who defines it and in whose interest. Burnless answers: MIT, documented, first.

Read the founding vision →

Why the curve is quadratic.

Every turn in a standalone agent loop replays the full conversation as input. Cost on turn N is proportional to N, so total cost across N turns is Θ(N²). That is arithmetic from the pricing page, not a property of any SDK.
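
A minimal sketch of that arithmetic, with placeholder prices and token counts (not the calibrated benchmark figures above; substitute your model's published rates):

# Placeholder prices and token counts; substitute your model's published rates.
WRITE = 15 / 1e6        # $ per input token at the normal (cache-write) price
READ = 0.15 / 1e6       # $ per input token at the cache-read price
TURN_TOKENS = 800       # tokens added by one full exchange (assumption)
CAPSULE_TOKENS = 40     # one ~80-character capsule per past turn (assumption)
PREFIX_TOKENS = 2_000   # shared protocol/system prefix (assumption)

def standalone(turns: int) -> float:
    # Turn n replays the whole transcript, so input grows with n and the total is Θ(N²).
    return sum((PREFIX_TOKENS + n * TURN_TOKENS) * WRITE for n in range(1, turns + 1))

def burnless(turns: int) -> float:
    # The prefix and capsule history are served at the read price; only the new
    # exchange pays the full rate, so the cumulative curve flattens toward O(N).
    return sum((PREFIX_TOKENS + n * CAPSULE_TOKENS) * READ + TURN_TOKENS * WRITE
               for n in range(1, turns + 1))

for t in (2, 5, 10, 20, 50):
    a, b = standalone(t), burnless(t)
    print(f"{t:>3} turns  standalone ${a:6.2f}  burnless ${b:5.2f}  saving {1 - b/a:.0%}")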

1. Capsules, not transcripts

Brain history holds ~80-char summaries of each turn, not the raw exchange. Full output stays on disk, read on demand.

2. Shared prefix cache

System prompt is byte-identical every turn with cache_control. Read price ($0.15/MTok) instead of write price ($15/MTok): a 100× spread. A sketch follows this list.

3. Tiers are roles, not models

Any model as Brain. Any model as Worker. GPT-4o, Opus, Sonnet, Codex, Ollama — one-line config change.

4. Living glossary

Core terms, project terms and session deltas form a compact language between human intent, Maestro and Workers.

5. Privacy by mode

Cost mode reduces repeated exposure today. Redact, audit, opaque and burnkey modes define the path to stronger local-key privacy.
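
Item 2 is the piece worth seeing concretely. A minimal sketch, assuming the Anthropic Python SDK and its prompt-caching API (other providers expose equivalents); the prefix text, model id and token figures are illustrative, not Burnless internals:

# Minimal sketch assuming the Anthropic Python SDK (pip install anthropic) and its
# prompt-caching API. Prefix text and model id are illustrative only.
import anthropic

# Real prefixes must exceed the provider's minimum cacheable length and stay
# byte-identical across turns, otherwise the cache entry is never reused.
PROTOCOL_PREFIX = "You are the Maestro. Core glossary, tiers and capsule rules go here."

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(user_message: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",            # any cache-capable model
        max_tokens=512,
        system=[{
            "type": "text",
            "text": PROTOCOL_PREFIX,
            "cache_control": {"type": "ephemeral"},    # mark the shared prefix as cacheable
        }],
        messages=[{"role": "user", "content": user_message}],
    )
    usage = response.usage
    # The first call pays the cache write; later calls report the prefix under
    # cache_read_input_tokens at the cheaper read price.
    print("cache_write:", usage.cache_creation_input_tokens,
          "cache_read:", usage.cache_read_input_tokens)
    return response.content[0].text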

We don't ask you to trust the table. Reproduce it: python bench/run.py --turns 8 with your own API key.

How it looks.

A chat-first CLI. Slash commands users already understand. Compact state under .burnless/. No hosted backend.

# in any project
$ pip install burnless
$ burnless setup
$ burnless
   # Burnless Chat — /commands, /maestro, /model, /workers, /native

$ burnless delegate "fix the failing tests"
   → d001 routed to silver/codex  (matched: test)
$ burnless delegate "summarize the logs"
   → d002 routed to bronze/haiku  (matched: summarize)
$ burnless run d002
   OK:d002 — cache_read=23,000 tokens (warm)

Gold/silver/bronze are quality/cost bands, not vendors. The default setup can mix Claude, Codex and Ollama; users can replace any tier in .burnless/config.yaml.
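
The keyword routing shown in the session above can be approximated in a few lines. This is an illustrative sketch, not the Burnless router; the tier map, keywords and model names are assumptions that mirror the default mix described here:

# Illustrative tier routing, not the actual Burnless router. Tier names mirror the
# quality/cost bands above; the keyword rules and model names are assumptions.
TIERS = {
    "gold":   {"model": "opus",  "keywords": ["architecture", "design", "review"]},
    "silver": {"model": "codex", "keywords": ["test", "fix", "refactor"]},
    "bronze": {"model": "haiku", "keywords": ["summarize", "list", "extract"]},
}

def route(task: str):
    """Return (tier, model, matched keyword) for a delegated task."""
    text = task.lower()
    for tier, spec in TIERS.items():
        for keyword in spec["keywords"]:
            if keyword in text:
                return tier, spec["model"], keyword
    return "gold", TIERS["gold"]["model"], None   # unmatched work goes to the strongest band

print(route("fix the failing tests"))   # ('silver', 'codex', 'test')
print(route("summarize the logs"))      # ('bronze', 'haiku', 'summarize')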

Cache compaction is realtime ROI math: Burnless freezes immutable blocks and only creates a new super-capsule when future cache-read savings beat the write+compaction cost.
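
That ROI check reduces to a one-line inequality. A hedged sketch, with the read/write prices taken from the cache section above and everything else a placeholder:

# Hedged sketch of the compaction ROI check; prices match the cache section above,
# everything else is a placeholder rather than Burnless's internal model.
READ = 0.15 / 1e6    # $ per token at the cache-read price
WRITE = 15 / 1e6     # $ per token at the cache-write price

def should_compact(history_tokens: int, supercapsule_tokens: int,
                   expected_future_turns: int, summarization_cost: float = 0.01) -> bool:
    # Every remaining turn re-reads the capsule history; a smaller super-capsule
    # shrinks that read on each of them.
    future_savings = expected_future_turns * (history_tokens - supercapsule_tokens) * READ
    # Compacting pays once: the summarization call plus writing the new block to cache.
    one_time_cost = supercapsule_tokens * WRITE + summarization_cost
    return future_savings > one_time_cost

# e.g. 23,000 tokens of capsule history squeezed into a 1,200-token super-capsule,
# with roughly 20 turns still expected in the session:
print(should_compact(23_000, 1_200, expected_future_turns=20))   # True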

Editions.

Run the protocol locally for free. Pay for governance, auditability, privacy operations and team infrastructure.

Burnless

Free · MIT

  • Open protocol, MIT licensed
  • Full CLI, runs locally
  • Brain + Worker + capsule history
  • Shared prefix cache, realtime capsule compaction
  • Local burnkey semantics planned in the protocol
  • Provider-agnostic — any LLM as Brain or Worker
  • Reproducible benchmark
View on GitHub

Burnless Cloud/Enterprise

Soon · waitlist

  • Shared cache across machines and teammates
  • KMS/HSM key custody and retention policy
  • Audit logs, legal hold and destruction reports
  • SSO + team permissions
  • Dashboards, SIEM/webhooks and priority support
Join Cloud waitlist

Don't trust the table. Run python bench/run.py --turns 8 with your own API key.

Burnless Cloud — waitlist.

Hosted features for teams: shared cache, dashboards, KMS/HSM key custody, audit logs, SSO and retention policy. The protocol is free and open source — Cloud/Enterprise is for organizations that need governance.