
How Cognition saves you tokens (with the math)

Rediscovery is expensive and scales badly; recall is cheap and fixed. Here is exactly where the tokens go, why a fast no-match is logged as a coverage signal rather than counted as savings, the back-of-envelope math, and how to read your own numbers.

The single most expensive thing a coding agent does is rediscover something your team already knows. It greps the repo, reads five files that turn out to be irrelevant, tries an approach a teammate abandoned last month, hits the same wall, backs out, and tries again. Every one of those steps is tokens in and tokens out, and none of it produced anything you did not already have somewhere.

Cognition attacks this directly. This guide is the honest version: where the tokens actually go, the shape of the savings, and how to verify it on your own usage instead of trusting a marketing number.

Two cost curves, not one

The key idea is that rediscovery and recall have fundamentally different cost curves. Rediscovery is variable and unbounded: the more lost the agent gets, the more it costs, and a genuinely confusing problem can spiral. Recall is fixed and small: loading a skill is a roughly constant cost paid once, up front, before the spiral can start.

Path	Token cost	Scales with
Rediscover a known fix	high, variable	how lost the agent gets
Load the matching skill	low, fixed	nothing, paid once
Fast no-match	near zero	nothing

Note

These are directional, not a guarantee. Your real ratio depends on repo size, task complexity, and how often your work hits known territory. The point is the shape, not a specific multiplier: rediscovery scales badly and recall does not, so the savings grow with exactly the hard problems where you need them most.
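To make the shape concrete, here is a toy Python model of the two curves. It is a minimal sketch under made-up assumptions, not a description of how Cognition meters tokens: `TOKENS_PER_STEP` and `SKILL_LOAD_TOKENS` are hypothetical constants, and each wrong turn is modeled as one extra round-trip. Note that `recall_cost` takes no argument at all; that is the "scales with nothing" row of the table.

```python
TOKENS_PER_STEP = 800    # hypothetical average tokens per exploration round-trip
SKILL_LOAD_TOKENS = 300  # hypothetical fixed cost of loading one structured skill


def rediscovery_cost(wrong_turns: int) -> int:
    """Variable cost: each wrong turn is one more round-trip before the one that works."""
    return (wrong_turns + 1) * TOKENS_PER_STEP


def recall_cost() -> int:
    """Fixed cost: load the skill up front, then one direct round-trip to the fix."""
    return SKILL_LOAD_TOKENS + TOKENS_PER_STEP


if __name__ == "__main__":
    for wrong_turns in (0, 1, 5, 10):
        print(f"{wrong_turns:>2} wrong turns: rediscover {rediscovery_cost(wrong_turns):>5}, "
              f"recall {recall_cost():>5}")
```

With these constants, recall only wins once the agent would have taken at least one wrong turn; after that the gap widens with every turn, which is the shape the table describes.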

A back-of-envelope example

Make it concrete. Say an agent hits a config gotcha your team has solved before. Cold, it explores: it reads the config files and the build script, parses the error, tries a few wrong hypotheses, and eventually lands on the fix. Call it several thousand tokens of round-trips, plus your wall-clock time spent watching it.

Warm, with the skill in the brain, the agent loads a few hundred tokens of structured skill (trigger, steps, checks, why) before it starts, recognizes the situation, and goes straight to the fix. The exploration never happens. The expensive discovery is paid once, when the fix is first worked out and captured as a skill; every future occurrence, for every teammate who hits it, pays only the small load.

The asymmetry is the whole game: you pay a small fixed cost to avoid a large variable one, and the trade gets better every time the situation recurs.
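The amortization can be sketched the same way. This assumes, hypothetically, 4,000 tokens for a cold rediscovery and 300 for loading the skill (the article gives only "several thousand" and "a few hundred"), and it ignores the cost of writing the skill in the first place.

```python
# Illustrative numbers only; neither constant is a measured Cognition figure.
COLD_TOKENS = 4_000      # assumed cost of rediscovering the config fix from scratch
SKILL_LOAD_TOKENS = 300  # assumed cost of loading the captured skill on a later occurrence


def total_without_skill(occurrences: int) -> int:
    """Every occurrence, for every teammate, pays the full exploration cost."""
    return max(occurrences, 0) * COLD_TOKENS


def total_with_skill(occurrences: int) -> int:
    """The first occurrence pays the exploration once (skill authoring cost is ignored);
    each later occurrence pays only the skill load."""
    if occurrences <= 0:
        return 0
    return COLD_TOKENS + (occurrences - 1) * SKILL_LOAD_TOKENS


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        cold, warm = total_without_skill(n), total_with_skill(n)
        print(f"{n:>2} occurrences: {cold:>6} cold vs {warm:>5} warm, saved {cold - warm}")
```

In this sketch each recurrence saves the same 3,700 tokens, so the average cost per occurrence falls from 4,000 toward 300 as the situation keeps coming back.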

Why the no-match is the quiet hero

Most retrieval systems always return something, because returning the nearest chunk is free for them. But it is not free for you: every irrelevant chunk it injects is context your agent now has to read, weigh, and usually ignore, and sometimes it chases the noise instead of ignoring it. That is tokens spent to make the answer worse.

Cognition does the opposite. When nothing genuinely matches, it says so fast and injects nothing. A near-zero no-match is not a failure to apologize for; it is the system declining to spend your tokens on a guess. Over a long session, the no-matches you never notice can save more than the hits you do. Those savings do not show up in your tokens-saved figure, though: Cognition logs a fast no-match as a coverage signal, not as tokens saved.
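How Cognition decides that nothing matches is not covered here, so the following is a generic stand-in rather than its actual method: a minimal sketch of threshold-gated retrieval, assuming skills and queries are already embedded as vectors. The skill names, vectors, and the 0.8 threshold are all hypothetical. The only difference from a plain nearest-neighbor lookup is that a weak best match becomes `None`, so nothing is injected.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_skill(query: list[float], skills: dict[str, list[float]],
                threshold: float = 0.8) -> Optional[str]:
    """Return the best-scoring skill name, or None if no skill clears the threshold.

    Nearest-neighbor lookup alone would always return its top item; starting the
    running best at the threshold is what turns a weak guess into a no-match.
    """
    best_name, best_score = None, threshold
    for name, vec in skills.items():
        score = cosine(query, vec)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical skill embeddings, purely for illustration.
SKILLS = {
    "vite-config-alias": [1.0, 0.1, 0.0],
    "flaky-ci-cache": [0.0, 1.0, 0.2],
}

if __name__ == "__main__":
    print(match_skill([0.9, 0.2, 0.0], SKILLS))  # vite-config-alias
    print(match_skill([0.0, 0.0, 1.0], SKILLS))  # None: nothing injected
```

Lowering the threshold far enough would make the second query return `flaky-ci-cache` with a similarity of about 0.2, which is exactly the kind of noise the no-match avoids.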

Stop re-priming from scratch every session

There is a second, sneakier drain: re-establishing context the agent had yesterday. A fresh agent starts amnesiac, so a chunk of every long session goes to re-explaining the stack, conventions, and goals it already learned once. Saved project context and approved skills prime that in a single load instead of re-deriving it turn by turn. The persistent-context guide covers how to set that up; the token effect is that the expensive "getting oriented" phase mostly disappears.
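A rough sketch of the re-priming arithmetic, using hypothetical per-session figures (the article gives none): an unprimed session spends `ORIENTATION_TOKENS` re-deriving context turn by turn, while a primed one pays `CONTEXT_LOAD_TOKENS` for a single load.

```python
# Hypothetical figures; measure your own before trusting any ratio.
ORIENTATION_TOKENS = 6_000   # assumed tokens a fresh session spends re-learning stack, conventions, goals
CONTEXT_LOAD_TOKENS = 1_500  # assumed tokens to load saved project context and approved skills instead


def orientation_cost(sessions: int, primed: bool) -> int:
    """Total tokens spent getting oriented across `sessions` fresh sessions."""
    per_session = CONTEXT_LOAD_TOKENS if primed else ORIENTATION_TOKENS
    return sessions * per_session


if __name__ == "__main__":
    sessions = 20  # assumed: roughly a month of weekday sessions
    cold = orientation_cost(sessions, primed=False)
    warm = orientation_cost(sessions, primed=True)
    print(f"re-priming: {cold}, saved context: {warm}, difference: {cold - warm}")
```

Both totals grow linearly with the number of sessions, so the saving per session is simply the gap between the two constants; that gap is the number worth measuring on your own work.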

Read your own numbers

Do not take any of this on faith, including from us. Cognition keeps the receipts and will show them:

"show me my stats"
  → tokens saved, hours saved, hit rate, each with its basis

Routes to cognition_status. Accept the follow-up offer to render an ASCII dashboard (works in any host) or a Mermaid chart (for Markdown).

Then pair it with the activity ledger to see which skills actually earned the savings, so you know what to keep, refresh, or retire:

"what did Cognition use?"
  → every activation: trigger, sources, action, result, timestamp

Routes to the call ledger. It even logs the fast no-matches, the moments it deliberately spent nothing.
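Once you have ledger rows, ranking skills by the savings they earned is a small aggregation. The sketch below assumes a hypothetical row shape: the field names mirror the article's list (trigger, sources, action, result, timestamp), but the `"hit"`/`"no-match"` result values and the per-row `est_tokens_saved` field are assumptions, not Cognition's actual ledger format. As the article describes, no-matches are counted as coverage, never as savings.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LedgerEntry:
    """Hypothetical ledger row; est_tokens_saved is an assumed field."""
    trigger: str                 # the skill trigger that fired, or the unmatched situation
    result: str                  # assumed values: "hit" or "no-match"
    timestamp: datetime
    sources: list[str] = field(default_factory=list)
    action: str = ""
    est_tokens_saved: int = 0


def summarize(entries: list[LedgerEntry]) -> tuple[list[tuple[str, int]], int]:
    """Rank triggers by estimated savings; count no-matches separately as coverage."""
    savings: dict[str, int] = defaultdict(int)
    no_matches = 0
    for entry in entries:
        if entry.result == "no-match":
            no_matches += 1  # coverage signal: logged, never added to savings
        else:
            savings[entry.trigger] += entry.est_tokens_saved
    ranked = sorted(savings.items(), key=lambda item: item[1], reverse=True)
    return ranked, no_matches


# Invented sample rows, for illustration only.
LEDGER = [
    LedgerEntry("vite alias import fails", "hit", datetime(2025, 1, 6, 9, 30), est_tokens_saved=3_500),
    LedgerEntry("flaky CI cache", "hit", datetime(2025, 1, 6, 14, 0), est_tokens_saved=1_200),
    LedgerEntry("vite alias import fails", "hit", datetime(2025, 1, 7, 11, 15), est_tokens_saved=3_000),
    LedgerEntry("rename a DB column", "no-match", datetime(2025, 1, 7, 16, 45)),
]

if __name__ == "__main__":
    ranked, no_matches = summarize(LEDGER)
    for trigger, saved in ranked:
        print(f"{saved:>6}  {trigger}")
    print(f"{no_matches} no-match(es) logged as coverage")
```

Skills near the top of the ranking are the ones to keep; a skill that keeps firing with little estimated savings is a candidate to refresh or retire.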

Tip

If you want to sanity-check the value honestly, watch the hit rate over a week of real work rather than a single session. Savings concentrate in repeated, known-territory tasks; a week smooths out the noise of any one day.
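To see why a week beats a single session, here is a sketch that computes per-day and pooled hit rates from a list of activations. It assumes hit rate means hits divided by all activations, with no-matches in the denominator; Cognition's own definition may differ, and the sample week is invented.

```python
from collections import defaultdict
from datetime import date


def hit_rate(results: list[bool]) -> float:
    """Share of activations that matched a skill (True = hit, False = no-match)."""
    return sum(results) / len(results) if results else 0.0


def daily_and_pooled(activations: list[tuple[date, bool]]) -> tuple[dict[date, float], float]:
    """Hit rate for each day, plus the rate pooled over the whole window."""
    by_day: dict[date, list[bool]] = defaultdict(list)
    for day, hit in activations:
        by_day[day].append(hit)
    daily = {day: hit_rate(hits) for day, hits in sorted(by_day.items())}
    pooled = hit_rate([hit for _, hit in activations])
    return daily, pooled


MON, TUE, WED = date(2025, 1, 6), date(2025, 1, 7), date(2025, 1, 8)
SAMPLE_WEEK = [  # invented activations: (day, matched a skill?)
    (MON, True), (MON, False), (MON, False), (MON, False),
    (TUE, True), (TUE, True),
    (WED, True), (WED, True), (WED, True), (WED, False),
]

if __name__ == "__main__":
    daily, pooled = daily_and_pooled(SAMPLE_WEEK)
    for day, rate in daily.items():
        print(f"{day:%a}: {rate:.0%}")
    print(f"pooled: {pooled:.0%}")
```

In the sample, Monday alone would suggest a 25% hit rate and Tuesday 100%; pooling the three days gives 60%, a steadier read on how often your work lands in known territory.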

Next steps

Stop re-explaining yourself: persistent project context (Workflows)
Get unstuck faster: decision trees, orchestration, and the stuck trigger (Workflows)
