How Cognition saves you tokens (with the math)
Rediscovery is expensive and scales badly; recall is cheap and fixed. Here is exactly where the tokens go, why a fast no-match is logged as a coverage signal rather than counted as savings, the back-of-envelope math, and how to read your own real numbers.
The single most expensive thing a coding agent does is rediscover something your team already knows. It greps the repo, reads five files that turn out to be irrelevant, tries an approach a teammate abandoned last month, hits the same wall, backs out, and tries again. Every one of those steps is tokens in and tokens out, and none of it produced anything you did not already have somewhere.
Cognition attacks this directly. This guide is the honest version: where the tokens actually go, the shape of the savings, and how to verify it on your own usage instead of trusting a marketing number.
Two cost curves, not one
The key idea is that rediscovery and recall have fundamentally different cost curves. Rediscovery is variable and unbounded: the more lost the agent gets, the more it costs, and a genuinely confusing problem can spiral. Recall is fixed and small: loading a skill is a roughly constant cost paid once, up front, before the spiral can start.
| Path | Token cost | Scales with |
|---|---|---|
| Rediscover a known fix | high, variable | how lost the agent gets |
| Load the matching skill | low, fixed | nothing, paid once |
| Fast no-match | near zero | nothing |
These figures are directional, not guaranteed. Your real ratio depends on repo size, task complexity, and how often your work hits known territory. The point is the shape, not a specific multiplier: rediscovery scales badly and recall does not, so the savings grow on exactly the hard problems where you need them most.
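To see the two shapes side by side, here is a toy model in Python. Every number in it is a hypothetical placeholder, not a measured Cognition figure. The assumption that rediscovery cost grows linearly with each dead end is also a simplification chosen to keep the sketch small.

```python
# Toy cost model. All token figures are hypothetical placeholders.
SKILL_LOAD_TOKENS = 400      # "a few hundred tokens": fixed, paid once up front
TOKENS_PER_DEAD_END = 1_500  # assumed cost of one wrong hypothesis (reads + reasoning)
FIX_TOKENS = 800             # assumed cost of actually applying the fix


def rediscovery_cost(dead_ends: int) -> int:
    """Variable cost: grows with every wrong turn taken before the fix."""
    return dead_ends * TOKENS_PER_DEAD_END + FIX_TOKENS


def recall_cost() -> int:
    """Fixed cost: load the skill up front, then go straight to the fix."""
    return SKILL_LOAD_TOKENS + FIX_TOKENS


for dead_ends in (0, 2, 5, 10):
    cold, warm = rediscovery_cost(dead_ends), recall_cost()
    print(f"{dead_ends:>2} dead ends: rediscover={cold:>6}  recall={warm:>5}")
```

In this toy model, recall costs slightly more when the agent would have gone straight to the fix anyway (zero dead ends). The gap only opens as the agent gets lost, which is the shape the table describes.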
A back-of-envelope example
Make it concrete. Say an agent hits a config gotcha your team has solved before. Cold, it explores: it reads the config files, the build script, and the error, tests a few wrong hypotheses, and eventually lands on the fix. That can easily run to several thousand tokens of round-trips, plus your wall-clock time watching it.
Warm, with the skill in the brain, the agent loads a few hundred tokens of structured skill (trigger, steps, checks, why) before it starts, recognizes the situation, and goes straight to the fix. The exploration never happens. The skill cost is paid once and amortizes across every future occurrence and every teammate who hits it.
The asymmetry is the whole game: you pay a small fixed cost to avoid a large variable one, and the trade gets better every time the situation recurs.
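Here is the same arithmetic as a short Python sketch. The token counts loosely follow the "few thousand" and "few hundred" figures above, but they are placeholders. The one-time capture cost (the effort of writing the skill down) is an assumption, added to show where the break-even point falls.

```python
# Back-of-envelope amortization. All numbers are hypothetical placeholders.
COLD_TOKENS = 4_000     # "a few thousand tokens" of exploration per cold occurrence
LOAD_TOKENS = 300       # "a few hundred tokens" to load the skill per warm occurrence
CAPTURE_TOKENS = 2_000  # assumed one-time cost of capturing the skill


def net_savings(occurrences: int) -> int:
    """Tokens saved over `occurrences` warm runs, net of the one-time capture cost."""
    per_occurrence = COLD_TOKENS - LOAD_TOKENS
    return occurrences * per_occurrence - CAPTURE_TOKENS


def break_even() -> int:
    """Smallest number of occurrences at which net savings are non-negative."""
    per_occurrence = COLD_TOKENS - LOAD_TOKENS
    return -(-CAPTURE_TOKENS // per_occurrence)  # ceiling division on integers


for n in (1, 5, 20):
    print(f"{n:>2} occurrences -> net savings {net_savings(n):>6} tokens")
print("break-even after", break_even(), "occurrence(s)")
```

Each recurrence, whether it is the same teammate next month or a different teammate tomorrow, adds the full per-occurrence gap. The one-time cost is subtracted only once.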
Why the no-match is the quiet hero
Most retrieval systems always return something, because returning the nearest chunk is free for them. But it is not free for you: every irrelevant chunk it injects is context your agent now has to read, weigh, and usually ignore, and sometimes it chases the noise instead of ignoring it. That is tokens spent to make the answer worse.
Cognition does the opposite. When nothing genuinely matches, it says so fast and injects nothing. A near-zero no-match is not a failure to apologize for; it is the system declining to spend your tokens on a guess. No-matches are logged as a coverage signal rather than counted as savings, but over a long session the ones you never notice can add up to more than the hits you do.
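One way to picture the difference is a threshold on match quality. This is a hedged sketch, assuming skills are compared as embedding vectors with cosine similarity. The threshold, the vectors, and the skill names are hypothetical, and this is not a description of Cognition's actual matcher.

```python
import math

# Minimal sketch of a fast no-match. Threshold, vectors, and names are hypothetical.
MATCH_THRESHOLD = 0.80


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query: list[float], skills: dict[str, list[float]]) -> str:
    """Always-return retrieval: hands back the closest skill, relevant or not."""
    return max(skills, key=lambda name: cosine(query, skills[name]))


def match_or_nothing(query, skills, threshold=MATCH_THRESHOLD):
    """Return the best skill only if it clears the threshold, else None
    (meaning: inject nothing into the agent's context)."""
    if not skills:
        return None
    best = nearest(query, skills)
    return best if cosine(query, skills[best]) >= threshold else None


skills = {"config-gotcha": [1.0, 0.0, 0.0], "flaky-test-retry": [0.0, 1.0, 0.0]}
unrelated = [0.0, 0.0, 1.0]
print(nearest(unrelated, skills))           # config-gotcha (noise, injected anyway)
print(match_or_nothing(unrelated, skills))  # None (nothing injected)
```

The always-return path still pays to inject a chunk with zero similarity. The thresholded path stops before any context is spent.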
Stop re-priming from scratch every session
There is a second, sneakier drain: re-establishing context the agent had yesterday. A fresh agent starts amnesiac, so a chunk of every long session goes to re-explaining the stack, conventions, and goals it already learned once. Saved project context and approved skills prime that in a single load instead of re-deriving it turn by turn. The persistent-context guide covers how to set that up; the token effect is that the expensive "getting oriented" phase mostly disappears.
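Here is a rough way to size this drain for your own team, assuming the orientation cost is roughly constant per fresh session. Both constants are hypothetical, so swap in what you observe.

```python
# Hypothetical per-session figures for illustration; measure your own.
ORIENTATION_TOKENS = 6_000   # assumed tokens spent re-explaining stack, conventions, goals
CONTEXT_LOAD_TOKENS = 1_200  # assumed size of saved context loaded once per session


def priming_cost(sessions: int, saved_context: bool) -> int:
    """Total tokens spent getting oriented across `sessions` fresh agent sessions."""
    per_session = CONTEXT_LOAD_TOKENS if saved_context else ORIENTATION_TOKENS
    return sessions * per_session


sessions = 20  # e.g. roughly a month of working days
print("re-priming from scratch:", priming_cost(sessions, saved_context=False))
print("loading saved context:  ", priming_cost(sessions, saved_context=True))
```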
Read your own numbers
Do not take any of this on faith, including from us. Cognition keeps the receipts and will show them:
"show me my stats" → tokens saved, hours saved, hit rate, each with its basis
Then pair it with the activity ledger to see which skills actually earned the savings, so you know what to keep, refresh, or retire:
"what did Cognition use?" → every activation: trigger, sources, action, result, timestamp
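Assuming you can get the ledger entries into a script as records, a few lines are enough to see which triggers are pulling their weight. The field names below mirror the five fields listed above. The Python types, the `"applied"`/`"ignored"` result values, the `"load_skill"` action, and the sample rows are hypothetical, not Cognition's actual export format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Activation:
    """One ledger row. Field names mirror the ledger description; the types and
    result values ("applied", "ignored") are assumptions for this sketch."""
    trigger: str
    sources: list[str]
    action: str
    result: str
    timestamp: datetime


def applied_by_trigger(ledger: list[Activation]) -> Counter:
    """Count how often each skill's trigger led to an applied fix."""
    return Counter(a.trigger for a in ledger if a.result == "applied")


def retire_candidates(known_triggers: list[str], ledger: list[Activation]) -> set[str]:
    """Skills that never produced an applied fix in this window: refresh or retire."""
    return set(known_triggers) - set(applied_by_trigger(ledger))


# Hypothetical sample rows for illustration only.
ledger = [
    Activation("build-config-gotcha", ["team-notes"], "load_skill", "applied", datetime(2024, 5, 6, 9, 30)),
    Activation("build-config-gotcha", ["team-notes"], "load_skill", "applied", datetime(2024, 5, 7, 14, 0)),
    Activation("flaky-e2e-retry", ["ci-runbook"], "load_skill", "ignored", datetime(2024, 5, 7, 16, 45)),
]
print(applied_by_trigger(ledger))
print(sorted(retire_candidates(["build-config-gotcha", "flaky-e2e-retry", "legacy-deploy"], ledger)))
```

A skill that shows up in `retire_candidates` week after week is the one to refresh or retire. One that dominates `applied_by_trigger` is worth keeping current.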
If you want to sanity-check the value honestly, watch the hit rate over a week of real work rather than a single session. Savings concentrate in repeated, known-territory tasks; a week smooths out the noise of any one day.
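Here is why the week matters, in miniature. This sketch assumes hit rate means matched lookups divided by total lookups, which may not be exactly how the stat is defined. The sample data is made up to show how much single days can swing.

```python
from collections import defaultdict
from datetime import date

# Each tuple is one lookup: (day, whether a skill matched). Treating hit rate as
# matched lookups / total lookups is an assumption; the data is made up.
lookups = [
    (date(2024, 5, 6), True), (date(2024, 5, 6), False),
    (date(2024, 5, 7), False), (date(2024, 5, 7), False), (date(2024, 5, 7), False),
    (date(2024, 5, 8), True), (date(2024, 5, 8), True), (date(2024, 5, 8), True),
    (date(2024, 5, 8), False),
]


def daily_hit_rates(lookups: list[tuple[date, bool]]) -> dict[date, float]:
    """Hit rate per day; single days can swing widely."""
    counts: dict[date, list[int]] = defaultdict(lambda: [0, 0])  # day -> [hits, total]
    for day, hit in lookups:
        counts[day][0] += int(hit)
        counts[day][1] += 1
    return {day: hits / total for day, (hits, total) in sorted(counts.items())}


def overall_hit_rate(lookups: list[tuple[date, bool]]) -> float:
    """Hit rate across the whole window, which smooths out the daily swings."""
    return sum(hit for _, hit in lookups) / len(lookups) if lookups else 0.0


for day, rate in daily_hit_rates(lookups).items():
    print(day, f"{rate:.0%}")
print("week", f"{overall_hit_rate(lookups):.0%}")
```

Any one day here reads as either a triumph or a failure. The window-level figure is the one worth acting on.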
Next steps
