I Audited My Claude Code Token Burn. The Waste Wasn't Where I Expected.

I had a feeling my Claude Code usage had gotten weird.

Not “I asked too many questions” weird. More like: I had terminal sessions open all day, cron jobs running, a persistent remote-control service so I could poke Claude Code from my phone, and occasional orchestrator prompts that spun up agents, those agents spun up worktrees, and the worktrees sometimes ran their own PRD implementation loops.

That is not a chat history anymore. That is an operating system made of transcripts.

So I finally audited it.

Thirty days. 3.28 billion tokens across Claude Code. ~$2,400 at Anthropic’s published API list pricing — about 12x the cost of a $200 Max subscription. The number to anchor on isn’t the cache breakdown — those are local-log estimates with caveats. The number to anchor on is the order of magnitude and the shape: where did all this go, and what would I change next month?

The headline surprised me: the extra tools, skills, and MCP definitions were not the main bill. They were visible. They were annoying. They were worth cleaning up. But they were not where the real burn came from.

The real burn came from a simple pattern:

I was not paying for one smart agent. I was paying for many agents to carry many large cached prefixes through many turns.

— Brett Ridenour

What I installed

I started with two open-source local tools:

ccusage, which reads local Claude Code usage from disk and reports daily, session, and billing-block-style usage.
cc-lens, which gives a local browser dashboard over ~/.claude.

For Opus 4.7, the public Claude API pricing page lists $5/M input and $25/M output, with cache pricing also published per tier. ccusage reads my local JSONL transcripts and produces session-level dollar estimates from them. Those estimates are useful for shape (which orchestrator was big, which workers ran long, which sessions dominated), but I’m not going to publish bill-grade caching-specific breakdowns from them. If you want exact numbers, your real Anthropic invoice is the source of truth. The local logs are how I find out which session to argue about.

Then I added my own tiny command:

claude-session-audit <session-id-or-path>

That command walks a single Claude Code session, includes its nested subagents/ folder, sums token usage directly from the JSONL, and writes a markdown report with:

parent-vs-subagent usage
cache-read and cache-create totals
largest cache events
repeated billing shapes
tool-call counts
initial prompt samples

The important thing is that it audits a workload, not just a day.

ccusage daily answers: “How much did I use on Tuesday?”

claude-session-audit answers: “What did that giant orchestrator actually do?”

Where Claude Code stores the evidence

Claude Code writes local transcripts under:

~/.claude/projects/

The folder names are path-slugged. My Freebo repo, for example, shows up as:

~/.claude/projects/-home-brettr-Documents-FreeboDecember/

Main sessions are JSONL files:

3f564fba-c351-43ad-b9bd-a4ca6af99b26.jsonl

Subagents usually live beside that file in a folder with the same session id:

3f564fba-c351-43ad-b9bd-a4ca6af99b26/subagents/agent-a545f899c456836fc.jsonl
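
A minimal sketch of how that layout can be walked. It assumes each transcript line is a JSON object whose `message.usage` carries four counters (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`). That field layout is an assumption about the on-disk format, not a documented contract, and the function names are made up for illustration. This is not the claude-session-audit source.

```python
import json
from collections import Counter
from pathlib import Path

# Assumed usage field names; treat them as an assumption about the log format.
USAGE_KEYS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def sum_usage(jsonl_path):
    """Sum the assumed usage counters over every line of one transcript."""
    totals = Counter()
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or partial lines
            message = entry.get("message") if isinstance(entry, dict) else None
            usage = message.get("usage") if isinstance(message, dict) else None
            if not isinstance(usage, dict):
                continue
            for key in USAGE_KEYS:
                totals[key] += int(usage.get(key) or 0)
    return dict(totals)


def audit_session(session_jsonl):
    """Return usage for the parent transcript and each subagent transcript."""
    parent = Path(session_jsonl)
    # Subagents are assumed to sit in <session-id>/subagents/ beside the parent file.
    subagent_dir = parent.with_suffix("") / "subagents"
    report = {"parent": sum_usage(parent), "subagents": {}}
    if subagent_dir.is_dir():
        for path in sorted(subagent_dir.glob("*.jsonl")):
            report["subagents"][path.name] = sum_usage(path)
    return report
```

Calling `audit_session` on a main session file returns one usage dict for the parent and one per subagent file. If the logs repeat the same message across several lines, this sketch will overcount, because it does no deduplication.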

In my current local tree, the main Claude Code profile had:

1,740 JSONL files
641 subagent JSONLs
many project roots, including home, FreeboDecember, Brett Omarchy, Reelforge, Astro blog, and old worktree paths

That explained why aggregate reports were useful but not sufficient. A “session” can hide a whole agent network.

The FreeboDecember orchestrator

The first run I wanted to audit was a huge FreeboDecember PRD push.

This was the kind of prompt that makes sense at midnight and looks insane in the morning: a parent orchestrator dispatching many feature implementers, each in isolated git worktrees, each responsible for turning a PRD into a branch and PR.

The session id was:

3f564fba-c351-43ad-b9bd-a4ca6af99b26

The audit found 28 JSONL files: the parent transcript plus 27 subagent transcripts.

The parent was the biggest single line item.

Unit	Total tokens	Cache read	Cache create	Output	Est. cost
Parent orchestrator	115.9M	114.2M	1.4M	265.7K	$73
Top worker: PRD-J	23.4M	23.0M	295.9K	55.7K	$15
Top worker: PRD-I	15.8M	15.5M	298.0K	33.5K	$10
Top worker: PRD-M	15.2M	14.9M	249.9K	33.3K	$10

The cache-read column is the story. Almost all of the tokens were cached prefix reads.

That is cheaper than raw input. It is still usage.
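
The estimate column can be roughly reproduced with simple arithmetic. This sketch assumes a base input rate of $5 per million tokens and an output rate of $25 per million, with cache reads billed at 0.1x the input rate and short-lived cache writes at 1.25x. Those rates are assumptions picked because they land close to the table. Swap in whatever the pricing page lists for your model.

```python
# Assumed per-million-token rates in USD; replace with your model's list prices.
INPUT_PER_M = 5.00
OUTPUT_PER_M = 25.00
CACHE_READ_PER_M = INPUT_PER_M * 0.10   # cache reads: assumed 0.1x input
CACHE_WRITE_PER_M = INPUT_PER_M * 1.25  # short-lived cache writes: assumed 1.25x input


def estimate_cost(cache_read, cache_create, output, uncached_input=0):
    """Estimate USD cost from raw token counts."""
    return (
        cache_read * CACHE_READ_PER_M
        + cache_create * CACHE_WRITE_PER_M
        + output * OUTPUT_PER_M
        + uncached_input * INPUT_PER_M
    ) / 1_000_000


# Rows from the table above, using its rounded token counts.
parent = estimate_cost(114_200_000, 1_400_000, 265_700)
prd_j = estimate_cost(23_000_000, 295_900, 55_700)
print(f"parent ~ ${parent:.2f}, PRD-J ~ ${prd_j:.2f}")
```

Because the inputs are rounded and uncached input is left at zero, the results land within a dollar of the table rather than matching it exactly.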

Top FreeboDecember workers (5 agents):

01 · general-purpose · PRD-J · new vertical terminology + asset profiles · 23.4M tokens
02 · general-purpose · PRD-I · snapshot writer / tax fee lock · 15.8M tokens
03 · general-purpose · PRD-M · back-office booking · 15.2M tokens
04 · general-purpose · PRD-T · revenue reports · 12.2M tokens
05 · general-purpose · PRD-S · tax config · 11.4M tokens

The 100-subagent run

Then I found the run I had in the back of my mind: the one with about 100 subagents.

It was under:

~/.claude/projects/-home-brettr/433b6ea4-c48e-4f27-9c0b-20acfacf74cd/

That session had 101 JSONL files: the parent plus exactly 100 subagents.

This one had the opposite shape.

FreeboDecember was mostly cache reads. The 100-subagent run had a huge cache-creation bill:

38.8M cache-create tokens
47.2M cache-read tokens
1.87M output tokens
about $313 ccusage-style estimate

The parent alone accounted for about $263.

That is where the “number of subagents equals cost” model turns out to be too simple. The subagents matter, but the parent orchestration loop can dominate if it keeps creating and mutating a large prefix.

The thing I thought was expensive

I expected the bloated tool list to be the smoking gun.

Each FreeboDecember worker was starting with a huge capability surface:

about 315 tool names per subagent
144 skills listed
browser tools
Slack tools
Notion tools
Wix tools
Vercel tools
n8n tools
Todoist tools
NotebookLM tools
Railway tools
Supabase tools
the actual tools needed for code

That looks bad because it is bad.

But the direct cost was smaller than I expected.

I measured the first assistant turn for each FreeboDecember subagent. That is the closest observable proxy for “booting the agent with tools, skills, MCPs, and initial context.”

Across all 27 subagents, first-turn startup was about:

711,618 cache-create tokens
157,824 cache-read tokens
roughly $4.50 ccusage-style estimate

So I changed my mental model.

The bad part of a giant tool surface is not only “the prompt is bigger.” It is that it makes the agent’s world bigger.

A worker implementing a database migration does not need to know about Canva, Slack, Spotify, Wix, Todoist, Airbnb, Vercel toolbar threads, or browser screenshot controls. Even if those definitions cache well, they are still part of the attention surface. They increase the chance the agent explores, retries, or routes work through a tool that should not exist in that context.

The thing that actually got expensive

The FreeboDecember run was expensive because it combined four multipliers:

Large parent context: PRDs, repo rules, tools, skills, branch policy, release context
Many workers: 27 subagents, each in a worktree, each doing real implementation
Many turns per worker: read, edit, test, fix, status, retry
Long cached prefixes: mostly cheap reads, but repeated hundreds of times

The bill was not one catastrophic prompt. It was multiplication.

The top worker alone had 193 assistant turns.

Another had 156.

Another had 148.

That is where “cached” stops feeling free. If a worker reads a 100K-token prefix 150 times, the fact that each read is cheap is not enough to make the run small.
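
The arithmetic behind that sentence, as a small sketch. It assumes the prefix stays a constant 100K tokens and is read in full once per turn, which is a simplification because real prefixes grow as the transcript grows. The $0.50 per million cache-read rate is an assumed value, not a quoted price.

```python
def cached_read_tokens(prefix_tokens, turns):
    """Tokens re-read from cache if every turn reads the whole prefix once."""
    return prefix_tokens * turns


# Assumed cache-read rate in USD per million tokens; adjust for your model.
CACHE_READ_PER_M = 0.50

tokens = cached_read_tokens(100_000, 150)
cost = tokens * CACHE_READ_PER_M / 1_000_000
print(f"{tokens:,} cached tokens, about ${cost:.2f}")
# prints: 15,000,000 cached tokens, about $7.50
```

Fifteen million tokens for one worker is the same order of magnitude as the mid-teens worker totals in the table above, before counting any cache creation or output.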

Prompt caching made the run possible. It did not make the run disciplined.

— Brett Ridenour

The idle session question

I also wanted to understand what happens when I leave Claude Code open.

The short version:

An idle Claude Code session should not burn tokens by itself.

Tokens are consumed when something actually calls the model:

user prompts
agents
cron jobs
remote-control actions
loops
scheduled runs
monitoring prompts
tool-driven retries

Leaving the terminal open is not the same thing as running the model.

But cache is time-sensitive. If I leave a session idle for a day, the next message is probably not benefiting from the short-lived prompt cache. It may recreate a large prefix: tools, instructions, project context, prior transcript, loaded docs.

That is why a day-old session can feel “free” while sitting there and then expensive on the next real turn.
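
A rough sketch of the difference between a warm turn and a cold one. It assumes a 100K-token prefix, a cache-read rate of $0.50 per million tokens, and a cache-write rate of $6.25 per million. All three numbers are illustrative assumptions, and the function name is hypothetical.

```python
# Assumed USD rates per million tokens; placeholders for your model's pricing.
CACHE_READ_PER_M = 0.50
CACHE_WRITE_PER_M = 6.25


def next_turn_prefix_cost(prefix_tokens, cache_is_warm):
    """Cost of processing the prefix on the next turn, warm versus cold."""
    rate = CACHE_READ_PER_M if cache_is_warm else CACHE_WRITE_PER_M
    return prefix_tokens * rate / 1_000_000


warm = next_turn_prefix_cost(100_000, cache_is_warm=True)
cold = next_turn_prefix_cost(100_000, cache_is_warm=False)
print(f"warm ${warm:.3f} vs cold ${cold:.3f} ({cold / warm:.1f}x)")
```

Under these assumptions the first turn after a long idle period costs over ten times what the same turn would have cost while the cache was still warm. The absolute amount is small for one turn; it matters when many stale sessions wake up with large prefixes.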

What I would change in the orchestrator

If I rewrote my feature-orchestrator skill based on this audit, I would not start by deleting tools. I would start by changing the shape of the work.

Plan (3 agents): cheap, narrow, no code. Agents: prd-review, plan-feature, dependency map.
Execute (4 agents): small worker wave. Agents: db, api, ui, tests.
Consolidate (3 agents): read PRs, not transcripts. Agents: review, conflicts, next wave.

1. Planning becomes mandatory

Every PRD should go through a cheap planning gate before implementation.

The output should be a compact execution brief:

files likely touched
DB/API/UI scope
tests required
dependency notes
conflict risks
“do not read” areas
what would make the worker stop

Only the brief goes to implementation workers. Not the whole sprint context.

2. Fan-out gets capped

No more giant wave unless the work is truly independent.

For serious code:

3 to 4 workers per wave
each wave must finish or block
consolidator reads PR summaries and diffs
next wave launches only after contracts are stable

For content or research:

large fan-out is fine only if each worker gets a tiny prompt
parent should not keep regenerating a giant state object
workers should return strict JSON or compact bullets
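
The cap itself is easy to express. This is a minimal sketch that only enforces the wave size; it does not check that briefs are independent or wait for a wave to finish. The brief names are hypothetical.

```python
def plan_waves(briefs, max_workers=4):
    """Split an ordered list of briefs into waves of at most max_workers."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    return [briefs[i:i + max_workers] for i in range(0, len(briefs), max_workers)]


# Hypothetical brief names, for illustration only.
waves = plan_waves(["db", "api", "ui", "tests", "docs", "reports"])
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

In a real orchestrator the loop over waves would block on each wave's results and run the consolidation step before launching the next one.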

3. Workers get roles, not the universe

The worker prompt should say what kind of worker it is.

Better worker roles (4 agents):

01 · Plan · DB/migration worker · schema, RLS, migrations, rollback
02 · Plan · API worker · routes, services, contracts, tests
03 · Plan · Frontend worker · components, state, UI specs
04 · Plan · Verification worker · typecheck, lint, Playwright, docs

Each role gets only the tools and instructions it needs.

The DB worker does not need browser tools.

The docs worker does not need Supabase admin tools.

The API worker does not need Todoist.

The content worker does not need Railway.
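
One way to picture role-scoped tools is a plain allowlist per role. Everything in this sketch is hypothetical: the role keys, the tool names, and the helper. It is not how Claude Code configures tools, only an illustration of the filtering idea.

```python
# Hypothetical role -> tool allowlists; the tool names are illustrative, not real identifiers.
ROLE_TOOLS = {
    "db": {"read_file", "edit_file", "run_shell", "supabase_migrate"},
    "api": {"read_file", "edit_file", "run_shell"},
    "frontend": {"read_file", "edit_file", "run_shell", "browser_screenshot"},
    "verification": {"read_file", "run_shell", "playwright_run"},
}


def tools_for(role, available_tools):
    """Return the subset of available tools this role is allowed to see."""
    allowed = ROLE_TOOLS.get(role)
    if allowed is None:
        raise KeyError(f"unknown role: {role}")
    return sorted(t for t in available_tools if t in allowed)
```

With this shape, a DB worker handed a list that includes Slack or browser tools only ever sees the file, shell, and migration entries.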

4. Model routing becomes explicit

I had too much Opus doing mechanical work.

The split I want:

Opus for orchestration, architectural review, final judgment
Sonnet for normal implementation
Haiku for summarization, docs, status compression, mechanical extraction

That alone would probably matter more than shaving a few thousand tokens off the startup context.
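
The split can be written down as a lookup table so the routing is explicit rather than implied. The job-type keys and the fallback to a mid-tier default are assumptions made for this sketch, and the tier names are labels rather than API model identifiers.

```python
# Job types and tier names are illustrative labels, not API model identifiers.
MODEL_BY_JOB = {
    "orchestration": "opus",
    "architecture_review": "opus",
    "final_judgment": "opus",
    "implementation": "sonnet",
    "summarization": "haiku",
    "docs": "haiku",
    "status_compression": "haiku",
    "mechanical_extraction": "haiku",
}


def pick_model(job_type, default="sonnet"):
    """Route a job type to a model tier, falling back to an assumed mid-tier default."""
    return MODEL_BY_JOB.get(job_type, default)
```

The useful property is that an unlisted job type never silently lands on the most expensive tier.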

5. Stop conditions should detect repeated assumptions

The earlier failure mode in this sprint was not “an agent wrote bad code.” It was multiple PRDs making incompatible assumptions about a feature flag and a shared contract.

The orchestrator should stop if:

two workers fail on the same upstream assumption
two workers touch the same migration or enum
a feature flag is required by downstream PRDs but not enabled
a shared type changes after workers have already started
CI failures repeat across a wave

That is not a technical failure. It is a planning failure. The orchestrator needs to recognize it.
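
A minimal sketch of the first two stop conditions, assuming each worker reports a dict with hypothetical keys `failed_assumption` and `touched`. The other conditions (feature-flag state, shared type changes, repeated CI failures) would need their own signals and are not covered here.

```python
from collections import Counter


def stop_reasons(workers):
    """Return reasons to halt the wave, given per-worker result dicts.

    Each dict uses hypothetical keys: 'failed_assumption' (str or None)
    and 'touched' (list of shared files or enums the worker modified).
    """
    reasons = []
    assumptions = Counter(
        w["failed_assumption"] for w in workers if w.get("failed_assumption")
    )
    for assumption, count in assumptions.items():
        if count >= 2:
            reasons.append(f"{count} workers failed on assumption: {assumption}")
    # set() keeps one worker listing the same item twice from counting as two workers.
    touched = Counter(item for w in workers for item in set(w.get("touched", [])))
    for item, count in touched.items():
        if count >= 2:
            reasons.append(f"{count} workers touched shared item: {item}")
    return reasons
```

An empty list means the wave can continue; anything else is a reason for the orchestrator to pause and go back to planning.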

6. Status output gets compressed

Long status reports are comforting and expensive.

Workers should return a compact schema:

{
  "status": "done | blocked | failed",
  "branch": "feat/example",
  "pr": "url-or-null",
  "files_changed": 12,
  "tests": ["typecheck", "lint"],
  "blockers": [],
  "contracts_changed": ["locations.vertical"]
}

The parent can store that. It does not need prose unless something is actually blocked.
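
A compact schema only helps if the parent rejects replies that drift from it. This sketch validates a worker reply against the fields shown above. It reads the `status` line as an enum of three values and treats a missing `pr` as null; both readings are assumptions about the schema.

```python
import json

ALLOWED_STATUS = {"done", "blocked", "failed"}
REQUIRED_TYPES = {
    "status": str,
    "branch": str,
    "files_changed": int,
    "tests": list,
    "blockers": list,
    "contracts_changed": list,
}


def parse_worker_status(raw):
    """Parse a worker's JSON reply and reject anything outside the schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("status must be a JSON object")
    for key, expected in REQUIRED_TYPES.items():
        if not isinstance(data.get(key), expected):
            raise ValueError(f"{key} must be {expected.__name__}")
    if data["status"] not in ALLOWED_STATUS:
        raise ValueError(f"unexpected status: {data['status']}")
    if data.get("pr") is not None and not isinstance(data["pr"], str):
        raise ValueError("pr must be a string or null")
    return data
```

A reply that fails validation is itself a signal: the worker either ignored the contract or had something to say that did not fit it.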

7. Usage telemetry becomes part of the loop

After each wave, the orchestrator should run a usage audit.

Not after the sprint. Not after the bill feels weird. After the wave.

The report should ask:

Which agent spent the most?
Was the parent more expensive than the workers?
Did cache creation spike?
Did repeated billing signatures show polling or retries?
Did the tool list include irrelevant MCPs?
Should the next wave continue?

That last question matters. Sometimes the correct next action is not “spawn more agents.” It is “summarize, compact, and restart from a narrower plan.”
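
Three of those questions reduce to arithmetic on numbers the audit already has. A minimal sketch, with a hypothetical function name and input shape. The example figures come from the 100-subagent run above: the combined worker cost is derived as $313 minus $263, and the token total ignores uncached input.

```python
def wave_report(costs, cache_create_tokens, total_tokens):
    """Summarize one wave from per-agent cost estimates.

    costs: dict of agent name -> estimated USD, with the parent under 'parent'.
    """
    top_agent = max(costs, key=costs.get)
    worker_total = sum(v for k, v in costs.items() if k != "parent")
    return {
        "top_agent": top_agent,
        "parent_exceeds_workers": costs.get("parent", 0.0) > worker_total,
        "cache_create_share": cache_create_tokens / total_tokens if total_tokens else 0.0,
    }


# 100-subagent run: parent about $263, all subagents combined about $50.
report = wave_report(
    {"parent": 263.0, "all_subagents": 50.0},
    cache_create_tokens=38_800_000,
    total_tokens=38_800_000 + 47_200_000 + 1_870_000,
)
print(report)
```

For that run the parent is the top spender, it outweighs every worker combined, and cache creation is roughly 44% of the tokens counted, which is the shape that should stop the next wave.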

The revised orchestrator contract

If I rewrote the skill tomorrow, the heart of it would look like this:

You are a wave orchestrator, not a giant autonomous engineer.

Phase 1: turn PRDs into compact execution briefs. No code.
Phase 2: select at most 4 independent briefs.
Phase 3: spawn narrow workers with role-specific tools.
Phase 4: require compact structured status.
Phase 5: consolidate PRs, contracts, and failures.
Phase 6: run token audit before the next wave.

Stop if repeated failures point to one shared assumption.
Stop if a shared contract changes mid-wave.
Stop if the parent session becomes the largest cost center.
Stop if the next wave would launch with unresolved schema or feature-flag state.

That is less glamorous than “run 27 engineers all night.”

It is also closer to how good engineering work actually scales.

What I learned

The important lesson is not “agents are expensive.”

The important lesson is that agent architecture has a cost model.

The startup context was visible and easy to blame. It was not the main cost.

The real problem was that I let an orchestrator create a lot of long-running workers, each with enough context and freedom to behave like a full engineer, and then I let them loop until the PRDs were either implemented, blocked, or exhausted.

That can be useful. Sometimes it is exactly what I want.

But if I am going to do it regularly, I need to treat it like infrastructure:

measure every big run
keep workers narrow
cap fan-out
route models by job type
compress status
stop on repeated assumptions
audit after each wave

Claude Code did not become expensive because it was sitting open in a terminal.

It became expensive when I gave it a whole engineering organization and forgot to give that organization a budget.