Use Case

100 Agents, 88 Million Tokens, $313

I asked myself what I'd do with one hour of unlimited token spend. The answer: let 100 agents argue with my unshipped LinkedIn drafts. ccusage came back with a $313.28 receipt across 87.9 million tokens. Three posts got killed. Seven got tighter.

Brett Ridenour · May 19, 2026

Subagents: 100

Tokens spent: 87.9 million

Dollars (ccusage): $313.28

Posts killed by judges: 3

The setup

I had ten LinkedIn posts already drafted, sitting in the vault, that I hadn’t shipped. They weren’t blocked on writing. They were blocked on whether they were good.

The wedge

I ran /last30days to see what people were publishing about agent-team patterns in early May 2026. One repo kept showing up: a 10x-content-expert skill that does parallel critic-and-judge orchestration per piece of content. Per post, that means a critic panel of four roles across two tiers (P3 on the Sonnet tier, P5 on the Opus tier), plus a dual-judge pass with one judge on Opus and one on Sonnet.

Ten agents per post. Ten posts. One hundred subagent transcripts in a single session directory.
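The arithmetic behind that count can be sketched in a few lines. This is a minimal illustration, not the skill's actual code: the role and tier names come from the subagent listing in this post, while `build_roster` and the dictionary layout are assumptions made up for the example.

```python
from itertools import product

# Names taken from the subagent listing in this post; the data layout
# and the function name are hypothetical.
CRITIC_ROLES = ["ICP", "BrandVoice", "DevilsAdvocate", "PR-risk"]
CRITIC_TIERS = ["P3", "P5"]        # P3 = Sonnet tier, P5 = Opus tier
JUDGE_MODELS = ["opus", "sonnet"]  # the dual-judge pass


def build_roster(num_posts):
    """Return one entry per subagent: 8 critics and 2 judges per post."""
    roster = []
    for post in range(1, num_posts + 1):
        # Four roles on each of two tiers gives eight critics.
        for tier, role in product(CRITIC_TIERS, CRITIC_ROLES):
            roster.append({"post": post, "kind": "critic", "tier": tier, "role": role})
        # Two judges read the critics afterwards.
        for model in JUDGE_MODELS:
            roster.append({"post": post, "kind": "judge", "model": model})
    return roster


roster = build_roster(10)
print(len(roster))  # 100
```

Four roles times two tiers is eight critics, plus two judges, which is where the ten-per-post figure comes from.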

agent network · 1 post · 8 critics · 2 judges

The role shape

Each critic plays a fixed role and writes feedback in a fixed shape. The judges read all critics for a single post and produce a single ranked rewrite. No free-form arguing. Just role-played reviewers with bounded outputs.

# Critic role: DevilsAdvocate (the one that earned its tokens)

You are a hostile critic reviewing one LinkedIn post draft.
You are not the author. You do not care about being nice.

Your job is to find the line that sounds clever but says nothing.
Find the unsupported claim. Find the lazy framing.

Return your output in this fixed shape:
1. Strongest single objection (one sentence).
2. The line in the post that triggered it (verbatim).
3. What the post would need to defend the claim
   (one sentence, no hedging).
4. Verdict: SHIP / TIGHTEN / KILL.

Do not suggest rewrites. That's the judge's job.
Do not soften your verdict to be diplomatic.

---

# Judge role (runs after all critics return)

You are the judge for post #{post_id}. Critics have returned
their verdicts. Read them all. Then produce:

1. A single ranked rewrite that addresses the strongest objection.
   Keep the author's voice.
2. A verdict: SHIP (with rewrite) / KILL (don't ship).
3. One sentence explaining the verdict.

You may not include rewrites that contradict more than one critic.
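One practical payoff of the fixed output shape in the critic prompt above is that replies can be checked and tallied mechanically. The sketch below assumes a critic's item 4 comes back as a line starting with "4. Verdict:". The real transcripts may be formatted differently, `parse_verdict` and `tally` are hypothetical helpers, and the sample reply is invented for illustration.

```python
import re
from collections import Counter

# Matches item 4 of the critic's fixed output shape.
VERDICT_PATTERN = re.compile(
    r"^\s*4\.\s*Verdict:\s*(SHIP|TIGHTEN|KILL)\b", re.MULTILINE
)


def parse_verdict(critic_output):
    """Return SHIP, TIGHTEN, or KILL, or None if the reply broke the shape."""
    match = VERDICT_PATTERN.search(critic_output)
    return match.group(1) if match else None


def tally(outputs):
    """Count verdicts across a list of critic replies."""
    return Counter(parse_verdict(text) for text in outputs)


# Invented sample reply, not taken from the actual run.
sample = """1. Strongest single objection: The claim has no evidence.
2. "Agents always beat editors."
3. The post needs one measured comparison.
4. Verdict: KILL"""

print(parse_verdict(sample))  # KILL
```

A reply that returns None here is one that ignored the format, which is worth knowing before it reaches a judge.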

The run

I pointed the skill at ten draft posts in ~/Documents/Brett Omarchy/Blog/. Watched the subagent count climb in the side panel — 30, 60, 90 — then settle at 100.

jq -r '.description' subagents/*.meta.json | head
01 general-purpose P3 ICP post 01
02 general-purpose P3 BrandVoice post 01
03 general-purpose P3 DevilsAdvocate post 01
04 general-purpose P3 PR-risk post 01
05 general-purpose P5 ICP post 01
06 general-purpose P5 BrandVoice post 01
07 general-purpose P5 DevilsAdvocate post 01
08 general-purpose P5 PR-risk post 01
09 general-purpose Judge post 01
10 general-purpose Judge post 01 sonnet
11 general-purpose … × 10 posts = 100 agents total

The result

Three of the ten posts came back saying “don’t ship this — it’s a take, not a story.” I killed those drafts. The other seven got tightened. None of them shipped looking identical to where they started.

Across the ten posts, the most useful pass was almost always DevilsAdvocate. The brand-voice agent over-indexed on copying surface tics (“Brett uses dashes a lot”). The PR-risk agent flagged things that weren’t risks. The ICP agent reliably surfaced one good edit per post. But the devil’s advocate did the work I would have done with a coffee and an editor — it found the unsupported claim, the lazy framing, the line that sounded clever but said nothing.

Useful edits surfaced across the 10-post run (Brett's count)
DevilsAdvocate: 18
ICP: 9
PR-risk: 3
BrandVoice: 2

Independence beats intelligence.

— Brett Ridenour, on running 100 agents at one draft

Why it worked

Not because a hundred agents are smarter than one. They aren’t. They’re more independent than one. The same Claude doing the same critique twice will pattern-match its own first answer. Multiple role-prompted Claudes across two model tiers surface genuinely different signals. The judges’ job is comparison, not generation. That’s where the brute-force economics earn out: critics are the cheaper calls and run in parallel, judges are the dearer calls and run after the critics return, and the asymmetry maps to where you need quality.
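The shape described above is a fan-out followed by a fan-in: critics run concurrently without seeing each other, then judges read everything. Here is a minimal sketch of that control flow using asyncio. `call_model` is a hypothetical stand-in that returns a canned string instead of calling a model, and `review_post` is an illustrative name, not the skill's actual orchestration code.

```python
import asyncio


async def call_model(role, tier, text):
    # Hypothetical stand-in for a real model call; returns a canned string.
    await asyncio.sleep(0)
    return f"[{tier} {role}] feedback on {len(text)} chars"


async def review_post(draft):
    roles = ["ICP", "BrandVoice", "DevilsAdvocate", "PR-risk"]
    tiers = ["P3", "P5"]
    # Fan out: all eight critics run concurrently and independently.
    critiques = await asyncio.gather(
        *(call_model(role, tier, draft) for tier in tiers for role in roles)
    )
    # Fan in: each judge reads every critique, one judge after the other.
    bundle = "\n".join(critiques)
    verdicts = []
    for judge_model in ["opus", "sonnet"]:
        verdicts.append(await call_model("Judge", judge_model, bundle))
    return critiques, verdicts


critiques, verdicts = asyncio.run(review_post("draft text"))
print(len(critiques), len(verdicts))  # 8 2
```

The independence comes from the gather step: no critic's prompt contains another critic's output, so no critic can anchor on an earlier answer.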

When it earns its tokens

This isn’t cheap. The session above ran $313.28 end-to-end per ccusage (Opus 4.7, 5-minute cache write rate). And that’s with aggressive prompt caching — without those cache hits the same run would have been several thousand dollars.

The breakdown is wild. The parent orchestrator session alone was $262.52. All 100 subagents combined were ~$50. The Supreme Judge agents were the most expensive per-call ($1.00–$1.55 each) because they read all the critic outputs every time. The independent critics were the cheapest ($0.59–$0.93). Per post, you’re looking at roughly $30 amortized.
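For anyone checking the math, the headline figures can be derived from three numbers in this post: the $313.28 total, the $262.52 orchestrator cost, and the 87.9 million tokens. The variable names are just for illustration, and the blended rate is a simple division across all token types, not a published price.

```python
TOTAL_USD = 313.28         # full session, per ccusage
ORCHESTRATOR_USD = 262.52  # parent orchestrator session alone
TOTAL_TOKENS = 87.9e6      # tokens across the whole run
NUM_POSTS = 10

# What is left over for the 100 subagents combined.
subagents_usd = TOTAL_USD - ORCHESTRATOR_USD
# Share of the bill that went to the orchestrator.
orchestrator_share = ORCHESTRATOR_USD / TOTAL_USD
# Total cost spread evenly across the ten posts.
per_post_usd = TOTAL_USD / NUM_POSTS
# Blended dollars per million tokens, mixing all token types.
usd_per_million_tokens = TOTAL_USD / (TOTAL_TOKENS / 1e6)

print(f"subagents: ${subagents_usd:.2f}")                        # subagents: $50.76
print(f"orchestrator share: {orchestrator_share:.1%}")           # orchestrator share: 83.8%
print(f"per post: ${per_post_usd:.2f}")                          # per post: $31.33
print(f"blended: ${usd_per_million_tokens:.2f} per million tokens")  # blended: $3.56 per million tokens
```

The orchestrator carrying roughly five sixths of the bill is the number to watch: the cost is mostly in coordination, not in the critics.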

That price is a feature. It forces you to only run this on things worth being right about.

The most valuable output of the run was the three posts the judges said don’t ship. That’s a signal a one-author workflow can’t give you. I would have shipped those. I’m glad I didn’t.

References & further reading