Last week I had what felt like a great idea: a persistent Claude Code session, model set to Opus 4.7, working directory pinned to my Obsidian vault, started at boot, kept alive forever, named “PA.” The whole vault loaded into context. CLAUDE.md as standing orders. Every time I tapped ”+ New session” from the iOS app, I wasn’t getting a fresh assistant — I was talking to the same assistant, who already knew everything we’d been working on all day.
I gave it a tmux session, a systemd unit, and a name in the iOS app. I felt clever.
A few hours later, casual asks were costing 10x what they should have. By the end of the day, the session had accumulated about 400,000 tokens of context, and every “hey, what’s on my calendar tomorrow” was being billed against the entire pile. The persistent assistant was real. It was just very, very expensive to talk to.
Here’s the architecture I run now instead.
The bill that ended it
I’d been throwing one-line questions at the orchestrator all afternoon. “What did I commit to Freebo this morning?” “Pull up my notes on Jordan’s talk.” “Add a follow-up task for Nate.” Each one felt fast. Each one was, in fact, replaying ~400K tokens of accumulated session history through Opus before answering.
A fresh claude --print "what's on my calendar tomorrow" on Haiku, with the vault CLAUDE.md auto-loading, costs cents. The same question to a 400K-token Opus session was costing dollars. Not a typo. The model was the same calibre I’d use for a hard reasoning task. The question was about my calendar.
Persistent context isn’t free. I’d been treating “always thinking” like a feature when it was really a slowly-filling shopping cart that I never let anyone ring up.
What I actually need
Once I stopped romanticizing the always-on assistant, the real shape of my asks fell into four buckets.
-
A general dev shell. Long-running, real terminal, one tmux session I can
/resumeinto. This is where I write code. Persistent context is the point — it’s the conversation about the codebase I’m in. -
A grab-bag pool from anywhere on disk. Phone-spawned sessions when I’m at the coffee shop, in line at H-E-B, walking the dog. Each one ephemeral. Each one fresh. Working directory:
$HOME, so any of my repos are reachable. -
A vault-rooted pool that runs on the cheap default model. This is the bulk of my PA traffic — calendar lookups, task adds, vault searches, “draft a reply to mom.” Haiku is plenty. The vault
CLAUDE.mdauto-loads with my orchestrator instructions. Each tap = a fresh session. -
An on-demand Opus brain. Big synthesis tasks where I actually need the smart model and a long-running conversation: weekly review, a hard architectural question, a multi-step plan. Started manually, stopped when done. Never left running idle.
Four different services, four different shapes.
The four services
$ for s in claude-home claude-pa claude-pa-pool claude-remote-control; do
printf "%-25s %s\n" "$s" \
"$(systemctl --user is-active $s.service)"
done
claude-home active
claude-pa inactive
claude-pa-pool active
claude-remote-control active
Three are always-on. The Opus one is dormant by default. From the iOS Claude app, the always-on three show up as “home,” “Omarchy,” and “PA Haiku.” Tap one, get a fresh session in the right working directory with the right model.

The PA Haiku unit is the one that does the most work. It’s also the smallest:
[Unit]
Description=Claude Code PA Pool (Haiku, Remote Control, vault cwd)
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
WorkingDirectory=/home/brettr/Documents/Brett Omarchy
ExecStart=/usr/bin/claude remote-control \
--name "PA Haiku" --spawn same-dir --capacity 8
Restart=always
RestartSec=10
Environment=ANTHROPIC_API_KEY=
StandardOutput=append:%h/.local/share/claude-pa-pool.log
StandardError=append:%h/.local/share/claude-pa-pool.log
[Install]
WantedBy=default.target
Three things in there matter:
WorkingDirectoryis the vault. Means the vault’sCLAUDE.md— my orchestrator brain, the dispatch rules, the “ideas are sacred” rules, the source-of-truth pointers — auto-loads at session start. I don’t have to re-explain the system every time.- The default model is haiku via a vault-local
.claude/settings.local.jsoncontaining{ "model": "haiku" }. The model override is scoped to the vault, not global. When I’m in a code repo, settings stay default. Environment=ANTHROPIC_API_KEY=explicitly nukes the variable inside the service. Remote Control authenticates via claude.ai OAuth, not API key. If a staleANTHROPIC_API_KEYfrom your.bashrcleaks in, the relay fails silently. I fought this once. Now I belt-and-suspender it.

Fresh-per-tap is the trick
The big shift wasn’t the model downgrade. It was abandoning the one persistent assistant metaphor in favor of a small army of disposable assistants who all read from the same vault.
When I tap “PA Haiku” and ask “what’s on my calendar tomorrow,” what happens is:
- A brand-new Claude Code session spawns in the vault cwd.
CLAUDE.mdloads — orchestrator role, dispatch rules, MCP list.- The Calendar MCP gets called. The answer comes back.
- The session dies when I close the chat.
Total context: maybe 8K tokens of CLAUDE.md plus the question and the answer. No drift. No yesterday’s screenshots. No half-remembered tangents from this morning. Cheap, clean, and correct — because there’s no stale context to pollute the answer.
If I need continuity — a multi-step refactor, a strategy session — I either use my long-running dev shell (claude-home), or I manually start the Opus orchestrator and burn it down when I’m done.
# Start the Opus brain for a synthesis task
systemctl --user start claude-pa
claude-pa # tmux attach helper
# Stop it when I'm done. Always.
systemctl --user stop claude-pa
The 3 AM context flush
The dev shell (claude-home) is the one persistent session I keep. It’s the one where context does matter — I’m in a codebase, I’m iterating, I want /resume to work. But even there, context bloat is real. I forget to clear it. I close the laptop with 200K tokens loaded.
So there’s a nightly timer:
[Unit]
Description=Nightly restart of claude-home (3am, clears context)
[Timer]
OnCalendar=*-*-* 03:00:00
[Install]
WantedBy=timers.target
Persistent=false is the default and I want it that way. If my laptop is off at 3 AM (it usually is — I close the lid at night), the missed restart is skipped, not deferred. Otherwise the session would die the moment I sat down with coffee. Which would be hilarious exactly once.
What I’d tell anyone running a persistent Claude
Two questions, in order:
Does this session need memory of prior turns to answer the next question? If no — and 90% of my PA traffic is “no” — you don’t want a persistent session. You want a pool that hands you a fresh one per tap.
If yes, does it need Opus to answer? If no, run it on Haiku. The vault’s CLAUDE.md is the same file either model reads. Haiku’s good enough for ninety percent of orchestration work, and the bill is night and day.
The romance of an always-thinking assistant is real. So is the math behind a 400K-token bill. Pick disposable when you can. Pick persistent only when you actually need the memory. And kill the Opus orchestrator when you walk away from the desk.
Mine sleeps right now. I started it for an hour this morning to plan a wedding trip. Then I stopped it. The PA Haiku pool handled the next thirty asks for a tenth of the cost.
That’s the right shape.