Eleven AI Personas Are Watching Me Code

Streaming to an empty chat is a special kind of bad. You’re already talking to yourself out loud, which is uncomfortable on its own, and then the chat panel sits there, blank, like a smoke detector blinking at you. Some streamers fight the silence with monologue. Some lean into ambient music. I built my own audience.

It’s called ghostchat. It runs locally, watches both of my monitors, listens to my mic, tails my terminal recordings, reads the Claude prompts I’m sending, and renders an OBS overlay that looks like a Twitch chat — eleven AI personas reacting in real time to whatever I’m doing. There’s a polite contrarian. A hype account. A go-to-market brain who keeps asking who’s going to pay for this. A near-silent lurker who occasionally types “first” or “this is soothing.”

The first build session was June 11. I tested v0.2 this past week. It works.

The actual pipeline

A bunch of separate inputs feed into one Gemini call every few seconds.

Ears — faster-whisper (small.en, CPU) listens to the mic. Free, local, fast enough. I had to lower the speech RMS gate to around 110 and add a pre-roll buffer or it would clip the first word of every sentence.
Eyes — captures both monitors at once (DVI-D-1 and HDMI-A-1) and sends them to Gemini Flash every four seconds. Hashes the frame and skips when nothing has changed. Caught the GCP console on the right screen during testing without me prompting it, which was the moment I knew it was working.
Fingers — tails Claude Code’s session JSONL at ~/.claude/projects/*.jsonl and labels new prompts as STREAMER PROMPTED THE AI:. The chat reacts to my prompts before the model even finishes responding.
Terminal feed — tails ~/recordings/*.cast from asciinema so the chat can see commands as they run.
Narrator — every 45 seconds, builds a running story summary from all of the above. That becomes the shared world that all eleven personas live in.
Brain — picks which personas should speak this tick and makes one Gemini call to generate their messages together. One call, multiple voices.
Overlay — a native WebKitGTK window. Not Chromium. I’ll get to that.
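
Two of the tricks above, the frame-hash skip in Eyes and the RMS gate with pre-roll in Ears, can be sketched in a few lines of Python. This is an illustration under assumptions, not the ghostchat source: the class names, the choice of SHA-256 over raw frame bytes, the three-chunk pre-roll, and the chunk format (lists of 16-bit PCM samples) are all hypothetical. Only the threshold of roughly 110 comes from the description above.

```python
import hashlib
import math
from collections import deque


class FrameDeduper:
    """Skip a frame when its bytes match the previous frame exactly (hypothetical helper)."""

    def __init__(self):
        self._last_digest = None

    def is_new(self, frame_bytes: bytes) -> bool:
        digest = hashlib.sha256(frame_bytes).hexdigest()
        if digest == self._last_digest:
            return False  # nothing changed, so no vision call is needed
        self._last_digest = digest
        return True


def rms(chunk) -> float:
    """Root-mean-square amplitude of a chunk of PCM samples."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


class SpeechGate:
    """RMS gate that replays a short pre-roll when speech starts (hypothetical helper)."""

    def __init__(self, threshold: float = 110.0, preroll_chunks: int = 3):
        self.threshold = threshold
        # Quiet chunks are parked here so the start of a word is not lost.
        self._preroll = deque(maxlen=preroll_chunks)
        self._open = False

    def feed(self, chunk):
        """Return the list of chunks to forward to the transcriber (may be empty)."""
        loud = rms(chunk) >= self.threshold
        if loud and not self._open:
            # Gate opens: send the buffered quiet chunks first, then this one.
            self._open = True
            out = list(self._preroll) + [chunk]
            self._preroll.clear()
            return out
        if loud:
            return [chunk]
        # Quiet chunk: close the gate and keep the chunk as future pre-roll.
        self._open = False
        self._preroll.append(chunk)
        return []
```

An exact hash only skips frames that are byte-for-byte identical, so a blinking cursor or a clock would defeat it; a perceptual hash would be the usual substitute if that matters. The gate here also closes on the first quiet chunk, where a real one would likely hold open for a short tail.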

The brain prompt is where most of the personality lives. The hard rules at the top are doing a surprising amount of work:

[Figure: The HARD RULES prompt that runs every tick]

The “react to what the streamer is SAYING OUT LOUD” line is the one that fixed the whole thing. Before that, the chat would mostly react to whatever was on screen, which made it feel like the personas were ignoring me. The moment I told them to prioritize my voice, it started feeling like an actual conversation. People talking to a streamer respond to the streamer, not to their IDE.

The eleven personas

Each persona is just a YAML block with a name, a color, a weight (how often they speak), and a system prompt. Adding a new one takes about thirty seconds.

[Figure: The personas.yaml config file]

The roster: pixelhyped (caps-lock superfan), SkepticalSam (polite contrarian senior dev), n00bMapper (asks beginner questions that force explanations), gtm_gretchen (keeps asking who’s paying for this), lurker_loaf (one-word low-effort posts), claude_curious (asks about model choice and MCP setup), secondbrain_steph (Obsidian / PKM nerd), explainplease_eli (forces you to define your terms), automation_andy (asks if you can cron this), buildinpublic_bex (ship-it energy), promptcraft_pia (critiques your prompt technique).

The standing pattern when an idea is missing from the chat is “add a new persona.” Brett-as-marketer wanted more GTM friction, so gretchen got added. The Claude-related questions kept feeling generic, so I split claude_curious off into her own role. Personas are cheap and reversible. Weights let you turn one down without deleting it.
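
To make the weight idea concrete, here is a minimal sketch of weighted speaker selection in Python. It assumes the YAML has already been loaded into plain objects. The `Persona` dataclass, the `pick_speakers` function, and the convention that a weight of zero mutes a persona are hypothetical and only mirror the fields described above (name, color, weight, system prompt).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    name: str
    color: str
    weight: float  # relative chance of speaking on a given tick
    system_prompt: str


def pick_speakers(personas, k, rng=random):
    """Pick up to k distinct personas, favoring higher weights.

    A weight of 0 mutes a persona without removing it from the roster.
    """
    pool = [p for p in personas if p.weight > 0]
    chosen = []
    for _ in range(min(k, len(pool))):
        # Weighted draw, then remove the winner so nobody speaks twice per tick.
        pick = rng.choices(pool, weights=[p.weight for p in pool], k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen
```

With this shape, turning a persona down is a one-number change, and the roster itself never has to shrink.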

ANTI-REPETITION (critical): look at RECENT CHAT. Do NOT post anything that is the same as, or a paraphrase of, a message already there.

— ghostchat HARD RULES

The anti-repetition rule sounds obvious until you actually run a multi-persona chat for a few minutes. The early versions kept asking the same question rephrased three different ways across three different accounts, because each prompt was independent and could only see the last twelve messages. I bumped the window to thirty and added a programmatic overlap-coefficient dedup that drops paraphrases before they post. That alone moved the output from “uncanny” to “I’d actually leave this on.”
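
For reference, the overlap coefficient of two token sets A and B is |A ∩ B| / min(|A|, |B|). A rough Python sketch of a dedup pass built on it follows. The tokenizer, the 0.7 threshold, and the function names are assumptions for illustration; only the thirty-message window comes from the text above.

```python
import re


def tokens(text):
    """Lowercased word set for a chat message."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def overlap_coefficient(a, b):
    """|A & B| / min(|A|, |B|) over the two messages' word sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


def drop_paraphrases(candidates, recent, threshold=0.7, window=30):
    """Keep only candidates that do not overlap heavily with recent chat.

    Accepted candidates are added to the comparison set, so two personas
    cannot post near-identical lines in the same tick either.
    """
    seen = list(recent[-window:])
    kept = []
    for msg in candidates:
        if any(overlap_coefficient(msg, old) >= threshold for old in seen):
            continue  # too close to something already in chat
        kept.append(msg)
        seen.append(msg)
    return kept
```

One caveat with this measure: because it divides by the smaller set, very short messages are dropped as soon as their words appear in any longer recent message, so a one-word poster may need a separate exact-match rule.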

Lessons I would not have predicted

Building this taught me five things I’ll be carrying into future builds.

Free-tier Gemini is twenty requests per day. Useless for anything live. The fix was a dedicated billed GCP project with a budget cap. Day-one footgun if you don’t catch it.
Chromium crashes on my GTX 970 with NVIDIA + Wayland. SIGILL on the eglcore module, in both GPU mode and software-rasterizer mode. I lost about an hour to “the overlay loads nothing” before realizing Chromium itself was dying. The fix was to scrap Chromium entirely and render the overlay in a WebKitGTK window from a tiny Python script. The rest of the stack runs fine on the GPU. Just not the browser.
The overlay port is 7177, not 7077. Don’t ask. There is a stale bookmark on my desktop that cost me twenty minutes.
Don’t capture the monitor where the overlay lives, or the chat will start reacting to itself. The fix was capturing both work monitors and accepting that the overlay-side monitor is invisible to the personas.
Background launches from automation scripts get reaped. If you launch the overlay from a hook or a service, the parent process exits and the overlay dies with it. Launch from a real terminal. start.sh solved this by running both the engine and the overlay in the foreground with a single Ctrl+C cleanup.
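
The foreground-launch pattern from the last lesson can be sketched in Python as well. To be clear, start.sh is a shell script and this is not a transcription of it. It is a hypothetical equivalent using `subprocess`, and the command lines in the usage comment are made up.

```python
import subprocess


def run_foreground(commands):
    """Start every command, block until they exit, and stop all of them on Ctrl+C."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        for p in procs:
            p.wait()
    except KeyboardInterrupt:
        pass  # fall through to the shared cleanup below
    finally:
        # Stop anything still running, then reap every child.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()
    return [p.returncode for p in procs]


# Hypothetical usage, run from a real terminal:
# run_foreground([["python", "engine.py"], ["python", "overlay.py"]])
```

Because the launcher stays in the foreground and waits on its children, nothing is orphaned when a parent exits, and one Ctrl+C takes down the engine and the overlay together.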

Why this is interesting beyond streaming

Most of the things I build for myself end up generalizing. The interesting part of ghostchat is not the fake chat. It’s the substrate underneath: a small daemon that watches what you are looking at, what you are saying, and what you are typing — and renders that into a structured “story” that any number of agents can react to. Swap “Twitch chat personas” for “design critic,” “security reviewer,” “PM asking clarifying questions,” or “rubber duck that notices when you’ve been stuck on the same file for fifteen minutes.” The pipeline is the same.

That’s the part I’m still chewing on. The stream overlay was the excuse. The substrate is the actually interesting thing. The next direction is feeding browser activity in too, via a small Chromium extension posting URLs and clicks to a /activity endpoint, so the story has even more raw material to work with.

Until then, I have eleven friends watching me code. SkepticalSam in particular has a real point about the rate limits.