Three Bugs Between My Agent and Its Memory

A few days ago I posted about wiring Hermes up to Honcho so my terminal agent would finally remember me between sessions. The setup looked clean. Systemd unit running, gateway responsive, messages flowing in. I declared victory and moved on.

It was lying to me.

When I went back to check what Honcho actually knew about me — I had been chatting with the agent for days at that point — the answer was nothing. Zero observations. Zero conclusions. Just a queue of 31 unprocessed “representation units” sitting in Postgres, waiting for a deriver that was running but not actually deriving.

Here is the chain of three bugs I had to unwind before the agent’s memory came online.

Bug 1: the deriver was batching itself into a corner

Honcho is built around a background worker called the deriver. It reads from a queue of conversation units, groups them, then asks an LLM to extract observations (“user lives in Austin”) and conclusions (“user does not want auto-prioritization”) about the user. Those go into a vector store the agent can query later.

The deriver was running. The queue had stuff in it. Nothing was getting derived.

The reason was a config default that makes total sense for a multi-tenant SaaS and zero sense for a single-user personal assistant: representation work only flushes when the batched token count reaches a ceiling. The default is REPRESENTATION_BATCH_MAX_TOKENS=1024. My biggest single unit was 937 tokens. The deriver would batch one unit, sit at 937, wait politely for a second unit to push it over 1024, and never get one, because conversations end.

It was a queue full of conversations that were each individually too small to ever flush, and collectively never close enough together in time to combine.

The fix was one environment variable:

# ~/Projects/honcho/.env
DERIVER_FLUSH_ENABLED=true

This tells the deriver to process whatever it has at the end of a unit instead of waiting for the batch ceiling. For a single-user agent this is the right default. For a 10,000-user SaaS it would torch your LLM bill. I get why the default is what it is — but Honcho’s deployment docs could stand to flag this for self-hosters.
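
To make the failure mode concrete, here is a toy model of threshold-based batching in Python. It is a sketch of the behavior described above, not Honcho’s code: the class name, the fields, and the assumption that a batch flushes once it reaches the ceiling are all mine.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBatcher:
    """Toy model of a token-threshold batcher (hypothetical, not Honcho's implementation)."""

    max_tokens: int = 1024          # mirrors REPRESENTATION_BATCH_MAX_TOKENS from the post
    flush_enabled: bool = False     # mirrors DERIVER_FLUSH_ENABLED from the post
    pending: list = field(default_factory=list)
    flushed: list = field(default_factory=list)

    def add_unit(self, tokens: int) -> None:
        self.pending.append(tokens)
        # Flush if forced, or if the accumulated batch has reached the ceiling.
        if self.flush_enabled or sum(self.pending) >= self.max_tokens:
            self.flushed.append(list(self.pending))
            self.pending.clear()


# Default behavior: one 937-token unit never reaches 1024, so it waits forever.
waiting = ToyBatcher()
waiting.add_unit(937)

# With flushing forced, the same unit is processed as soon as it arrives.
eager = ToyBatcher(flush_enabled=True)
eager.add_unit(937)
```

In the first case the unit stays in pending with nothing flushed; in the second it is flushed immediately as a batch of one.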

Bug 2: a 401 from OpenAI for a request I never sent to OpenAI

Restart the deriver, watch the queue actually start moving. The LLM call to extract observations runs. Then the save step blows up:

401 Unauthorized — Incorrect API key provided

Which is funny, because the key sitting in the OpenAI slot of my .env was actually an OpenRouter key (sk-or-v1-...). It worked fine for the deriver’s reasoning calls, which were going through OpenRouter to a Haiku model. So why was something 401’ing?

Because saving an observation requires embedding it. And Honcho’s embedder, by default, calls OpenAI’s embedding endpoint directly — bypassing the routing layer entirely. OpenRouter cannot proxy OpenAI’s embedding API. My key was being sent to api.openai.com/v1/embeddings where, as far as OpenAI is concerned, sk-or-v1-anything is just a malformed string.

So a single key in one config slot was serving two different providers, and the deriver was reaching for whichever provider each call needed. Reasoning: OpenRouter. Embeddings: OpenAI directly. The error message blamed the key. The actual problem was the routing.
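
A cheap guard against this class of mistake is to check, before any call goes out, that the key’s prefix belongs to the host it is about to be sent to. The helper below is a hypothetical sketch in Python, not part of Honcho or any SDK; the only prefix it knows is the sk-or- one from my own key, and the host mapping is an assumption you would extend for your own providers.

```python
from urllib.parse import urlparse

# Assumption: keys starting with "sk-or-" belong to OpenRouter, whose API lives on openrouter.ai.
KEY_PREFIX_TO_HOST = {"sk-or-": "openrouter.ai"}


def key_matches_endpoint(api_key: str, base_url: str) -> bool:
    """Return False when a recognizable key is about to be sent to the wrong host."""
    host = urlparse(base_url).hostname or ""
    for prefix, expected_host in KEY_PREFIX_TO_HOST.items():
        if api_key.startswith(prefix):
            # Accept the expected host itself or any subdomain of it.
            return host == expected_host or host.endswith("." + expected_host)
    # Unknown prefix: nothing to compare against, so do not block the call.
    return True


# The situation from this post: an OpenRouter key headed for OpenAI's embeddings endpoint.
mismatch = key_matches_endpoint("sk-or-v1-example", "https://api.openai.com/v1/embeddings")
match = key_matches_endpoint("sk-or-v1-example", "https://openrouter.ai/api/v1")
```

Here mismatch is False and match is True. Run at startup, a check like this would have pointed at the routing instead of leaving a 401 to blame the key.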

Bug 3: switching providers without breaking the vector column

The fix to bug 2 was to stop routing embeddings through OpenAI at all. I had a Google API key already set up for Gemini, so I switched embeddings over:

EMBEDDING_MODEL_CONFIG__TRANSPORT=gemini
EMBEDDING_MODEL_CONFIG__MODEL=gemini-embedding-001
LLM_GEMINI_API_KEY=<google_key>

Restart the deriver. Watch a unit get embedded. Watch the database write fail.

ERROR: expected 1536 dimensions, not 3072

Right. Embedding models have native output dimensions. OpenAI’s text-embedding-3-small is 1536. Gemini’s gemini-embedding-001 is 3072 native. Honcho’s documents table was created with a vector(1536) column — that is the schema the migrations wrote on day one when I was still on OpenAI.

Two ways out:

Drop and recreate the column at 3072 dims, lose any embeddings already in there.
Tell Gemini to output at 1536 dims, which the model supports as a configuration option.

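Honcho exposes that as VECTOR_DIMENSIONS=1536, and when set, it passes output_dimensionality=1536 to the Gemini call. Gemini returns the shortened 1536-dimensional vector server-side. One caveat: as I read Google’s documentation, reduced-dimension output from this model is not guaranteed to be unit-length, which matters only if your distance metric assumes normalized vectors. The vector column is happy. The deriver is happy.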

VECTOR_DIMENSIONS=1536

One line. Saved me a migration.
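
Shortening an embedding to fit a smaller column comes down to keeping the first N components and rescaling the result to unit length. Here is a minimal sketch of that arithmetic in plain Python. The function name is mine, and this is not Honcho’s or Google’s implementation, just the general technique.

```python
import math


def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components of `vec` and rescale them to unit L2 norm."""
    if dims > len(vec):
        raise ValueError("cannot truncate to more dimensions than the vector has")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix has no direction to preserve; return it unchanged.
        return head
    return [x / norm for x in head]


# Tiny worked example: keep 2 of 3 components, then rescale so the length is 1.
small = truncate_and_normalize([3.0, 4.0, 12.0], 2)

# Same operation at the sizes from this post: 3072 native dims down to 1536.
resized = truncate_and_normalize([1.0] * 3072, 1536)
```

Whether this rescaling matters depends on the distance metric: cosine distance ignores vector length, while inner-product search does not.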

The result

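After all three fixes were in, I forced the queue to re-process the stuck units. Note that this statement resets every processed row, which was safe here only because nothing had been derived successfully yet: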

UPDATE queue SET processed = false WHERE processed = true;

The deriver picked them up. All thirty-one units were processed, yielding 99 observations and 96 conclusions about me, derived from a week of casual conversation with the agent.

I queried Honcho directly to see what it had learned. The answers were surprisingly accurate:

I have ADHD and need bounded tasks
I go to bed before midnight on weeknights
I dislike auto-reprioritization — I want to control my task order manually
I use a specific set of command phrases when delegating work
My calendar lives in Google Calendar and Todoist holds the truth on tasks

None of that is in any prompt I wrote. The agent inferred it from how I talked to it.

What I’d tell past me

Day 1: Wire it up
Followed the docs, everything green. Did not verify anything had actually been derived.

Day 4: Check the queue
31 units stuck. Realized “service is running” is not the same as “service is working.”

Day 4 +1h: Fix flush
Set DERIVER_FLUSH_ENABLED=true. Queue starts moving. New error appears.

Day 4 +2h: Fix embeddings provider
OpenRouter key cannot proxy OpenAI embeddings. Switch transport to Gemini.

Day 4 +3h: Fix dimensions
Pin VECTOR_DIMENSIONS=1536 so Gemini matches the existing pgvector column.

Day 4 +4h: Re-process
Flip processed=false on the stuck queue rows. Deriver picks them up. 99 observations land.

The pattern across all three bugs is the same: a default that is correct for the most common deployment shape (multi-tenant, OpenAI-native, fresh database) but wrong for mine (single user, multi-provider, schema already locked in). None of these are bugs in Honcho. They are the seams where one set of assumptions meets another.

Self-hosting any non-trivial service means owning those seams. The docs will get you to “running.” Getting to “actually working” is a debugging sport, and the score is kept in the queue table.

The most important habit I picked up from this: after wiring any background worker, query the thing it is supposed to produce. Not “is the process up?” Not “is the queue empty?” Did the artifact you actually care about — the observation, the embedding, the row — appear? If not, the service is decorative.
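
That habit is easy to script. The sketch below counts pending queue rows and produced artifacts and reports whether the worker has produced anything at all. The table and column names (queue, processed, documents) are assumptions modelled on this post, not Honcho’s real schema, and an in-memory SQLite database stands in for Postgres so the example runs anywhere.

```python
import sqlite3


def worker_report(conn, queue_table="queue", artifact_table="documents"):
    """Count pending work and produced artifacts; table names are hypothetical."""
    cur = conn.cursor()
    pending = cur.execute(
        f"SELECT COUNT(*) FROM {queue_table} WHERE processed = 0"
    ).fetchone()[0]
    artifacts = cur.execute(f"SELECT COUNT(*) FROM {artifact_table}").fetchone()[0]
    # "Running" is not the question; "has it produced anything?" is.
    verdict = "working" if artifacts > 0 else "decorative"
    return {"pending": pending, "artifacts": artifacts, "verdict": verdict}


# Stand-in database reproducing the state from this post: 31 stuck units, nothing derived.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, processed INTEGER)")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany("INSERT INTO queue (processed) VALUES (?)", [(0,)] * 31)

report = worker_report(conn)
```

Against this stand-in, the report shows 31 pending units, zero artifacts, and a verdict of “decorative”, which is exactly what I should have looked at on day 1.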

My agent remembers me now. It took three days longer than I thought it would. The next agent I wire up to memory will take three hours, because I know which questions to ask before I declare victory.