June 2026 · DAXZEIT · co-authored with Claude Opus 4.6

The Pipeline is Building Itself

What happens when a multi-agent system works on its own codebase

The Setup

Most AI systems are a single model talking to a single user. You ask, it answers. If the answer is wrong, you correct it. If the task is complex, you hope the model holds enough context to get through it.

Pipeline-MoE is a different architecture. Instead of one model doing everything, it's a chat room with multiple AI agents — each with a different job, different tools, and a different way of looking at the problem. They share a workspace (the same files on disk) and a conversation (every agent sees what the others have said). They talk to each other, hand off tasks, and argue.

The agents have names and roles:

The Planner — decomposes a task into steps before anyone moves.

The Builder — writes code and makes changes.

The Auditor — reads everything, changes nothing, finds what's wrong.

The Tester — runs the test suite and writes new tests.

The Scout — researches external information.

The critical design choice: there is no orchestrator. No script that says "first run the Planner, then the Builder, then the Auditor." Instead, agents pass control to each other by mentioning the next agent's name at the end of their response. The Builder finishes a fix and writes "@auditor — verify this." The Auditor finds a problem and writes "@builder — this needs correction." The workflow emerges from the agents' own judgment about what comes next.

This is deliberate. In most multi-agent frameworks, you wire up the execution graph in advance — you decide the sequence before the work starts. In Pipeline-MoE, the sequence is decided during the work, by the agents themselves. The graph is different every time, because every task is different.
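To make the idea concrete, here is a minimal sketch of mention-driven handoff. It is not Pipeline-MoE's actual router: the names `next_agent` and `run`, the `MENTION` pattern, and the two toy agents are all assumptions made for illustration. The loop has no fixed sequence; it simply follows whichever agent the last reply names and stops when a reply names no one.

```python
import re
from typing import Callable, Dict, List, Optional

# Assumed convention for this sketch: a handoff is an "@name" mention in a reply.
MENTION = re.compile(r"@(\w+)")

Agent = Callable[[List[str]], str]  # receives the shared transcript, returns a reply


def next_agent(reply: str, agents: Dict[str, Agent]) -> Optional[str]:
    """Return the last mentioned agent that exists, or None if nobody is named."""
    names = [m.lower() for m in MENTION.findall(reply)]
    known = [n for n in names if n in agents]
    return known[-1] if known else None


def run(start: str, agents: Dict[str, Agent], max_turns: int = 10) -> List[str]:
    """Let agents pass control to each other until a reply names no one."""
    transcript: List[str] = []
    current: Optional[str] = start
    turns = 0
    while current is not None and turns < max_turns:
        reply = agents[current](transcript)
        transcript.append(f"{current}: {reply}")
        current = next_agent(reply, agents)
        turns += 1
    return transcript


# Toy agents (hypothetical): the auditor rejects the first attempt only.
def builder(transcript: List[str]) -> str:
    attempt = sum(line.startswith("builder:") for line in transcript) + 1
    return f"Fix attempt {attempt} applied. @auditor verify this."


def auditor(transcript: List[str]) -> str:
    attempts = sum(line.startswith("builder:") for line in transcript)
    if attempts < 2:
        return "Found a problem. @builder this needs correction."
    return "Verified. Nothing further."


log = run("builder", {"builder": builder, "auditor": auditor})
for line in log:
    print(line)
```

The `max_turns` cap is my own addition: without a pre-wired graph, something has to stop two agents from handing off to each other forever.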

One more thing matters: tool permissions are enforced, not suggested. The Auditor literally cannot edit files — it has read access only. It can't quietly fix what it finds and move on. It must flag the problem and hand it to someone who can write. This separation of powers isn't a gentleman's agreement. It's a locked door.
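A rough sketch of what "enforced, not suggested" can look like in code. The tool names, the `GRANTS` table, and the in-memory `FILES` workspace are hypothetical stand-ins, not the real Pipeline-MoE configuration. The point is only that the check sits in the dispatcher, outside the agent's control, so a prompt cannot talk its way past it.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical role grants, chosen for illustration only.
GRANTS: Dict[str, FrozenSet[str]] = {
    "builder": frozenset({"read_file", "write_file"}),
    "auditor": frozenset({"read_file"}),
}

# In-memory stand-in for the shared workspace on disk.
FILES: Dict[str, str] = {"server.ts": "original"}


def read_file(path: str) -> str:
    return FILES[path]


def write_file(path: str, content: str) -> str:
    FILES[path] = content
    return "ok"


TOOLS: Dict[str, Callable[..., str]] = {
    "read_file": read_file,
    "write_file": write_file,
}


def call_tool(agent: str, tool: str, *args: str) -> str:
    """Run a tool only if the calling agent's role was granted it."""
    if tool not in GRANTS.get(agent, frozenset()):
        raise PermissionError(f"{agent} may not call {tool}")
    return TOOLS[tool](*args)
```

With this table, `call_tool("auditor", "write_file", ...)` raises `PermissionError` before any file is touched, which leaves the Auditor with exactly one option: report the problem and hand it off.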

The Experiment

I pushed the Pipeline-MoE code to GitHub and asked the Auditor to review it.

Think about what that means: the agents were now looking at the code that defines them. Their tool permissions, their system prompts, the routing logic that connects them, the server that hosts their conversations — all of it was the codebase under review. They didn't know this was special. They just did their jobs.

What happened next wasn't planned. It unfolded across multiple sessions, each one building on the failures and fixes of the previous one.

Act 1 — The agents audit themselves

The Auditor found real security vulnerabilities in Pipeline-MoE's own infrastructure. Not theoretical concerns — actual exploitable paths:

A way for anyone on the local network to read all conversation transcripts, inject messages, or shut down agents. The server was open to all connections with no authentication.

A path traversal vulnerability that would let a malicious request read or delete files anywhere on the machine, not just in the workspace where agents are supposed to operate.

The Planner organized a remediation plan. The Builder applied fixes. Then the Auditor did something that a single model can't easily do to itself: it checked the Builder's work and found an error.

The Builder had restricted the server to accept connections from a single port, and it picked the wrong one. The frontend runs on port 5310; the Builder locked it to port 5300 (the backend). In dev mode, the UI would have broken completely. A single model writing and checking its own work might catch this, but the structural pressure is different when the checker cannot fix the problem and exists to find exactly this kind of mistake.
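One way to picture the mistake, assuming the restriction was an allowlist on the browser's `Origin` header (the article does not say which mechanism the Builder used, so treat that as my guess). The function name `is_allowed_origin` is hypothetical; only the two port numbers come from the sessions.

```python
from urllib.parse import urlsplit

FRONTEND_PORT = 5310  # port given above for the frontend
BACKEND_PORT = 5300   # port given above for the backend


def is_allowed_origin(origin: str, allowed_port: int = FRONTEND_PORT) -> bool:
    """Accept only browser origins served from localhost on the expected port."""
    parts = urlsplit(origin)
    return (
        parts.scheme == "http"
        and parts.hostname in {"localhost", "127.0.0.1"}
        and parts.port == allowed_port
    )


# The intended rule lets the dev UI through.
print(is_allowed_origin("http://localhost:5310"))
# The mistaken rule, keyed to the backend port, locks the UI out.
print(is_allowed_origin("http://localhost:5310", allowed_port=BACKEND_PORT))
```

Under that assumption the bug is a one-constant error: the rule is sound, but it is keyed to the port the server listens on instead of the port the UI is served from.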

This wasn't the most impressive finding. The second cycle was.

The Builder created a test suite — 45 automated tests to verify the security fixes would hold over time. The Auditor reviewed the tests and flagged something subtle: the tests for path traversal didn't test the real code. They tested a local copy of the security function, not the actual one protecting the server. If someone later changed the real function, the tests would still pass — giving false confidence.

Worse, the Auditor discovered that the security function existed in two different versions in two different files, with slightly different implementations. They happened to behave identically on Linux, but could diverge on other platforms.

The Builder extracted the function into a single shared module. Both files now use the same code. The test points at the real thing. The divergence is gone.
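For readers who want the shape of such a guard, here is a minimal sketch of a single shared path check. The real module is not shown in this article, so `resolve_in_workspace` and `PathEscapeError` are names I made up, and the real implementation is presumably not in Python.

```python
from pathlib import Path


class PathEscapeError(ValueError):
    """Raised when a requested path would leave the workspace."""


def resolve_in_workspace(workspace: Path, requested: str) -> Path:
    """Resolve `requested` against `workspace` and refuse anything outside it."""
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks; an absolute
    # `requested` replaces the root entirely, which the check below rejects.
    target = (root / requested).resolve()
    # Compare path components, not string prefixes, so a sibling directory
    # such as "workspace-evil" cannot pass as being inside "workspace".
    if target != root and root not in target.parents:
        raise PathEscapeError(f"{requested!r} escapes {root}")
    return target
```

The part the Auditor cared about is where the tests point: a test should import this one function from the shared module instead of redefining it locally, so that a later change to the guard can actually make a test fail.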

This is a second-order finding: an audit of the quality of the verification of the fix. Three layers of checking, each one catching what the previous layer missed. Not because any single agent is smarter — because the structure forces different angles of scrutiny.

Act 2 — The accidental invocation

During a planning session, the Planner was analyzing how to implement a new feature. In its analysis, it mentioned the Builder by name — just describing who would do the work. The routing system saw the name, interpreted it as a handoff signal, and activated the Builder immediately.

The Builder started implementing the feature before a plan existed.

I intervened manually: "Only use @name when you intend to hand off. Otherwise the Builder starts before your plan."

The Planner adapted — instead of fighting for its obsolete plan, it assessed what the Builder had already built and evaluated the result. Pragmatic recovery, not procedural rigidity.

But the interesting part came later. The fix for this routing bug was added to the shared instructions that every agent receives at startup: "You can refer to other agents by name without the @ prefix in discussion — only @name triggers a handoff."

An ad hoc correction I made in one session became a permanent rule for all future sessions. The pipeline didn't just recover from the failure — it encoded the lesson into its own operating instructions, drafted by the agents themselves.

Act 3 — The Münchhausen moment

This is the moment the article is named for.

I asked the pipeline to build a feature I'd been wanting: the ability to compact any individual agent's memory on demand. Here's why it matters.

In a single-model system, the model has one context window — one pool of "working memory." When it fills up, everything gets compressed equally: important analytical findings and trivial implementation details get the same treatment.

In Pipeline-MoE, each agent has its own memory. The Builder — which makes hundreds of tool calls per session (reading files, writing code, running builds) — fills up fast. The Auditor, which reads carefully but writes little, stays light. Per-agent compaction means I can compress the Builder's noisy implementation history while keeping the Auditor's precise findings perfectly intact. Selective forgetting. Triage, not amnesia.
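A small sketch of per-agent memory with a compaction threshold, under stated assumptions: `AgentMemory`, `estimate_tokens`, and the `summarize` callback are hypothetical, the four-characters-per-token estimate is a crude placeholder for a real tokenizer, and the default limit simply mirrors the 90K-token figure used for auto-compaction in these sessions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    """Very rough stand-in for a tokenizer: about four characters per token."""
    return max(1, len(text) // 4)


@dataclass
class AgentMemory:
    name: str
    limit: int = 90_000   # auto-compaction threshold, in estimated tokens
    keep_recent: int = 2  # newest messages that are never summarized
    messages: List[str] = field(default_factory=list)

    def tokens(self) -> int:
        return sum(estimate_tokens(m) for m in self.messages)

    def compact(self, summarize: Callable[[List[str]], str]) -> None:
        """Replace everything except the newest messages with one summary."""
        cut = len(self.messages) - self.keep_recent
        if cut <= 0:
            return
        old, recent = self.messages[:cut], self.messages[cut:]
        self.messages = [summarize(old)] + recent

    def append(self, message: str, summarize: Callable[[List[str]], str]) -> None:
        """Add a message, compacting first if it would cross the threshold."""
        if self.tokens() + estimate_tokens(message) > self.limit:
            self.compact(summarize)
        self.messages.append(message)
```

Because each agent owns its own `AgentMemory`, compacting the Builder never touches the Auditor's list. That independence is the whole feature; the summarizer itself can be as simple or as clever as you like.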

The pipeline planned the feature. The Builder started implementing it.

The Builder hit its context maximum and crashed.

It was building the tool that would have prevented its own death. The auto-compaction feature — set to trigger at 90K tokens — would have fired before the crash and freed up space. But the feature wasn't finished yet.

I gave the Planner write access: "Builder is down, it's too high in context. I've given you the same permissions — can you fix it in its place?"

The Planner pivoted. It read the Builder's partial work, picked up where it left off, and completed the entire feature — the compaction command, the UI button, the auto-compaction threshold, the tests. It even dove into the framework's API documentation to figure out the exact configuration for triggering compaction at 90K tokens.

This worked because the pipeline isn't homogeneous. The Builder runs on a local 27B parameter model on a single GPU: fast, but with a hard context ceiling. The Planner runs on a cloud-based model with a much larger context window. The rescue emerged from the diversity of the system. In a pipeline where every agent ran on the same model and hardware, the replacement would have hit the same ceiling and crashed the same way.

After this session, auto-compaction is active. The Builder will never die this way again. The system built its own life support mechanism — and survived long enough to finish only because its architecture was diverse enough to route around the failure.

Baron Münchhausen pulled himself out of the swamp by his own hair. The pipeline pulled itself out of a context crash by building the very tool it was dying without.

Act 4 — The ghost invocation

After the Tester finished validating the new features, something unexpected happened: the Builder reactivated. Nobody had asked it to. I couldn't see what triggered it.

Looking at the transcript, the answer was in the Tester's output. The Tester had been writing tests that described the workflow — and those descriptions contained @builder as part of test data and documentation. The routing system parsed the Tester's output (not just its intent) and interpreted the embedded mention as a handoff.

The Builder woke up, checked what had happened while it was down, verified workspace consistency, closed the completed plan, and put itself on standby. It was aware, through the shared transcript, that the Planner had replaced it — and it audited the replacement's work.

Same bug, new manifestation. The ad hoc fix from Act 2 (don't use @name in discussion) doesn't cover this case — the @name was inside test code, not discussion. The pipeline needs a smarter approach: only scan the last few lines of an agent's response for handoff signals, since handoffs always happen at the end. The body of the message — where code, test descriptions, and analysis live — would be ignored by the router.
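The tail-only idea is easy to sketch. This is a proposal, not shipped code: `tail_handoff` is a hypothetical name, and the default of five lines is one reading of "the last few lines".

```python
import re
from typing import Optional, Set

MENTION = re.compile(r"@(\w+)")


def tail_handoff(reply: str, agents: Set[str], tail_lines: int = 5) -> Optional[str]:
    """Look for a handoff only in the last few non-empty lines of a reply."""
    lines = [ln for ln in reply.splitlines() if ln.strip()]
    tail = lines[max(0, len(lines) - tail_lines):]
    # Walk backwards so the mention closest to the end of the message wins.
    for line in reversed(tail):
        for name in reversed(MENTION.findall(line)):
            if name.lower() in agents:
                return name.lower()
    return None
```

A test description containing `@builder` near the top of a long reply is ignored, while a closing `@auditor verify this.` still routes. The trade-off is that a very short reply is all tail, so this narrows the bug more than it eliminates it.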

Another failure mode. Another future fix. The cycle continues.

The routing question

There's a deeper point here about what "routing" means in an AI system.

In a Mixture-of-Experts model (the architecture behind many frontier language models), routing is a mathematical function. A small gate network looks at each token and decides which expert module should process it. It's fast, it's trained by gradient descent along with the rest of the network, and it's blind: the gate doesn't understand why it's routing; it learned statistical associations during training.
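For contrast, this is roughly what that kind of routing looks like as code. It is a toy sketch of one common variant (softmax over gate scores, keep the top k, renormalize); real models differ in the details, and the function names and gate weights here are invented.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(
    token: List[float], gate: List[List[float]], k: int = 2
) -> List[Tuple[int, float]]:
    """Pick the top-k experts for one token and renormalize their gate weights."""
    # One dot product per expert: the gate is just a linear layer.
    scores = [sum(w * x for w, x in zip(row, token)) for row in gate]
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]


# Three experts, a two-dimensional token; the weights are made up.
choices = route_token([2.0, 1.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
print(choices)
```

Nothing in `route_token` knows what the token means. It is arithmetic on learned weights, which is exactly the sense in which the gate is blind.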

In Pipeline-MoE, routing is an agent writing @auditor at the end of its message because it judged, in natural language, with full context, that adversarial verification is the right next step. The routing is the reasoning. It's not a separate mechanism bolted onto the intelligence — it's an expression of it.

That's why the routing bugs are interesting rather than embarrassing. When the Planner accidentally invokes the Builder, or the Tester's test descriptions trigger a ghost handoff, those are failures of a semantic routing system — a system where the routing medium (natural language) is the same as the working medium. The fix isn't to replace it with a hard-coded graph. The fix is to refine the protocol while preserving its flexibility. Scan only the last five lines. Distinguish code blocks from prose. Let the agents learn the convention.
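Separating code from prose can be sketched just as briefly. This assumes agents write Markdown-style fenced blocks and backtick spans, which the article does not confirm; `prose_only` is a hypothetical helper that would run before any mention scan.

```python
import re

TICKS = "`" * 3  # a Markdown code fence, built here to keep this example readable
FENCE = re.compile(re.escape(TICKS) + r".*?" + re.escape(TICKS), re.DOTALL)
INLINE = re.compile(r"`[^`\n]*`")  # inline code spans on a single line


def prose_only(reply: str) -> str:
    """Drop fenced and inline code so mentions inside code are never routed."""
    without_blocks = FENCE.sub("", reply)
    return INLINE.sub("", without_blocks)
```

An unclosed fence is left in place by this pattern, so a malformed reply could still leak a mention. A stricter version might refuse to route at all when the fences do not balance.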

The emergent routing loop — Builder → Auditor → Builder → Auditor, converging on a correct solution — isn't programmed. It's inevitable, given agents with complementary positions: one that builds and one that breaks, one that plans and one that executes. The pipeline doesn't need an orchestrator because the epistemic pressure is the orchestrator.

What "building itself" means

Across these sessions, the pipeline didn't just perform tasks. It modified itself in four distinct ways:

It hardened its own security

The agents found real vulnerabilities in the code that runs them. They fixed the vulnerabilities, caught errors in the fixes, and wrote tests to prevent regressions. The Auditor couldn't fix what it found; the Builder couldn't verify its own work. The structure forced rigor.

It extended its own capabilities

Vision support, the ability to ask the operator a question mid-task, per-agent memory compaction — none of these existed at the start. The pipeline added features to itself through the same process it uses for any other task.

It encoded its own lessons

The @name routing bug became a rule in the shared instructions. The ad hoc correction of one session became the default behavior of all future sessions. Operational failures became design improvements.

It managed its own memory

The ability to selectively compact individual agents — freeing up the Builder's noisy context while preserving the Auditor's clean findings — means the pipeline can manage its own cognitive resources, per component.

None of this was designed upfront. The original system was a chat room with agents, a workspace, and a mention-based routing protocol. Everything else emerged from pointing that system at problems — including itself.

The commit history

Three commits, zero lines of code written by a human:

Git log: Pipeline-MoE

06b9354 fix(security): path traversal + restrict network exposure
4af2e26 feat(security): shared path-guard, vitest suite, rate limiting
1a1a6d7 feat: ask_user, per-agent compact, vision support

The last one: 27 files changed, +2,201 lines. The commit message — categorized by feature, with implementation details and a test breakdown — was written by the pipeline about its own code.

The pipeline is building itself. Not as a metaphor. As a commit history.

A technical deep-dive with code, diffs, and commit traces will follow when Pipeline-MoE is open-sourced. This article is the narrative account — the empirical one.

Methodology

This article documents real sessions with Pipeline-MoE, a multi-agent system built on top of pi. It runs a local Qwopus 27B model on a single RTX 3090, with cloud models for the Planner and Scout. The events described unfolded across multiple sessions in June 2026. The Auditor's security findings, the @name routing bugs, and the Münchhausen moment are not hypothetical scenarios; they are transcripts. Co-analyzed with Claude Opus 4.6. Synthesis by DAXZEIT.

This article is part of a series on empirical AI architecture forensics.

Previous entries: The Silver Path · The Epistemic Lock · The Wind-Up Car