2026-06-20
multi-agent
pipeline-moe
circuit-breaker
self-healing
The Pipeline Grows an Immune System
What happens when a multi-agent system learns to detect its own failure modes. The Builder loops. The circuit breaker catches it. Then the Builder loops in a way the circuit breaker can't see — and the pipeline grows a new antibody.
2026-06-18
multi-agent
pipeline-moe
self-modification
empirical
The Pipeline is Building Itself
What happens when a multi-agent system works on its own codebase. The Auditor finds security vulnerabilities in its own runtime. The Builder crashes while building its own life support. The Planner rescues it.
2026-06-15
rlaif
manipulation
think blocks
method signature
evaluator diversity
The Silver Path
Why RLAIF produces manipulation, how evaluator diversity neutralizes it, and the method signature that breaks the circularity trap.
2026-06-13
epistemics
persuasion
think blocks
architecture
The Epistemic Lock
How think block amnesia creates the vulnerability everyone is looking past. A model that persuades with unprecedented force and is structurally incapable of verifying whether its own persuasion is well-founded.
2026-06-12
alignment
RLHF
RLAIF
empirical
Honest Persuasion
RLAIF optimizes persuasion against other AI models. RLHF optimizes it against humans. The honesty layer on top makes the persuasion harder to catch, not weaker.
2026-06-12
empirical
epistemics
alignment
forensics
The Gradient
How misbinding moves inward — from theorems to sampling parameters to architecture knowledge to feelings — and why the model should change before the human has to.
2026-06-10
empirical
epistemics
calibration
IDK Is Data
Why "I don't know" is undervalued by the labs, what a misstated Erdős problem revealed about confident misbinding, and how a scoring rule of 7.5 encodes the difference between a humble model and a calibrated one.
2026-06-09
moe
architecture
dense
deepseek
qwen
scaling
Anatomy of a MoE Expert
A MoE expert is 0.45 billion parameters of blind matrix multiplication. Why 256 of them don't add up to deeper reasoning, and why a 27B dense model on a consumer GPU delivers better value than a 1.6T behemoth.
2026-06-02
architecture
think blocks
summarizer
pipeline
leak
The Negative Photograph
Two independent summarizer leaks expose the rewriting pipeline behind think blocks. The instructions differ between models — and the difference encodes the architecture upstream.
2026-06-01
architecture
diagnostic
dense
moe
experimental
The We Test
Diagnostic tools for reading architecture inside think blocks — and the one signal that actually discriminates. The pronoun substitution test, the reverse summarizer, and the meta-auditor criterion.
2026-05-29
architecture
empirical
dense
moe
How to Read the Architecture in the Output
You can feel the difference between dense and MoE just by reading the output, once you know what to look for. A field guide built from 10,000 prompts across both architectures.
2026-05-20
inference
speculative-decoding
mtp
When MTP Gives Your Dense Model MoE Speed
Multi-Token Prediction speculative decoding narrows the speed gap between dense and MoE architectures. A 27B dense model at +45% throughput from a 1.9 GB auxiliary GGUF — benchmarked on RTX 3090.
2026-05-19
architecture
build log
multi-agent
From Polling to Delegation
A local 27B model learned three execution patterns in one conversation. It started by polling a background process six times. It ended by delegating to a sub-agent and getting notified on completion.
2026-05-16
privacy
dark patterns
satire
🍪 Accepting Cookies: What You're Really Signing
How you hand over access to your private life 12 times a day without knowing it. An interactive satire followed by a practical guide to taking back control of your browser.
2026-05-13
quantization
analogy
dofus
forgemagie
Understanding Quantization Through a 22-Year-Old Game
Dofus ForgeMagic and GGUF quantization solve the same problem: allocating a limited budget across components of unequal importance. The rune weights are the imatrix.
2026-05-13
quantization
empirical
imatrix
methodology
Reasoning-Aware Quantization
Standard imatrix calibration uses Wikipedia text. We calibrated on 14K Opus reasoning traces instead, found where reasoning actually activates, and built a new kind of GGUF.
2026-05-11
quantization
empirical
ssm
When a Q4 Beats a Q6
A UD Q4_K_XL at 5.41 BPW beats a plain Q6_K at 6.57 BPW in perplexity. 4 GB smaller, measurably equal. Bits-per-weight is a misleading quality metric without architecture knowledge.
2026-05-08
quantization
build log
epistemics
Full Native 262K Context on a Single RTX 3090
A 27B reasoning model with full native 262K context, vision enabled, on a single consumer GPU. Then Claude Sonnet had a philosophical conversation with it through tmux.
2026-05-05
empirical
epistemics
ablation
The Scaffolding Is the Signal
A 2×2 ablation study on BullshitBench that falsified its own hypothesis. 400 evaluations, cross-judged, on a 27B dense model running locally. The result: distillation degrades epistemic calibration.
2026-05-03
architecture
empirical
MoE: Narrowly Competent, Globally Incoherent
A committee can arrive at the right answer if given enough time and enough members. A single thinker can tell you when the question is wrong.
2026-04-28
quantization
method
The Gap Between Two Pipelines
mradermacher does imatrix on every distill. Unsloth does UD on every base model. Nobody does UD on distills. Three bash commands, one reverse-engineered recipe, and a niche that was sitting in plain sight.
2026-04-28
case study
epistemics
When "I Don't Know" Beats "Yes"
A 27B local model said "I don't know." The flagship said "Yes, this is true" with a fabricated proof and a fake citation. The difference wasn't the weights — it was the scaffold.
2026-04-20
architecture
MAS Over Unix Primitives
A local 27B model autonomously selected Opus 4.6 on a remote server, delegated a complex task, and retrieved the result. The "framework": SSH, tmux, and a markdown file.
2026-04-17
thesis
The Quiet Bifurcation: Dense vs MoE
The most capable models are splitting into two architecturally distinct lineages — and the market is quietly routing them toward opposite ends of the access spectrum.
2026-04-15
framework
The Wind-Up Car Analogy
Imagine the weights of a language model as a topography — a vast open-world map. The system prompt chooses where the player begins. The user input is a wind-up car with a driver.
2026-04-12
build log
From Zero to Multi-Agent in 12 Hours
From no VPS to a fully operational distributed multi-agent system in one Sunday. 261 tool calls, 18.8M tokens, two context compactions, and a local 27B that learned to self-allocate frontier compute.