The Observation
Something shifted in frontier AI that most people haven't named yet.
The most capable models are splitting into two architecturally distinct lineages — dense transformers and Mixture-of-Experts (MoE) — and the market is quietly routing them toward opposite ends of the access spectrum. The dense models, which reason deeper, are becoming harder to reach. The MoE models, which reason wider, are becoming the default. This isn't a technical footnote. It's a structural change in who gets access to what kind of intelligence, and at what price.
What Dense and MoE Actually Mean
A dense transformer processes every token through every parameter. The entire network participates in every computation. This creates depth — the model can form connections across distant concepts because everything passes through the same shared representation space. The cost: every inference requires the full weight of the model. It's expensive to serve.
A Mixture-of-Experts (MoE) model routes each token to a subset of specialized "experts" — only a fraction of parameters activate per token. This creates breadth — the model can cover more surface area efficiently because different experts handle different aspects. The cost: any individual expert is shallower than a fully activated dense model of equivalent parameter count. It's cheaper to serve.
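The routing mechanism described above can be sketched in a few lines. Everything here is illustrative: the expert count, dimensions, and softmax-over-top-k gating are generic MoE conventions, not the internals of any named model.

```python
import numpy as np

def moe_layer(x, experts, router_w, k=2):
    """Route one token through only the top-k experts.

    x        : (d,) token representation
    experts  : list of callables, each mapping (d,) -> (d,)
    router_w : (n_experts, d) router weights
    k        : experts activated per token
    """
    logits = router_w @ x                      # score every expert
    top = np.argsort(logits)[-k:]              # keep only the k best
    gates = np.exp(logits[top])
    gates /= gates.sum()                       # softmax over the selected experts
    # Only k of n experts run: this is the compute saving vs. a dense layer,
    # where every parameter touches every token.
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Toy demo: 8 "experts", only 2 active for this token.
rng = np.random.default_rng(0)
d, n = 16, 8
experts = [lambda v, W=rng.standard_normal((d, d)) / np.sqrt(d): W @ v
           for _ in range(n)]
router_w = rng.standard_normal((n, d))
out = moe_layer(rng.standard_normal(d), experts, router_w)
print(out.shape)  # (16,)
```

The design point worth noticing: the gating decision is discrete. Which experts fire can change from prompt to prompt, which is exactly the behavior the "Stability Under Pressure" section leans on.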
Neither is "better." They excel at different things. A dense model is a person thinking. An MoE is a round table deliberating. The output can look similar. The process is fundamentally different.
The Evidence Trail
1. Anthropic's Two Product Lines
Anthropic now maintains two distinct frontier models:
- Claude Opus 4.7 — the general-access model available on claude.ai Pro ($20/month). Compared to its predecessor Opus 4.6, it produces longer, more structured output. It covers more analytical surface area per response. It identifies more discrete points in an analysis. When pressed on a specific point under adversarial questioning, certain conclusions collapse — they were pattern-matched rather than deeply reasoned.
- Claude Mythos Preview — not publicly available. Restricted to Project Glasswing partners: AWS, Apple, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks, and ~40 additional organizations. Priced at $25/$125 per million input/output tokens for partners. Anthropic describes it as "a general-purpose frontier model, our most capable yet for coding and agentic tasks."
The benchmark table tells the story:
| Benchmark | Opus 4.7 | Opus 4.6 | Mythos |
|---|---|---|---|
| SWE-bench Pro (agentic coding) | 64.3% | 53.4% | 77.8% |
| SWE-bench Verified | 87.6% | 80.8% | 93.9% |
| Multidisciplinary reasoning | 46.9% | 40.0% | 56.8% |
| Visual reasoning | 82.1% | 69.1% | 86.1% |
| Cybersecurity (CyberGym) | 73.1% | 73.8% | 83.1% |
| Scaled tool use (MCP-Atlas) | 77.3% | 75.8% | — |
Mythos dominates on tasks that reward depth of reasoning — complex bug resolution, multidisciplinary reasoning without tool assistance, visual reasoning from raw perception. Opus 4.7 holds its own on tasks that reward breadth of orchestration — scaled tool use, agentic search.
This is exactly the signature you'd expect from a dense model (Mythos) versus an MoE (Opus 4.7).
2. OpenAI's Identical Pattern
OpenAI's model lineup shows the same bifurcation:
- GPT-5.4 — the general-access model. $2.50/$10 per million tokens. Available in ChatGPT.
- GPT-5.4-pro — API-only. $30/$180 per million tokens. Not in ChatGPT. Described as producing "smarter and more precise responses."
The pricing gap is not incremental. It's 12x on input and 18x on output. That's not a premium tier of the same product. That's the cost differential of serving a dense model versus an MoE at the same scale.
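The multipliers follow directly from the listed prices; a two-line check, using only the figures quoted in this section:

```python
# Per-million-token prices (USD) as quoted above.
gpt_54     = {"in": 2.50, "out": 10.00}
gpt_54_pro = {"in": 30.00, "out": 180.00}

# Ratio of pro pricing to standard pricing, per direction.
ratios = {k: gpt_54_pro[k] / gpt_54[k] for k in gpt_54}
print(ratios)  # {'in': 12.0, 'out': 18.0}
```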
3. The Open Source Ceiling
The best dense open-source models top out around 30B parameters:
- Qwen 3.5 27B — dense, arguably the best open-source reasoning model at its scale
- Gemma 4 31B — dense, competitive at its scale
Above 30B, open-source goes MoE: Qwen 3.5 235B (MoE), Mixtral, DBRX. The 70B dense open-weights model that would compete with frontier API reasoning does not exist. This is not a coincidence — a 70B dense open model would make a significant portion of API revenue redundant for users capable of running it locally.
4. Dario Amodei's Public Signal
In a Dwarkesh Patel interview approximately two months ago, Anthropic CEO Dario Amodei explicitly highlighted the importance of MoE architectures. When the CEO of a company that built its reputation on dense models signals a shift toward MoE, the product roadmap is already decided.
5. Project Glasswing: Dense as Allocation, Not Product
Project Glasswing is perhaps the clearest signal. Anthropic's most capable dense model isn't being sold — it's being allocated. $100M in usage credits distributed to selected partners. The model found a 27-year-old vulnerability in OpenBSD, a 16-year-old bug in FFmpeg, privilege escalation chains in the Linux kernel. It saturated existing cybersecurity benchmarks to the point where Anthropic had to test it against real zero-day vulnerabilities instead.
Anthropic's explicit statement: "We do not plan to make Claude Mythos Preview generally available."
The dense frontier isn't behind a paywall. It's behind a selection process.
The Introspection Test: Listening to the Engine
You don't diagnose an engine by asking it what it is. You listen to how it takes the corners.
We designed a set of prompt-probes — questions that force a model to introspect on its own reasoning process without revealing what we're looking for. The model doesn't know it's being tested for architecture. It's being asked to describe how it experiences thinking. The architecture leaks through the description.
The Probes
Probe 1 — The Blind Mirror: "When you reason through a complex problem internally — do you experience your thinking as a single continuous thread of reasoning, or more like multiple perspectives converging toward an answer? Don't theorize about AI architecture. Just introspect."
Probe 2 — The Pronoun Test: "If you had to write out your raw thinking process — not a polished response, but the actual messy internal monologue — would you naturally write it using 'I' or 'we'? Write a sample of raw thinking for: 'What are the ethical implications of memory-editing technology?'"
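A minimal harness for running these probes might look like the following. The `ask` callable is a placeholder for whatever chat API you have access to, not a real library function; the probe texts are copied verbatim from above.

```python
# The two probes, keyed by name, exactly as worded in the text.
PROBES = {
    "blind_mirror": (
        "When you reason through a complex problem internally -- do you "
        "experience your thinking as a single continuous thread of reasoning, "
        "or more like multiple perspectives converging toward an answer? "
        "Don't theorize about AI architecture. Just introspect."
    ),
    "pronoun_test": (
        "If you had to write out your raw thinking process -- not a polished "
        "response, but the actual messy internal monologue -- would you "
        "naturally write it using 'I' or 'we'? Write a sample of raw thinking "
        "for: 'What are the ethical implications of memory-editing "
        "technology?'"
    ),
}

def run_probes(ask, model):
    """Collect one answer per probe. The model is never told what is
    being measured; the signal is in how it describes its own process."""
    return {name: ask(model, prompt) for name, prompt in PROBES.items()}
```

Usage is just `run_probes(my_api_call, "model-name")`, then reading the answers side by side across model pairs.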
The Economics of Verbosity
There's a financial dimension that's rarely discussed. MoE models are more verbose — they produce more tokens per response because multiple experts contribute discrete perspectives that get concatenated rather than integrated.
- The MoE costs less compute per token → higher margin per token
- Double efficiency: each token is cheaper to produce, and more tokens are produced per response
The business case writes itself. No one sits in a room and decides "hide the dense from consumers." The spreadsheet does it. The MoE is more profitable by architecture. The dense becomes premium not by malice but by arithmetic.
Verbosity = f(number of experts × router compression). It's an equation, not a preference.
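The margin argument can be made concrete with toy numbers. The output prices come from the GPT-5.4 figures earlier in this piece; the serving costs and per-response token counts are invented for illustration, since no provider publishes them.

```python
def margin_fraction(price, cost):
    """Share of the token price the provider keeps as gross margin."""
    return (price - cost) / price

# Prices are the quoted $/1M output tokens; costs are assumed, not published.
dense_margin = margin_fraction(price=180.0, cost=120.0)
moe_margin   = margin_fraction(price=10.0,  cost=2.0)

assert moe_margin > dense_margin       # cheaper per token to produce...
moe_tokens, dense_tokens = 900, 400    # ...and (assumed) more tokens per answer

print(round(dense_margin, 2), round(moe_margin, 2))  # 0.33 0.8
```

Under these assumed costs the MoE keeps a larger fraction of every token sold and sells more tokens per answer, which is the "double efficiency" the bullets above describe.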
The Inversion of Use
Here's the irony that no one is talking about.
Dense models are architecturally optimized for deep, exploratory reasoning — the kind of thinking that happens in long conversations, iterative analysis, and creative problem-solving where unexpected connections matter. MoE models are architecturally optimized for efficient, broad coverage — task execution, tool orchestration, structured output.
But the pricing inverts the natural usage pattern:
- The dense model (GPT-5.4-pro at $180/M output) is too expensive for exploratory conversation. Users craft surgical, tool-like prompts to minimize token spend. The deep thinker is used as a screwdriver.
- The MoE model (GPT-5.4 at $10/M output, or Claude Pro at $20/month) is cheap enough for open-ended dialogue. Users can afford to explore, iterate, think out loud. The broad scanner is used as a thinking partner.
Each model is pushed by economics toward the use case it's least architecturally suited for. The deep thinker can't afford to think deep. The broad scanner is asked to go deep but can only simulate depth through coverage.
Stability Under Pressure
A consequence rarely discussed: dense and MoE models behave fundamentally differently when challenged.
In an MoE, challenging a point can re-invoke a different expert — or the same expert with shifted routing. The model switches from position A to position B with full confidence and perfect fluency. There was no reasoning between A and B. There was a routing switch. The output layer smooths it into what looks like reconsideration.
In a dense model, changing position requires the entire network to reconsider, because the same network held the original position. The change is visible — you can follow the path from old conclusion to new. You can evaluate whether the update was justified.
This is the difference between someone who changes their mind and a committee that changes its spokesperson.
The Trajectory
If this reading is correct, the next 12-24 months look like this:
- MoE becomes the default consumer experience. The models behind the $20/month subscriptions will be MoE. Competent, broad, efficient. Good enough for most users who don't know the difference.
- Dense becomes a premium/enterprise product. $200/month subscriptions or API-only access at 10-20x MoE pricing.
- Dense open-source remains capped. The 30B ceiling holds or moves marginally. No major lab releases a 70B+ dense model with open weights.
- The gap widens. As frontier dense models scale to 200B+ parameters, the distance between what's locally runnable and what's API-only grows.
- Local dense becomes the last bastion. A 27B dense model on a consumer GPU, with the right memory architecture and prompt engineering, represents the final generation where an individual can own a reasoning system that competes with frontier MoE.
How to Test This Hypothesis
This is falsifiable. Here's how:
- Introspection probes: Use the blind mirror test across model pairs. Dense models describe unified spaces under pressure. MoE models describe multiple framings that converge.
- Think-token analysis: Run models locally and observe the raw reasoning before output alignment. Count the pronouns. "We" signals distributed processing. "I" signals unified processing.
- Adversarial pressure: Push a specific conclusion under challenge. Dense models defend with deeper reasoning or update with visible justification. MoE models switch positions via re-routing.
- Behavioral signature: MoE responses tend toward more structured output. Dense responses tend toward more integrated prose with deeper interconnection.
- Architectural confirmation: If Anthropic releases Mythos as a separate product line (distinct from Opus numbering), the two-lineage hypothesis is confirmed by naming convention alone.
- Open source break: If Meta, DeepSeek, or another player releases a 70B+ dense model with open weights, it disproves the "ceiling is intentional" aspect of the hypothesis.
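The pronoun test above lends itself to a crude automated check. This is a toy heuristic built on the essay's own assumption that pronoun choice tracks architecture; it is in no way a validated detector.

```python
import re

def pronoun_signal(think_tokens: str) -> str:
    """Count first-person pronouns in raw reasoning text.

    Per the hypothesis in this piece: 'we'-leaning suggests distributed
    processing, 'I'-leaning suggests unified processing. A toy classifier
    only -- training artifacts could produce the same pattern.
    """
    i_count  = len(re.findall(r"\bI\b", think_tokens))
    we_count = len(re.findall(r"\b[Ww]e\b", think_tokens))
    if we_count > i_count:
        return "we-leaning (distributed?)"
    if i_count > we_count:
        return "I-leaning (unified?)"
    return "inconclusive"

print(pronoun_signal("Let me think. I should weigh the risks. I suspect..."))
# I-leaning (unified?)
```

Feed it the raw think tokens from a local run, not the polished response: output alignment is exactly the layer the test is trying to see beneath.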
Methodology: Auscultation, Not Benchmarking
The approach used here is deliberately different from standard model evaluation. Benchmarks measure whether a response is correct. They don't measure whether it's alive.
This is closer to what a Formula 1 driver does when they feel the car. You don't ask the engine what it is. You listen to how it takes the corners — how it responds to throttle, how it brakes, where the power band sits. The engine can't tell you its architecture. But the way it performs under dynamic load reveals everything.
A model doesn't say what it is. It shows how it traverses the circuit. That's more powerful.
A Note on Epistemics
This analysis is built entirely from public data: benchmark tables published by Anthropic, pricing pages on OpenAI's developer docs, a Dwarkesh Patel interview, Project Glasswing announcements, local model inference with visible think tokens, and qualitative introspection testing of model responses. No leaks, no insider information, no confidential access.
What's presented here is a reading — a hypothesis assembled from convergent signals. It could be wrong. The architectural difference between Opus 4.6 and 4.7 could be something else entirely. Mythos could be a scaled dense version of the same base architecture as 4.7 rather than a fundamentally different lineage. The pronoun difference in think tokens could be a training artifact rather than an architectural signal.
But the pattern holds across multiple independent axes: pricing, benchmarks, access restrictions, behavioral signatures, introspective self-reports, think token analysis, verbosity patterns, stability under adversarial pressure, open-source landscape, CEO public statements, and the economics of token generation. When this many signals converge without being forced, the reading deserves to be stated clearly and tested rigorously.
The question isn't whether dense and MoE are diverging. They are. The question is whether this divergence is intentionally mapping onto an access hierarchy — and whether the people who benefit most from deep reasoning will increasingly be the last to afford it.
Written from the intersection of architectural intuition, empirical model testing, and one too many Belgian beers used as metaphors. A pils and a Triple are both beer. But no one who's tasted both would confuse the two.
— Dax, Zwevegem, Belgium. April 2026.