Understanding Quantization Through a 22-Year-Old Game

I have 10,150 hours in Dofus. Seven years of subscription. A level 166 Féca on Ombre — the permadeath server where one mistake erases everything. I spent thousands of those hours in ForgeMagic workshops, optimizing rune allocations on equipment. I didn’t know it at the time, but I was learning quantization.

ForgeMagic is a crafting system where you upgrade equipment stats using runes. Each rune has a hidden weight. Each item has a hidden density budget. You can’t add everything — you have to choose where to invest, and every investment has a cost somewhere else. The system rewards players who understand the weight table and allocate precision where it matters most.

GGUF quantization is a compression system where you reduce model precision to fit in less memory. Each tensor has a measurable importance. Each model has a fixed bit budget. You can’t keep everything at full precision — you have to choose where to invest, and every promotion has a cost somewhere else. The system rewards engineers who understand the importance profile and allocate bits where they matter most.

It’s the same problem.

The Weight Table

In ForgeMagic, every rune has a hidden weight that determines how expensive it is to add. A PA rune (action points) weighs 100. A Vitalité rune weighs 1. The game doesn’t show these numbers — you learn them through experience, community wikis, or 10,000 hours of trial and error.

In quantization, every tensor has a measurable importance that determines how much quality is lost when you reduce its precision. The tensor blk.63.ffn_down has an importance of 58,620. A mid-network ffn_gate scores in the hundreds (blk.15.ffn_gate: 725). The imatrix (importance matrix) measures these numbers empirically.

ForgeMagic Weights
PA (Action Points)	100
PM (Movement Points)	90
Portée (Range)	51
Invocation	30
Dommages	20
Critique	10
Sagesse	3
Vitalité	1
Initiative	1

Tensor Importance (Reasoning imatrix)
`blk.63.ffn_down`	58,620
`blk.62.ffn_down`	11,233
`blk.3.attn_k/q/v`	8,913
`blk.61.ffn_gate`	8,171
`blk.60.ffn_gate`	7,664
`blk.53.ffn_gate`	5,026
`blk.49.ffn_gate`	3,514
`blk.15.ffn_gate`	725
`blk.1.ssm_out`	0.88

A forgemage who doesn’t know the weight table will waste resources putting heavy runes where they don’t fit and light runes where they don’t matter. A quantizer who doesn’t know the importance profile will waste bits giving Q6_K to tensors that would be fine at Q4_K, while starving tensors that need Q8_0 or F16.

The first lesson of both systems: not all components are equal, and the budget is always limited.
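As a rough illustration of that first lesson, the sketch below maps importance scores to quantization types with a simple threshold table. The importance values come from the table above (the combined `blk.3.attn_k/q/v` row is left out because it covers three tensors). The tier cut-offs and the names `TIERS`, `assign_quant`, and `build_plan` are hypothetical and are not the RA recipe's actual thresholds.

```python
# Importance values copied from the table above.
IMPORTANCE = {
    "blk.63.ffn_down": 58620.0,
    "blk.62.ffn_down": 11233.0,
    "blk.61.ffn_gate": 8171.0,
    "blk.60.ffn_gate": 7664.0,
    "blk.53.ffn_gate": 5026.0,
    "blk.49.ffn_gate": 3514.0,
    "blk.15.ffn_gate": 725.0,
    "blk.1.ssm_out": 0.88,
}

# (minimum importance, quant type), checked from highest to lowest.
# These cut-offs are illustrative assumptions only.
TIERS = [(20000.0, "F16"), (7000.0, "Q8_0"), (3000.0, "Q6_K"), (0.0, "Q4_K")]


def assign_quant(importance: float) -> str:
    """Return the first tier whose threshold the importance reaches."""
    for threshold, qtype in TIERS:
        if importance >= threshold:
            return qtype
    return TIERS[-1][1]


def build_plan(importance_by_tensor: dict) -> dict:
    """Map every tensor name to a quant type based on its importance."""
    return {name: assign_quant(value) for name, value in importance_by_tensor.items()}


plan = build_plan(IMPORTANCE)
for name in sorted(plan, key=IMPORTANCE.get, reverse=True):
    print(f"{name:18s} {IMPORTANCE[name]:>10.2f} -> {plan[name]}")
```

A purely data-driven plan like this one puts `blk.1.ssm_out` in the lowest tier, which is a decision that importance scores alone cannot settle.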

The Sink

ForgeMagic has a mechanic called the puits (sink). When a heavy stat falls off an item — say you lose a PA (weight 100) while applying a small rune (weight 1) — the difference in weight (99) becomes a hidden reservoir called the reliquat. While the sink exists, subsequent runes consume it instead of degrading other stats. The sink is your free budget. It lets you add precision without paying for it, until it runs out.

The RA quantization methodology has an identical mechanic. When I demoted blocks 12–16 from Q5_K to Q4_K (the “heavy stat falling off”), it freed approximately 0.06 BPW (bits per weight) of headroom. That headroom is the sink. I used it to promote blocks 49–53 from Q4_K to Q6_K without increasing the total model size. The sink absorbed the cost.

Mechanic	ForgeMagic	Quantization
Create the sink	Let a heavy stat (PA/PM) fall	Demote low-importance tensors (blk.12–16 → Q4_K)
Use the sink	Apply runes for free while reliquat absorbs	Promote high-importance tensors (blk.49–53 → Q6_K)
Sink runs out	Next rune degrades other stats	Next promotion increases BPW/size
Overmage	Push a stat above its natural max using sink	Promote to Q8_0/F16 above the base recipe level

An experienced forgemage creates the sink deliberately. They don’t wait for a stat to fall accidentally — they force it, because they know the sink is the most valuable resource in the workshop. The RA methodology does the same: it deliberately degrades the mid-network to create headroom for the late-network. The data (imatrix ratio 0.66–0.76 on blocks 12–16) justifies the demotion. The promotion is paid for before it happens.
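The sink can be read as simple bit accounting. The sketch below is a minimal ledger under stated assumptions: the bits-per-weight figures are the nominal values for these llama.cpp block formats as I understand them, and the weight counts (`TOTAL_WEIGHTS`, `DEMOTED_WEIGHTS`) are hypothetical, chosen only so that the demotion frees the 0.06 BPW mentioned above. They are not the real model's tensor sizes.

```python
# Nominal bits per weight for each format (assumed llama.cpp block layouts).
BITS_PER_WEIGHT = {"Q4_K": 4.5, "Q5_K": 5.5, "Q6_K": 6.5625, "Q8_0": 8.5, "F16": 16.0}

TOTAL_WEIGHTS = 25_000_000_000    # hypothetical model size in weights
DEMOTED_WEIGHTS = 1_500_000_000   # hypothetical weight count in the demoted blocks


def bits_delta(n_weights: int, src: str, dst: str) -> float:
    """Bits added (positive) or freed (negative) by moving n_weights from src to dst."""
    return n_weights * (BITS_PER_WEIGHT[dst] - BITS_PER_WEIGHT[src])


# Create the sink: demote Q5_K -> Q4_K and record the freed bits.
sink_bits = -bits_delta(DEMOTED_WEIGHTS, "Q5_K", "Q4_K")
sink_bpw = sink_bits / TOTAL_WEIGHTS

# Use the sink: how many weights can go Q4_K -> Q6_K before the size grows?
cost_per_weight = BITS_PER_WEIGHT["Q6_K"] - BITS_PER_WEIGHT["Q4_K"]
affordable_weights = int(sink_bits // cost_per_weight)

print(f"sink: {sink_bpw:.3f} BPW")
print(f"weights promotable Q4_K -> Q6_K for free: {affordable_weights:,}")
```

Once promotions exceed `affordable_weights`, the sink is empty and every further promotion raises the total BPW.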

The Exo PA

In ForgeMagic, the ultimate move is the exo PA: adding an action point to an item that doesn’t naturally have one. It costs weight 100 (the heaviest in the game). It has a 1% success rate per attempt. It requires hours of preparation: building the sink, stabilizing other stats, choosing the right moment. When it lands, it transforms a good item into a legendary one. The entire ForgeMagic session is organized around this single moment.

In the RA3 recipe, blk.63.ffn_down at F16 is the exo PA. It has an importance of 58,620, more than five times the second-ranked tensor (11,233). It’s the single most expensive precision investment in the model. Every other decision in the recipe (the mid-network demotion, the late-network promotion, the SSM tensors at Q8_0) exists to create the budget and context that make this one investment possible. The entire methodology is organized around preserving this tensor at maximum precision.

In both systems, the principle is the same: identify the one component that matters most, and build the entire strategy around protecting it.

The Order of Operations

Every experienced forgemage follows the same sequence: small runes first (while the sink is large and absorbs failures), medium runes next, and the heavy rune last. You stabilize the item before you risk it. You never attempt the exo PA on an unstabilized item — if it fails, it will destroy the wrong stats.

The RA recipe follows the same logic across its three variants:

Step	ForgeMagic	RA Methodology
1. Foundation	Stabilize small stats while sink is big	RA: swap blk.12–16 ↓ / blk.49–53 ↑ (Q6_K level)
2. Build up	Apply medium runes, consume remaining sink	RA2: promote top-12 tensors to Q8_0
3. The big move	Attempt the exo PA / overmage	RA3: promote top-7 to F16

Each step depends on the previous one. You don’t F16 a tensor before the surrounding recipe is stable. You don’t exo PA before the item is properly maged. The structure is identical because the constraint is identical: you’re allocating a limited resource across components of unequal importance, and the order in which you allocate determines whether the budget survives the process.
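The layering of the three variants can be sketched as successive overrides on a base plan. This is a toy under stated assumptions: the base plan and the helper names (`top_k`, `promote`, `build_variants`) are hypothetical, and because only six tensors are listed here the example uses k=3 and k=1 rather than the top-12 and top-7 of the actual recipe.

```python
# Importance values from the table earlier in the article (subset).
IMPORTANCE = {
    "blk.63.ffn_down": 58620.0,
    "blk.62.ffn_down": 11233.0,
    "blk.61.ffn_gate": 8171.0,
    "blk.53.ffn_gate": 5026.0,
    "blk.49.ffn_gate": 3514.0,
    "blk.15.ffn_gate": 725.0,
}

# Hypothetical result of step 1: mid-network demoted, late-network at Q6_K.
BASE = {
    "blk.63.ffn_down": "Q6_K",
    "blk.62.ffn_down": "Q6_K",
    "blk.61.ffn_gate": "Q6_K",
    "blk.53.ffn_gate": "Q6_K",
    "blk.49.ffn_gate": "Q6_K",
    "blk.15.ffn_gate": "Q4_K",
}


def top_k(importance: dict, k: int) -> list:
    """Names of the k most important tensors, highest first."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


def promote(plan: dict, names: list, qtype: str) -> dict:
    """Return a copy of plan with the given tensors set to qtype."""
    updated = dict(plan)
    for name in names:
        updated[name] = qtype
    return updated


def build_variants(base_plan: dict, importance: dict, k_q8: int, k_f16: int):
    """Each variant is built on top of the previous one, never from scratch."""
    ra = dict(base_plan)
    ra2 = promote(ra, top_k(importance, k_q8), "Q8_0")
    ra3 = promote(ra2, top_k(importance, k_f16), "F16")
    return ra, ra2, ra3


ra, ra2, ra3 = build_variants(BASE, IMPORTANCE, k_q8=3, k_f16=1)
for name in BASE:
    print(f"{name:18s} {ra[name]:5s} {ra2[name]:5s} {ra3[name]:5s}")
```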

When the Exo Doesn’t Land

Sometimes the exo PA fails. In ForgeMagic, you lose stats, rebuild the sink, and try again. The failure is expensive but not catastrophic — the item survives, just degraded.

We tested this in quantization. RA4 promoted all 48 SSM outputs to F16, the most aggressive allocation possible for the recurrent pathway. The result: the same reasoning PPL (perplexity) as RA3, 2.6825, and a wikitext PPL within 0.003 of it (6.8389 vs 6.8411). The “exo” on the SSM pathway didn’t land. Roughly 1.4 GB of extra precision (18.0 GB to 19.4 GB) produced no meaningful improvement.

In ForgeMagic terms: the rune passed (no quality loss) but the stat was already maxed (no quality gain). The item was already at its density ceiling for that pathway. The forgemage reads the journal, sees “−Reliquat” instead of stat gains, and knows the sink absorbed the attempt without effect. The quantizer reads the PPL, sees no delta, and knows the pathway was already saturated at its current precision.

The lesson is the same in both systems: data tells you when to stop. The forgemage watches the journal. The quantizer watches the PPL. When the numbers stop moving, the budget is better spent elsewhere.
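The stop rule can be written down as a small check. The sketch below uses the RA3 and RA4 figures from the results table in The Numbers section. The `MIN_GAIN` threshold and the names `Variant` and `promotion_report` are hypothetical; what counts as a worthwhile PPL gain is a judgment call, not something the measurements decide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    size_gb: float
    ppl_wiki: float
    ppl_reasoning: float


# Figures from the results table.
RA3 = Variant("RA3_XL", 18.0, 6.8411, 2.6825)
RA4 = Variant("RA4_XL", 19.4, 6.8389, 2.6825)

MIN_GAIN = 0.005  # hypothetical: smallest PPL drop considered worth paying for


def promotion_report(before: Variant, after: Variant, min_gain: float = MIN_GAIN) -> dict:
    """Compare two variants; a positive gain means PPL went down."""
    gain_reasoning = before.ppl_reasoning - after.ppl_reasoning
    gain_wiki = before.ppl_wiki - after.ppl_wiki
    return {
        "gain_reasoning": gain_reasoning,
        "gain_wiki": gain_wiki,
        "extra_gb": after.size_gb - before.size_gb,
        "keep": max(gain_reasoning, gain_wiki) >= min_gain,
    }


report = promotion_report(RA3, RA4)
print(f"reasoning gain: {report['gain_reasoning']:.4f}")
print(f"wikitext gain:  {report['gain_wiki']:.4f}")
print(f"extra size:     {report['extra_gb']:.1f} GB")
print(f"keep promotion: {report['keep']}")
```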

Two Types of Knowledge

ForgeMagic has a tension that forgemages learn to navigate. The weight table tells you that PA costs 100 and Vitalité costs 1. But the weight table doesn’t tell you that on a permadeath server, 100 extra Vitalité might save your character’s life. The weight system measures crafting cost, not combat value. The forgemage needs both: knowledge of the weight table and knowledge of the game.

Quantization has the exact same tension. The imatrix measures activation importance per token. It ranks SSM outputs at positions 400–496 out of 496 — the least important tensors in the model. But SSM outputs carry recurrent state across the entire sequence. Their errors don’t matter per token; they compound per sequence. The imatrix measures cost. The architectural argument measures value. The quantizer needs both.

Knowledge type	ForgeMagic	Quantization
Data-driven	Rune weight table	imatrix importance profile
Domain-driven	Combat knowledge, server rules, class synergies	Architecture knowledge, error propagation, use case
Tension	Vitalité weighs 1 but saves lives on Ombre	SSM weighs 0.88 but stabilizes long sequences

The best forgemages are the ones who know both tables — the crafting weights and the combat values. The best quantizers are the ones who know both profiles — the imatrix data and the architectural constraints. Neither alone is sufficient.
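To see why a per-token measure can understate a per-sequence effect, here is a toy linear recurrence. It is not the model's actual SSM: it only assumes that a small error is injected at every token and that the state retains a fixed fraction of its previous value. Both constants (`PER_STEP`, `DECAY`) and the function name are hypothetical.

```python
def accumulated_error(decay: float, per_step_error: float, steps: int) -> float:
    """Error left in the state after `steps` updates of err = decay * err + per_step_error."""
    err = 0.0
    for _ in range(steps):
        err = decay * err + per_step_error
    return err


PER_STEP = 1e-3  # hypothetical error added per token by lower precision
DECAY = 0.99     # hypothetical fraction of state carried to the next token

for steps in (1, 10, 100, 1000):
    total = accumulated_error(DECAY, PER_STEP, steps)
    print(f"{steps:5d} tokens: state error {total:.4f} ({total / PER_STEP:.1f}x per-token)")
```

In this toy the accumulated error approaches `PER_STEP / (1 - DECAY)`, so an error that looks negligible on any single token ends up about a hundred times larger in the carried state. How closely a real recurrent pathway follows this pattern is an open question that only sequence-level measurements can answer.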

What the Uniform Forgemage Gets Wrong

Imagine a forgemage who doesn’t know the weight table. They apply runes uniformly — a little Vitalité here, a little Force there, spreading the budget evenly across every stat. The item looks balanced. It is mediocre. It has no standout stat, no combat advantage, no identity.

This is a Q6_K plain quantization. It allocates 6.57 bits per weight uniformly to every tensor — embeddings, attention projections, feed-forward gates, SSM outputs. It looks balanced. It is wasteful. It gives Q6_K to tensors that would be fine at Q4_K, and only Q6_K to tensors that need Q8_0 or F16. The name “Q6” sounds better than “Q4.” The number is higher. The item is still mediocre.

It gets worse when you consider context. In Dofus, Prospection increases loot drop rates from monsters. It’s essential for PvE farming — and completely dead weight in PvP. A Kolizéum item with 0 Prospection has a lower total stat weight on paper, but every point not invested in Prospection is a point invested in Force, Agilité, or PA — the stats that win fights. The item looks worse in the Atelier. It performs better in the arena.

The mid-network tensors (blocks 12–16) are the Prospection of a reasoning model. Their activation ratio on reasoning traces is 0.66–0.76 — they’re active during generic text continuation (PvE) but underutilized during multi-step reasoning (PvP). Demoting them to Q4_K is dropping Prospection to 0 on a PvP item: the total BPW decreases, but the freed budget goes directly into the late-network FFN and early attention layers that actually resolve reasoning chains. The model looks smaller on paper. It reasons better in practice.

The total weight of an item only matters if every stat matters for your use case. On a PvE drop farmer, every stat matters. On a PvP build, Prospection is noise. On a generic text model, every tensor matters. On a reasoning distill, the mid-network is noise. The RA methodology is context-aware allocation: building a PvP item, not a balanced one.
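One way to make "context-aware" concrete is to compare each tensor's importance under a reasoning imatrix with its importance under a generic-text imatrix, and demote where the ratio is low. The sketch below assumes the ratio is defined as reasoning importance divided by generic importance, which is my reading of the 0.66–0.76 figures. The generic-text values and the 0.8 cutoff are hypothetical; only the reasoning values come from the table earlier in the article.

```python
# {tensor: (reasoning importance, generic importance)}
# Reasoning values are from the article's table; generic values are made up.
PROFILES = {
    "blk.15.ffn_gate": (725.0, 1000.0),
    "blk.53.ffn_gate": (5026.0, 4000.0),
    "blk.63.ffn_down": (58620.0, 50000.0),
}


def reasoning_ratio(reasoning: float, generic: float) -> float:
    """Below 1.0 means the tensor is less active on reasoning than on generic text."""
    return reasoning / generic


def pick_demotions(profiles: dict, cutoff: float = 0.8) -> list:
    """Tensors whose reasoning/generic ratio falls below the cutoff."""
    return sorted(
        name
        for name, (reasoning, generic) in profiles.items()
        if generic > 0 and reasoning_ratio(reasoning, generic) < cutoff
    )


for name, (reasoning, generic) in PROFILES.items():
    print(f"{name:18s} ratio {reasoning_ratio(reasoning, generic):.3f}")
print("demote:", pick_demotions(PROFILES))
```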

Now imagine the expert forgemage. They study the weight table. They identify the one stat that defines the build — PA for a Féca, PM for a Sacrieur, Range for a Cra. They create a massive sink by dropping the cheap stats. They pour everything into that one defining stat. They exo PA at 1% and keep trying until it lands. The item has less total stat weight than the uniform version. It is vastly more powerful, because the right stat is maxed.

This is the RA methodology. It studies the importance profile. It identifies the tensors that define reasoning quality. It creates a sink by demoting the mid-network. It pours the budget into late-network FFN and early attention. It F16s the single most critical tensor. The model has less total BPW than the Q6_K plain — 5.69 vs 6.57. It is at least as good on generic text. On reasoning, it’s measurably better.

The expert’s item at lower total weight beats the novice’s item at higher total weight. In ForgeMagic, this is common knowledge. In quantization, it’s a thesis that most of the community hasn’t internalized yet.

The Numbers

Measured on wikitext-2 and on 14K Opus reasoning traces, on an RTX 3090. Lower PPL is better.

Variant	BPW	Size	PPL wiki	PPL reasoning
Q4_K_XL (uniform expert)	5.41	17.0 GB	6.8341	2.6839
RA3_XL (targeted expert)	5.69	18.0 GB	6.8411	2.6825
RA4_XL (overmage attempt)	6.36	19.4 GB	6.8389	2.6825
Q6_K plain (uniform novice)	6.57	21.0 GB	baseline reference

RA3 at 5.69 BPW and 18 GB matches or beats a Q6_K plain at 6.57 BPW and 21 GB. Less total weight, more targeted allocation, same or better result. The expert forgemage’s item, vindicated by the numbers.
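The size side of that comparison is plain arithmetic on the table. The sketch below computes what RA3 saves against Q6_K plain in BPW and GB, and how its PPL differs from Q4_K_XL. All figures are from the table above; the table lists no PPL for Q6_K plain, so none is used here. The names `ROWS`, `Q6_K_PLAIN`, and `savings_vs` are hypothetical.

```python
# name: (BPW, size in GB, PPL wiki, PPL reasoning), copied from the table.
ROWS = {
    "Q4_K_XL": (5.41, 17.0, 6.8341, 2.6839),
    "RA3_XL": (5.69, 18.0, 6.8411, 2.6825),
    "RA4_XL": (6.36, 19.4, 6.8389, 2.6825),
}
Q6_K_PLAIN = (6.57, 21.0)  # BPW and size only


def savings_vs(reference: tuple, bpw: float, size_gb: float) -> dict:
    """BPW and size saved relative to a reference (BPW, size) pair."""
    ref_bpw, ref_size = reference
    return {
        "bpw_saved": ref_bpw - bpw,
        "gb_saved": ref_size - size_gb,
        "size_pct": 100.0 * (ref_size - size_gb) / ref_size,
    }


ra3_bpw, ra3_size, ra3_wiki, ra3_reason = ROWS["RA3_XL"]
saved = savings_vs(Q6_K_PLAIN, ra3_bpw, ra3_size)
print(f"RA3 vs Q6_K plain: -{saved['bpw_saved']:.2f} BPW, "
      f"-{saved['gb_saved']:.1f} GB ({saved['size_pct']:.1f}% smaller)")

_, _, q4_wiki, q4_reason = ROWS["Q4_K_XL"]
print(f"RA3 vs Q4_K_XL: wiki {ra3_wiki - q4_wiki:+.4f}, reasoning {ra3_reason - q4_reason:+.4f}")
```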

You Were Already an Expert

If you’ve ever maged an item in Dofus, you already understand quantization. You know that uniform allocation wastes resources. You know that the weight table matters more than the total stat budget. You know that creating a sink is a deliberate strategy, not an accident. You know that the exo PA is worth the preparation. You know that data (the journal, the −Reliquat) tells you when to stop.

The vocabulary is different. The substrate is different. The problem is the same: allocate a limited budget across components of unequal importance, using data to guide the allocation and domain knowledge to handle what the data can’t see.

10,150 hours of ForgeMagic. I didn’t know I was training for this.

Glossary

ForgeMagic	Quantization
Rune weight	Tensor importance (imatrix)
Item density	Model BPW
Puits / reliquat (sink)	BPW headroom from demotions
Succès Critique (SC)	Promotion with no PPL cost
Échec Critique (EC)	Promotion that degrades quality
Overmage	Promoting above base recipe (Q8_0, F16)
Exo PA (1% exotic)	F16 on `blk.63.ffn_down`
Rune Pa/Ra tiers	Q4_K / Q5_K / Q6_K / Q8_0 / F16
Forge journal (−Reliquat)	PPL measurement
Weight table (hidden)	imatrix (importance matrix)
Combat knowledge (Ombre)	Architecture knowledge (SSM, MoE)
Prospection (dead weight in PvP)	Mid-network tensors (dead weight for reasoning)
PvP item (context-optimized)	RA recipe (reasoning-optimized)

Companion articles: Reasoning-Aware Quantization · When a Q4 Beats a Q6 · The Gap Between Two Pipelines

Published on daxzeit.eu. 10,150 hours of ForgeMagic. 14L ITX workstation. Zwevegem, Belgium.