A local 27B model learned three execution patterns in a single conversation. It started by polling a background process six times. It ended by delegating to a sub-agent and getting notified on completion. The teacher wasn’t a training run — it was a description of how Claude Code handles long-running tasks.


The Problem: Paying for Silence

You.com’s Research API is Level 2 of my search stack. It takes a query, autonomously runs multiple searches, reads pages, cross-references sources, and returns a synthesized answer with citations in markdown. The output is excellent. The timing is not.

A deep research call takes 3–5 minutes. An exhaustive call can take 10. My agent runs inside Qwen Code, which has a default shell timeout of 180 seconds. The math doesn’t work: the shell kills the process before the API returns, and I still get billed. At $0.20 per deep call, every timeout is money burned with nothing to show for it.
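The failure is easy to reproduce in miniature with Python’s standard subprocess module. In this sketch, a child process that sleeps stands in for the research call, and a sub-second timeout stands in for the 180-second shell limit. Both durations are illustrative and scaled down so the example runs quickly. When the timeout fires, subprocess.run kills the child and raises TimeoutExpired, so the answer the child was about to print never arrives.

```python
import subprocess
import sys


def run_with_shell_timeout(seconds_of_work: float, timeout_s: float) -> str:
    """Run a stand-in 'research call' as a child process under a hard timeout.

    Returns the child's stdout, or "KILLED" if the timeout fired first.
    """
    # Child process: sleep to simulate a slow API call, then print a result.
    child = [sys.executable, "-c",
             f"import time; time.sleep({seconds_of_work}); print('ANSWER READY')"]
    try:
        done = subprocess.run(child, capture_output=True, text=True, timeout=timeout_s)
        return done.stdout.strip()
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child: its work is discarded.
        return "KILLED"


# Scaled-down version of "3-5 minute call vs. 180 s shell timeout".
print(run_with_shell_timeout(seconds_of_work=2.0, timeout_s=0.5))  # KILLED
print(run_with_shell_timeout(seconds_of_work=0.1, timeout_s=5.0))  # ANSWER READY
```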


Attempt 1: Polling

The first instinct was background execution with manual monitoring. Launch the process with &, save the PID, come back later to check.

ps -p 36841 → STILL RUNNING (+2 min)
ps -p 36841 → STILL RUNNING (+3 min)
ps -p 36841 → STILL RUNNING (+4 min)
ps -p 36841 → STILL RUNNING (+5 min)
ps -p 36841 → STILL RUNNING (+6 min)
ps -p 36841 → DONE (+7 min)

Six tool calls consumed, six chunks of context spent on ps output that said nothing new until the last one. The research completed, but the agent had spent its context on waiting: the AI equivalent of opening the oven door every thirty seconds to check if the cake is ready.
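For contrast, here is the polling pattern reduced to a loop. This is a minimal sketch: subprocess.Popen stands in for launching with &, Popen.poll() stands in for ps -p, and the durations are scaled-down assumptions so it finishes in about a second. Every iteration is one observation. In an agent, each of those is a separate tool call whose output lands in context whether or not anything changed.

```python
import subprocess
import sys
import time


def poll_until_done(work_s: float, interval_s: float) -> tuple[int, str]:
    """Launch a background job, then check on it every interval_s seconds.

    Returns (number_of_checks, final_stdout). Each check mimics one `ps -p PID` call.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({work_s}); print('DONE')"],
        stdout=subprocess.PIPE,
        text=True,
    )
    checks = 0
    while True:
        time.sleep(interval_s)       # "come back later"
        checks += 1                  # one more tool call, one more context entry
        if proc.poll() is not None:  # poll() returns None while the child still runs
            break
    out, _ = proc.communicate()
    return checks, out.strip()


checks, out = poll_until_done(work_s=1.0, interval_s=0.2)
print(checks, out)
```

Only the final check carries information; all the others exist because the caller has no way to be told when the job ends.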

Result: 15,212 characters, 19 sources. Functional, but the execution pattern was wasteful.


Attempt 2: Blocking

The fix came from a pattern description: Claude Code’s sub-agent tool doesn’t poll. It delegates, blocks, and returns the result as a single tool call. For calls under 5 minutes, the simplest translation is a blocking shell call with the timeout raised well above the 180-second default:

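# One blocking call: set the shell tool's timeout above the API's worst case
python3 -c "
from youcom import YoucomSearch
s = YoucomSearch()
# Blocks until the Research API returns (3-5 min for effort='deep')
r = s.research('query', effort='deep')
# Print a compact report instead of the full answer, to keep context small
print('SAVED:', r.get('saved_to', 'NONE'))
print('LENGTH:', len(r.get('answer', '')))
print('SOURCES:', len(r.get('sources', [])))
" 2>&1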

One tool call. One context entry. The shell blocks until the API returns, then the agent reads the result and continues. No polling, no wasted checks.
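The same blocking idea, seen from the calling side, fits in a short sketch. It assumes the child prints the same three KEY: value lines as the snippet above. One subprocess.run call waits for completion. The timeout is sized from the expected worst case plus headroom instead of being left at a default, and the report lines are parsed into a dict. The 1.5× headroom factor and the parse_report and blocking_call helpers are illustrative assumptions. In Qwen Code, the real knob is the timeout passed to run_shell_command, not anything in this code.

```python
import subprocess
import sys


def parse_report(stdout: str) -> dict[str, str]:
    """Turn lines like 'SAVED: path' / 'LENGTH: 123' into {'SAVED': 'path', 'LENGTH': '123'}."""
    report = {}
    for line in stdout.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.isupper():  # keep only KEY: value lines, skip noise
            report[key] = value.strip()
    return report


def blocking_call(cmd: list[str], expected_max_s: float, headroom: float = 1.5) -> dict[str, str]:
    """One call, one result: wait for the child, with a timeout above its worst case."""
    done = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=expected_max_s * headroom,  # generous, so the timer never wins the race
        check=True,                         # raise if the child exits non-zero
    )
    return parse_report(done.stdout)


# Stand-in child that prints lines shaped like the research snippet's output.
fake = [sys.executable, "-c",
        "print('SAVED: out.md'); print('LENGTH: 100'); print('SOURCES: 3')"]
print(blocking_call(fake, expected_max_s=5))
```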

Result: 13,126 characters, 37 sources, ~3–4 minutes wall time. The wall-clock gap with the polling run says nothing about the pattern, because the API takes as long as it takes. The difference isn’t speed; it’s context: one tool call instead of six. The call also returned nearly double the sources, plausibly because the single blocking call with a generous timeout let the API finish cleanly instead of racing a shell timer.


Attempt 3: Sub-Agent

For exhaustive calls exceeding 5 minutes, blocking ties up the main agent. The solution: delegate to a sub-agent.

Qwen Code’s agent tool spawns a sub-agent with its own context and shell. The instruction must be self-contained — the sub-agent doesn’t inherit the main agent’s system prompt:

“Execute a You.com research query with exhaustive effort on [topic]. Wait for completion without premature timeout. Save the .md file. Report back the file path and a 3-line summary of findings.”

The sub-agent runs autonomously. The main agent is free. When the sub-agent finishes, its result returns as a tool result — exactly like any other tool call, but after 6.5 minutes instead of 6.5 milliseconds.
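The shape of delegation, as opposed to polling, is a completion callback. The sketch below is not Qwen Code’s agent tool. It is a hypothetical stand-in built on the standard library’s concurrent.futures. The run_subagent function plays the sub-agent, add_done_callback plays the notification, and the main thread keeps working in the meantime. Only the short report crosses back, and the worker’s scratch state never reaches the caller.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


def run_subagent(instruction: str) -> dict:
    """Hypothetical sub-agent: long work with its own local state, short report at the end."""
    scratch = []                     # its own "context": never visible to the caller
    for step in range(5):
        time.sleep(0.1)              # stand-in for searches, page reads, verification
        scratch.append(f"step {step}: {instruction}")
    return {"path": "research.md", "summary": f"{len(scratch)} steps completed"}


inbox = []                           # everything the main agent actually receives
notified = threading.Event()


def on_done(fut: Future) -> None:
    # Fires once, on completion: the notification that replaces polling.
    inbox.append(fut.result())
    notified.set()


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(run_subagent, "exhaustive research on [topic], save .md, report path")
    future.add_done_callback(on_done)
    main_agent_work = [n * n for n in range(10_000)]  # main thread stays free meanwhile
    notified.wait(timeout=10)

print(inbox)
```

The caller never asks whether the work is finished. It receives one message when it is, which is the property the polling loop lacks.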

Result: 15,805 characters, 46 sources, ~6.5 minutes. More comprehensive than blocking, because the sub-agent had its own context budget to run a verification pass on the results.


The Numbers

| Approach | Answer | Sources | Duration | Context cost |
|---|---|---|---|---|
| Polling (PID 36841) | 15,212 chars | 19 | ~7 min | 6 tool calls |
| Blocking | 13,126 chars | 37 | ~3–4 min | 1 tool call |
| Sub-agent ☆ | 15,805 chars | 46 | ~6.5 min | 0 (delegated) |

The sub-agent produced the best result with zero main-context cost. Blocking is the pragmatic choice for shorter calls. Polling is never the right answer.


The Pattern

After testing all three approaches on the same research topic (post-quantum cryptography deployment status), the model produced its own operational rule:

1. Short task (< 5 min)  → direct blocking call
2. Long task (> 5 min)   → sub-agent delegation
3. Never                 → manual polling

This wasn’t in its training data. It wasn’t in its system prompt. It emerged from a single conversation where the pattern was described, tested, compared, and codified. The agent mapped Claude Code’s architecture onto its own available tools (agent for sub-agents, run_shell_command with configurable timeout for blocking), ran the empirical comparison, and extracted the principle.
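The rule is small enough to state as code. This is a minimal sketch with one caveat: the rule leaves the exact 5-minute boundary unassigned. Sending that case to a sub-agent is an assumption made here, on the reasoning that an estimate sitting at the limit is the one most likely to overrun. There is deliberately no branch that returns polling.

```python
from enum import Enum


class Pattern(Enum):
    BLOCKING = "blocking call with a generous timeout"
    SUBAGENT = "delegate to a sub-agent"
    # No POLLING member: rule 3 says never.


THRESHOLD_MIN = 5.0


def choose_pattern(expected_minutes: float) -> Pattern:
    """Map an expected task duration to an execution pattern."""
    if expected_minutes < THRESHOLD_MIN:
        return Pattern.BLOCKING
    # Assumption: the exact 5-minute boundary is delegated, since the rule leaves it open.
    return Pattern.SUBAGENT


for minutes in (3.5, 5.0, 10.0):
    print(minutes, choose_pattern(minutes).name)
# 3.5 BLOCKING
# 5.0 SUBAGENT
# 10.0 SUBAGENT
```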


What the Sub-Agent Changes

With the sub-agent pattern validated, I immediately tested a real mission: auditing my HuggingFace READMEs against current community best practices. The instruction was fully self-contained — research best practices via You.com exhaustive, read local README files, compare structure, produce an actionable report.

| Metric | Value |
|---|---|
| Duration | 8.8 minutes |
| Tokens consumed | 462,313 (sub-agent context) |
| Tool uses | 12 (autonomous) |
| Sources | 27 |
| READMEs analyzed | 6 |
| Main agent context cost | 0 |

The sub-agent found that my READMEs score 8.5/10 on technical depth (tensor distribution tables, perplexity benchmarks with ±σ uncertainty, reproducible recipes). They are missing two things: the safety/limitations sections that HuggingFace’s model card guidance recommends, and library_name: llama.cpp in the YAML frontmatter. Concrete, actionable, produced entirely autonomously while I was doing something else.
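The two gaps it found are mechanical enough to check locally. The sketch below is not what the sub-agent ran. It reads the frontmatter between the leading --- lines with plain string handling, which avoids a YAML dependency but handles only simple top-level key: value lines. It then looks for a Markdown heading that mentions limitations or safety. The heading keywords are assumptions about how such a section would be titled.

```python
import re

# Any heading (#, ##, ...) whose title mentions limitations or safety.
HEADING = re.compile(r"^#{1,6}\s+.*\b(limitations?|safety)\b", re.IGNORECASE | re.MULTILINE)


def frontmatter_keys(text: str) -> set[str]:
    """Top-level keys from a leading '---' ... '---' block (simple 'key: value' lines only)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break                                  # end of frontmatter
        key, sep, _ = line.partition(":")
        if sep and not line.startswith((" ", "-")):  # skip nested and list lines
            keys.add(key.strip())
    return keys


def audit_readme(text: str) -> list[str]:
    """Return the missing items for one model-card README."""
    problems = []
    if "library_name" not in frontmatter_keys(text):
        problems.append("frontmatter: missing library_name")
    if not HEADING.search(text):
        problems.append("body: no limitations/safety section")
    return problems


before = "---\nlicense: apache-2.0\n---\n# Model\n## Perplexity\n"
after = "---\nlicense: apache-2.0\nlibrary_name: llama.cpp\n---\n# Model\n## Limitations\n"
print(audit_readme(before))  # ['frontmatter: missing library_name', 'body: no limitations/safety section']
print(audit_readme(after))   # []
```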

462K tokens of work, invisible to my main context. That’s the difference between polling and delegation.


The Meta-Insight

The interesting thing isn’t that a 27B model can execute sub-agent delegation. It’s that it learned when to use which pattern from a single demonstration, without fine-tuning, without examples in the system prompt, and without explicit instructions beyond “this is how Claude Code does it.”

This is the scaffold working. The model has the capacity. The conversation provides the judgment. The combination produces behavior that neither the weights nor the prompt would generate alone.

But there’s an honest objection here, and my own agent raised it when reviewing a draft of this article: “This is contextual memorization, not learning in the ML sense. The pattern stays in the 262K context window — restart the session and it’s gone.”

The critique is correct — as far as it goes. In-context pattern extraction is ephemeral. Without a persistence mechanism, it dies with the session. But Qwen Code has one: /remember saves operational knowledge to persistent memory files that survive restarts, and /dream consolidates session learnings into structured long-term memory. The full pipeline looks like this:

conversation    → demonstration of pattern
empirical test  → validation on real data
rule extraction → agent codifies the principle
/remember       → pattern persisted to disk
/dream          → session consolidated into memory

After the session, the agent saved the execution pattern — complete with the comparison table, code examples, and the “never poll” rule — to feedback/sub-agent-execution-pattern.md in its persistent memory. The pattern now loads with every new session. It survived the restart.

This isn’t fine-tuning on the weights. It’s fine-tuning on the scaffold. And unlike weight updates, it’s reversible, auditable, and took twenty minutes instead of twenty GPU-hours. A training run produces a model that always knows the pattern. A /remember produces a model that knows the pattern and knows when it learned it.
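The persistence step can be sketched in a few lines. The layout is hypothetical: a memory directory with one Markdown file per memory, plus a MEMORY.md index containing one - [title](path.md) line per file. The source doesn’t describe Qwen Code’s actual file format. The sketch makes two points. Persisting takes two writes, the file and its index entry. A fresh session sees only what the index points to.

```python
import re
import tempfile
from pathlib import Path

INDEX = "MEMORY.md"
LINK = re.compile(r"\]\(([^)]+\.md)\)")  # assumed index format: - [title](path.md)


def remember(memory_dir: Path, rel_path: str, body: str) -> None:
    """Persist a memory file AND register it in the index (both halves matter)."""
    target = memory_dir / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(body, encoding="utf-8")
    with open(memory_dir / INDEX, "a", encoding="utf-8") as index:
        index.write(f"- [{target.stem}]({rel_path})\n")


def load_session(memory_dir: Path) -> dict[str, str]:
    """What a fresh session sees: only files reachable from the index."""
    index_file = memory_dir / INDEX
    if not index_file.exists():
        return {}
    paths = LINK.findall(index_file.read_text(encoding="utf-8"))
    return {p: (memory_dir / p).read_text(encoding="utf-8") for p in paths}


with tempfile.TemporaryDirectory() as tmp:
    mem = Path(tmp)
    remember(mem, "feedback/sub-agent-execution-pattern.md",
             "< 5 min: blocking. > 5 min: sub-agent. Never: polling.\n")
    # "Restart": a new session starts from disk only.
    print(list(load_session(mem)))  # ['feedback/sub-agent-execution-pattern.md']
```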

The Last Step: /dream

But persistence has a subtle failure mode. After /remember, the pattern exists as a file on disk. That doesn’t mean the agent can find it. Qwen Code’s memory system uses an index file (MEMORY.md) to locate memories at session start. A file that exists but isn’t indexed is a book in a library with no catalog entry — it’s there, but it’s invisible.

The /dream command runs a four-phase consolidation: orient (list all memory files and the current index), gather (scan recent session transcripts for new signals), consolidate (detect duplicates, update stale facts, merge related entries), and verify (ensure every file on disk has an index entry). When I ran it after this session, it found 15 memory files — 19% of total memory — that existed on disk but weren’t indexed. Memories the agent had written but could never retrieve.
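The verify phase reduces to a set difference: every Markdown file on disk, minus every file the index points to. Here is a minimal sketch. It assumes MEMORY.md lists files as - [title](path.md) links; that is an assumption, since the source doesn’t describe the index format. The find_unindexed helper is hypothetical and is not how /dream is implemented.

```python
import re
import tempfile
from pathlib import Path

LINK = re.compile(r"\]\(([^)]+\.md)\)")  # assumed index format: - [title](path.md)


def find_unindexed(memory_dir: Path, index_name: str = "MEMORY.md") -> tuple[list[str], float]:
    """Memory files on disk that the index never points to, plus their share of the total."""
    on_disk = {
        p.relative_to(memory_dir).as_posix()
        for p in memory_dir.rglob("*.md")
        if p.name != index_name
    }
    index_file = memory_dir / index_name
    indexed = set()
    if index_file.exists():
        indexed = set(LINK.findall(index_file.read_text(encoding="utf-8")))
    orphans = sorted(on_disk - indexed)
    share = len(orphans) / len(on_disk) if on_disk else 0.0
    return orphans, share


with tempfile.TemporaryDirectory() as tmp:
    mem = Path(tmp)
    for name in ("a.md", "b.md", "c.md", "d.md"):
        (mem / name).write_text("note\n", encoding="utf-8")
    (mem / "MEMORY.md").write_text("- [a](a.md)\n- [b](b.md)\n- [c](c.md)\n", encoding="utf-8")
    orphans, share = find_unindexed(mem)
    print(orphans, f"{share:.0%}")  # ['d.md'] 25%
```

Detection is the part that is easy to skip. The repair is then just one appended index line per orphan.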

The complete pipeline has five stages, and each can fail independently:

| Stage | Mechanism | Failure mode |
|---|---|---|
| Demonstration | Conversation | Pattern never shown |
| Validation | Empirical test | Untested assumption |
| Codification | Rule extraction | Implicit knowledge only |
| Persistence | /remember | Lost on session restart |
| Indexation | /dream | Persisted but unretrievable |

Most discussions about AI “learning” stop at stage 3. The scaffold makes stages 4 and 5 possible — and stage 5 is the one nobody thinks about until 19% of their agent’s memory goes dark.

Stack

Qwen3.6-27B Claude Opus Reasoning Distilled · Qwen Code · You.com Research API · RTX 3090

Three research calls, same topic, three patterns. Total R&D cost: ~$0.60. Lessons retained: permanent.

Companion articles: MAS Over Unix Primitives · The Scaffolding Is the Signal · From Zero to Multi-Agent in 12 Hours

Published on daxzeit.eu. Built on a 14L ITX workstation in Zwevegem, Belgium.