A local 27B model learned three execution patterns in a single conversation. It started by polling a background process six times. It ended by delegating to a sub-agent and getting notified on completion. The teacher wasn’t a training run — it was a description of how Claude Code handles long-running tasks.
The Problem: Paying for Silence
You.com’s Research API is Level 2 of my search stack. It takes a query, autonomously runs multiple searches, reads pages, cross-references sources, and returns a synthesized answer with citations in markdown. The output is excellent. The timing is not.
A deep research call takes 3–5 minutes. An exhaustive call can take 10. My agent runs inside Qwen Code, which has a default shell timeout of 180 seconds. The math doesn’t work: the shell kills the process before the API returns, and I still get billed. At $0.20 per deep call, every timeout is money burned with nothing to show for it.
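The mismatch can be checked with a few lines of arithmetic. This is a minimal sketch: the 180-second timeout and the $0.20 price come from the paragraph above, while the function names and the sample durations are illustrative assumptions.

```python
SHELL_TIMEOUT_S = 180       # Qwen Code default shell timeout (from the article)
COST_PER_DEEP_CALL = 0.20   # USD per deep research call (from the article)


def survives_timeout(call_seconds: float, timeout_s: float = SHELL_TIMEOUT_S) -> bool:
    """True if the call returns before the shell kills the process."""
    return call_seconds < timeout_s


def wasted_spend(durations_s, timeout_s: float = SHELL_TIMEOUT_S,
                 cost: float = COST_PER_DEEP_CALL) -> float:
    """Money billed for calls that were killed before they returned."""
    killed = [d for d in durations_s if not survives_timeout(d, timeout_s)]
    return round(len(killed) * cost, 2)


# Illustrative durations: three deep calls of 200 s, 240 s and 300 s.
print(wasted_spend([200, 240, 300]))
```

Every call in the 3–5 minute range lands past the default timeout, so each one is billed without returning anything.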
Attempt 1: Polling
The first instinct was background execution with manual monitoring. Launch the process with &, save the PID, come back later to check.
ps -p 36841 → STILL RUNNING (+3 min)
ps -p 36841 → STILL RUNNING (+4 min)
ps -p 36841 → STILL RUNNING (+5 min)
ps -p 36841 → STILL RUNNING (+6 min)
ps -p 36841 → DONE (+7 min)
Six tool calls consumed. Six chunks of context spent on ps output that said nothing. The research completed, but the agent had burned context doing nothing useful — the AI equivalent of opening the oven door every thirty seconds to check if the cake is ready.
Result: 15,212 characters, 19 sources. Functional, but the execution pattern was wasteful.
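For reference, the polling pattern looks roughly like this when written as a script. It is a sketch, not what the agent ran: the agent issued separate `ps -p` tool calls, whereas this version uses `subprocess.Popen.poll()` in a loop, and `poll_until_done` is a hypothetical helper name.

```python
import subprocess
import sys
import time


def poll_until_done(cmd, interval_s: float = 1.0):
    """Start cmd in the background and check on it until it exits.

    Returns (exit_code, number_of_checks). Every check but the last one
    yields no information beyond "still running".
    """
    proc = subprocess.Popen(cmd)
    checks = 0
    while True:
        checks += 1
        if proc.poll() is not None:   # None means the process is still running
            return proc.returncode, checks
        time.sleep(interval_s)


# Stand-in for the long research call: a child process that sleeps briefly.
code, checks = poll_until_done(
    [sys.executable, "-c", "import time; time.sleep(0.3)"], interval_s=0.1
)
print(f"exit={code} after {checks} checks")
```

In an agent, each of those checks is a tool call that lands in the context window, which is where the cost comes from.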
Attempt 2: Blocking
The fix came from a pattern description: Claude Code’s sub-agent tool doesn’t poll. It delegates, blocks, and returns the result as a single tool call. For calls under 5 minutes, the simplest translation is a blocking shell call with a generous timeout:
python3 -c "
from youcom import YoucomSearch
s = YoucomSearch()
r = s.research('query', effort='deep')
print('SAVED:', r.get('saved_to', 'NONE'))
print('LENGTH:', len(r.get('answer', '')))
print('SOURCES:', len(r.get('sources', [])))
" 2>&1
One tool call. One context entry. The shell blocks until the API returns, then the agent reads the result and continues. No polling, no wasted checks.
Result: 13,126 characters, 37 sources, ~3–4 minutes wall time. Blocking does not make the API any faster: the call takes as long as it takes. The difference is context: one tool call instead of six. The blocking run also returned nearly double the sources (37 versus 19), plausibly because a single call with a generous timeout let the API finish cleanly instead of racing a shell timer.
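The same blocking idea can be expressed with the standard library alone. This is a minimal sketch: `run_blocking` is a hypothetical helper, the 600-second default is an assumed "generous" value, and the child command is a placeholder for the research script above.

```python
import subprocess
import sys


def run_blocking(cmd, timeout_s: float = 600):
    """Run cmd, wait for it to finish, and return its stdout.

    Returns None if the process exceeds timeout_s; subprocess.run kills
    the child before raising TimeoutExpired.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return done.stdout


# Placeholder child process standing in for the research call.
output = run_blocking([sys.executable, "-c", "print('SAVED: NONE')"])
print(output)
```

One call goes out, one result comes back, and the timeout is a safety net set well above the expected duration.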
Attempt 3: Sub-Agent
For exhaustive calls exceeding 5 minutes, blocking ties up the main agent. The solution: delegate to a sub-agent.
Qwen Code’s agent tool spawns a sub-agent with its own context and shell. The instruction must be self-contained — the sub-agent doesn’t inherit the main agent’s system prompt:
“Execute a You.com research query with exhaustive effort on [topic]. Wait for completion without premature timeout. Save the .md file. Report back the file path and a 3-line summary of findings.”
The sub-agent runs autonomously. The main agent is free. When the sub-agent finishes, its result returns as a tool result — exactly like any other tool call, but after 6.5 minutes instead of 6.5 milliseconds.
Result: 15,805 characters, 46 sources, ~6.5 minutes. More comprehensive than blocking, because the sub-agent had its own context budget to run a verification pass on the results.
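The delegate-and-get-notified shape has a close analog in ordinary Python. The sketch below is not Qwen Code's agent tool: it hands a task to a worker thread and registers a completion callback, and `delegate`, `fake_research`, and `notify` are hypothetical names.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


def delegate(task: Callable[[], str], on_done: Callable[[str], None]) -> Future:
    """Run task on a worker thread and call on_done with its result."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(task)
    # The callback fires once when the task finishes; nobody polls in between.
    future.add_done_callback(lambda f: on_done(f.result()))
    executor.shutdown(wait=False)  # do not block the caller; the task still runs
    return future


def fake_research() -> str:
    # Hypothetical stand-in for the exhaustive research call.
    return "report.md"


finished = threading.Event()
reports = []


def notify(report: str) -> None:
    reports.append(report)
    finished.set()


delegate(fake_research, notify)
# The main thread is free to do other work here, then picks up the result.
finished.wait(timeout=5)
print(reports)
```

The caller only ever sees the final report, which mirrors how the sub-agent's intermediate work stays out of the main context.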
The Numbers
| Approach | Answer | Sources | Duration | Context Cost |
|---|---|---|---|---|
| Polling (PID 36841) | 15,212 chars | 19 | ~7 min | 6 tool calls |
| Blocking | 13,126 chars | 37 | ~3–4 min | 1 tool call |
| Sub-agent ☆ | 15,805 chars | 46 | ~6.5 min | 0 (delegated) |
The sub-agent produced the best result with zero main-context cost. Blocking is the pragmatic choice for shorter calls. Polling is never the right answer.
The Pattern
After testing all three approaches on the same research topic (post-quantum cryptography deployment status), the model produced its own operational rule:
1. Short task (< 5 min) → direct blocking call
2. Long task (> 5 min) → sub-agent delegation
3. Never → manual polling
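The rule above can be sketched as a small dispatcher. The function name is hypothetical, and since the rule does not say what happens at exactly 5 minutes, sending that case to the sub-agent is an assumption.

```python
def choose_pattern(expected_minutes: float, threshold_minutes: float = 5.0) -> str:
    """Pick an execution pattern from the expected task duration.

    Polling is deliberately not a possible return value.
    """
    if expected_minutes < threshold_minutes:
        return "blocking"
    # Assumption: a task expected to take exactly the threshold is delegated.
    return "sub-agent"


print(choose_pattern(3.5))   # deep call
print(choose_pattern(6.5))   # exhaustive call
```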
This wasn’t in its training data. It wasn’t in its system prompt. It emerged from a single conversation where the pattern was described, tested, compared, and codified. The agent mapped Claude Code’s architecture onto its own available tools (agent for sub-agents, run_shell_command with configurable timeout for blocking), ran the empirical comparison, and extracted the principle.
What the Sub-Agent Changes
With the sub-agent pattern validated, I immediately tested a real mission: auditing my HuggingFace READMEs against current community best practices. The instruction was fully self-contained — research best practices via You.com exhaustive, read local README files, compare structure, produce an actionable report.
| Metric | Value |
|---|---|
| Duration | 8.8 minutes |
| Tokens consumed | 462,313 (sub-agent context) |
| Tool uses | 12 (autonomous) |
| Sources | 27 |
| READMEs analyzed | 6 |
| Main agent context cost | 0 |
The sub-agent found that my READMEs score 8.5/10 on technical depth — tensor distribution tables, perplexity benchmarks with ±σ uncertainty, reproducible recipes — but are missing HuggingFace-required safety/limitations sections and library_name: llama.cpp in the YAML frontmatter. Concrete, actionable, produced entirely autonomously while I was doing something else.
462K tokens of work, invisible to my main context. That’s the difference between polling and delegation.
The Meta-Insight
The interesting thing isn’t that a 27B model can execute sub-agent delegation. It’s that it learned when to use which pattern from a single demonstration, without fine-tuning, without examples in the system prompt, and without explicit instructions beyond “this is how Claude Code does it.”
This is the scaffold working. The model has the capacity. The conversation provides the judgment. The combination produces behavior that neither the weights nor the prompt would generate alone.
But there’s an honest objection here, and my own agent raised it when reviewing a draft of this article: “This is contextual memorization, not learning in the ML sense. The pattern stays in the 262K context window — restart the session and it’s gone.”
The critique is correct — as far as it goes. In-context pattern extraction is ephemeral. Without a persistence mechanism, it dies with the session. But Qwen Code has one: /remember saves operational knowledge to persistent memory files that survive restarts, and /dream consolidates session learnings into structured long-term memory. The full pipeline looks like this:
conversation → demonstration of pattern
empirical test → validation on real data
rule extraction → agent codifies the principle
/remember → pattern persisted to disk
/dream → session consolidated into memory
After the session, the agent saved the execution pattern — complete with the comparison table, code examples, and the “never poll” rule — to feedback/sub-agent-execution-pattern.md in its persistent memory. The pattern now loads with every new session. It survived the restart.
This isn’t fine-tuning on the weights. It’s fine-tuning on the scaffold. And unlike weight updates, it’s reversible, auditable, and took twenty minutes instead of twenty GPU-hours. A training run produces a model that always knows the pattern. A /remember produces a model that knows the pattern and knows when it learned it.
The Last Step: /dream
But persistence has a subtle failure mode. After /remember, the pattern exists as a file on disk. That doesn’t mean the agent can find it. Qwen Code’s memory system uses an index file (MEMORY.md) to locate memories at session start. A file that exists but isn’t indexed is a book in a library with no catalog entry — it’s there, but it’s invisible.
The /dream command runs a four-phase consolidation: orient (list all memory files and the current index), gather (scan recent session transcripts for new signals), consolidate (detect duplicates, update stale facts, merge related entries), and verify (ensure every file on disk has an index entry). When I ran it after this session, it found 15 memory files — 19% of total memory — that existed on disk but weren’t indexed. Memories the agent had written but could never retrieve.
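The verify phase amounts to a set difference between files on disk and entries in the index. The sketch below is an assumption-heavy illustration, not Qwen Code's implementation: it assumes memories are `.md` files under one directory and that MEMORY.md mentions each indexed file by its relative path. `find_unindexed` is a hypothetical name.

```python
from pathlib import Path


def find_unindexed(memory_dir: Path, index_name: str = "MEMORY.md") -> list[str]:
    """Return memory files that exist on disk but are not mentioned in the index."""
    index_text = (memory_dir / index_name).read_text(encoding="utf-8")
    unindexed = []
    for path in sorted(memory_dir.rglob("*.md")):
        rel = path.relative_to(memory_dir).as_posix()
        if rel == index_name:
            continue  # the index does not need to list itself
        # Assumption: an indexed file appears in the index by its relative path.
        if rel not in index_text:
            unindexed.append(rel)
    return unindexed
```

Running a check like this after every write would catch an orphaned memory at the moment it is created, instead of during a later consolidation pass.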
The complete pipeline has five stages, and each can fail independently:
| Stage | Mechanism | Failure Mode |
|---|---|---|
| Demonstration | Conversation | Pattern never shown |
| Validation | Empirical test | Untested assumption |
| Codification | Rule extraction | Implicit knowledge only |
| Persistence | /remember | Lost on session restart |
| Indexation | /dream | Persisted but unretrievable |
Most discussions about AI “learning” stop at stage 3. The scaffold makes stages 4 and 5 possible — and stage 5 is the one nobody thinks about until 19% of their agent’s memory goes dark.
Stack
Qwen3.6-27B Claude Opus Reasoning Distilled · Qwen Code · You.com Research API · RTX 3090
Three research calls, same topic, three patterns. Total R&D cost: ~$0.60. Lessons retained: permanent.
Companion articles: MAS Over Unix Primitives · The Scaffolding Is the Signal · From Zero to Multi-Agent in 12 Hours
Published on daxzeit.eu. Built on a 14L ITX workstation in Zwevegem, Belgium.