Deep Agents vs Shallow Agents: Building Systems That Scale¶
Every LLM agent starts the same way: a model calling tools in a loop. Build a weather bot? Easy. Create a code reviewer? Works fine. But try to build an agent that writes a feature across multiple files, runs tests, debugs failures, and iterates—suddenly your simple loop falls apart.
The difference between "shallow agents" and "deep agents" isn't model size or some novel algorithm. It's four architectural choices that enable agents to handle complex, multi-step tasks over longer time horizons. Here's what I've learned from studying agents so far.
Why Shallow Agents Break¶
A shallow agent is just an LLM calling tools in a loop with no planning, memory management, or task delegation. The basic pattern—receive prompt, call tools, repeat—works until:
Context explodes. Your agent's 20th tool call returns 5,000 lines of logs. Now every subsequent LLM call costs more and performs worse as the context fills with noise.
Tasks compound. Ask your agent to "analyze user feedback and update docs accordingly." Without explicit decomposition, it might search feedback, get overwhelmed by volume, and lose track of the documentation update entirely.
Memory vanishes. Your agent finds critical info in step 3 but forgets it by step 8 because the context is flooded with intermediate outputs.
Retries thrash. A subtask fails. The agent tries again with the entire conversation history, making the same mistake because it can't isolate and focus.
These aren't LLM limitations—they're architecture limitations. Production agents solve them with scaffolding around the basic loop.
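The basic loop those failure modes grow out of is worth seeing in code. Here is a minimal sketch of a shallow agent; `call_model` is a stub standing in for a real LLM API, and the tool registry is a toy (both names are illustrative, not any real SDK):

```python
# Minimal shallow agent: a model calling tools in a loop, with every raw
# tool output appended straight into the context. This is the pattern
# that works fine for small tasks and degrades as context accumulates.

TOOLS = {
    "add": lambda a, b: a + b,
}

def call_model(messages):
    # Stub: a real agent would call an LLM here. This stand-in requests
    # one tool call, then finishes once it sees a tool result in context.
    if any(m["role"] == "tool" for m in messages):
        return {"type": "final", "content": "done"}
    return {"type": "tool_call", "name": "add", "args": {"a": 2, "b": 3}}

def run_shallow_agent(prompt, max_steps=10):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if reply["type"] == "final":
            return reply["content"], messages
        # Execute the requested tool and dump its raw output into context.
        result = TOOLS[reply["name"]](**reply["args"])
        messages.append({"role": "tool", "content": str(result)})
    return None, messages

answer, history = run_shallow_agent("What is 2 + 3?")
```

Everything the tools return lives in `messages` forever, which is exactly why the failure modes above appear once outputs get large.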
Four Patterns That Make Agents Reliable¶
External Memory via File Tools¶
The simplest fix for context overflow: give your agent a scratch pad.
Add write_file, read_file, and edit_file tools. Now when your agent pulls 10,000 lines from an API, it writes them to analysis_data.json instead of keeping them in context. When it needs to reference that data three steps later, it reads the specific section instead of scrolling through chat history.
This isn't about replacing vector databases or semantic search—it's about giving agents a place to dump intermediate state. Think of it as RAM vs storage: context window is your RAM (fast, expensive, limited), file system is storage (cheap, persistent, unlimited).
Real example: An agent debugging test failures. Instead of keeping every test output in context, it writes failures to failures.log, summarizes them, and works through fixes one at a time by reading individual entries.
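A minimal sketch of such file tools, assuming a hypothetical per-agent working directory; the names and signatures below are illustrative, not a fixed API:

```python
import os
import tempfile

# Hypothetical file tools that give an agent external "storage": large
# outputs go to disk, and only the slice the agent needs comes back
# into the context window.

WORKDIR = tempfile.mkdtemp()  # per-agent scratch directory (assumption)

def write_file(path, content):
    with open(os.path.join(WORKDIR, path), "w") as f:
        f.write(content)
    return f"wrote {len(content)} chars to {path}"

def read_file(path, start=0, end=None):
    # Return only the requested line range, not the whole dump.
    with open(os.path.join(WORKDIR, path)) as f:
        lines = f.read().splitlines()
    return "\n".join(lines[start:end])

def edit_file(path, old, new):
    full = os.path.join(WORKDIR, path)
    with open(full) as f:
        text = f.read()
    with open(full, "w") as f:
        f.write(text.replace(old, new, 1))
    return f"replaced first occurrence in {path}"

# Dump a large "API response" to disk, then pull back one small section.
write_file("analysis_data.json", "\n".join(f"line {i}" for i in range(10_000)))
snippet = read_file("analysis_data.json", start=100, end=103)
edit_file("analysis_data.json", "line 0", "line zero")
first = read_file("analysis_data.json", start=0, end=1)
```

The 10,000-line payload never enters the context; the agent only ever sees the three lines it asked for.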
Explicit Planning Tools¶
Here's a weird trick: give your agent a tool that does nothing but structure its thinking.
Create a track_plan tool that accepts a list of steps and current status. It doesn't execute anything—just validates the structure and echoes it back. But forcing the agent to call this tool before executing creates a planning phase.
Without it: "Fix the authentication bug" → agent immediately starts editing files, loses track after the third change.
With it: "Fix the authentication bug" → agent calls track_plan(["reproduce bug", "locate auth logic", "identify failure point", "implement fix", "verify"]) → now it has a checklist and can report progress.
The tool itself is nearly a no-op, but the act of calling it transforms scattered reasoning into structured execution. It's metacognition through API design.
Hierarchical Task Delegation¶
When a task has distinct phases, don't make one agent do everything. Spawn focused workers.
Build a spawn_agent tool that launches a sub-agent with:
- Its own system prompt tailored to the subtask
- A bounded context (it doesn't inherit the parent's full history)
- Specific tools for its domain
Example flow:
1. Main agent gets "build a new API endpoint"
2. Calls spawn_agent(task="write implementation", context="spec doc", tools=["read_file", "write_file"])
3. Sub-agent writes code in isolation, returns result
4. Main agent calls spawn_agent(task="write tests", context="implementation", tools=["read_file", "write_file", "run_tests"])
5. Test agent works independently
Each sub-agent works in a clean context without the parent's accumulated history. The parent stays focused on orchestration, not implementation details.
This also enables parallelization—spawn multiple sub-agents for independent work, collect results later.
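A sketch of the delegation pattern, with a stubbed `run_agent` standing in for a real agent loop; all names here are illustrative:

```python
# Hierarchical delegation: each sub-agent gets its own prompt, a bounded
# context, and only the tools its subtask needs. `run_agent` is a stub
# standing in for the full LLM loop a real implementation would run.

def run_agent(system_prompt, context, tools):
    # Stub: a real implementation would run generate -> call tools -> repeat.
    return f"completed: {system_prompt} using {sorted(tools)}"

def spawn_agent(task, context, tools, registry):
    # The sub-agent sees only `context`, never the parent's full history,
    # and only the whitelisted subset of the tool registry.
    allowed = {name: registry[name] for name in tools}
    return run_agent(system_prompt=task, context=context, tools=allowed)

REGISTRY = {"read_file": object(), "write_file": object(), "run_tests": object()}

impl = spawn_agent("write implementation", context="spec doc",
                   tools=["read_file", "write_file"], registry=REGISTRY)
tests = spawn_agent("write tests", context=impl,
                    tools=["read_file", "write_file", "run_tests"],
                    registry=REGISTRY)
```

Note that the second call's context is just the first call's result, not the parent's whole transcript; that bounded hand-off is what keeps each worker focused.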
Detailed System Prompts with Examples¶
Prompt engineering isn't dead. The best agents have system prompts that are instruction manuals, not one-liners.
Instead of a one-liner like:

```text
You are a coding assistant.
```

Use:

```text
You are a coding assistant. When given a task:

1. Before editing, ALWAYS read the file to understand current state
2. Make surgical edits—don't rewrite entire files
3. After changes, verify by reading the affected sections

TOOL USAGE:
- read_file: Use before any edit operation
- write_file: Only for new files
- edit_file: For modifications to existing files

EXAMPLE - Fixing a bug:
User: "Fix the null pointer error in auth.py:line 45"
Assistant: [reads auth.py] → [identifies error] → [makes targeted edit] → [confirms fix by reading back]

EXAMPLE - Handling failures:
If a tool call fails, write the error to errors.log and analyze before retrying.
```
Include edge cases, tool usage patterns, error handling, and concrete examples. Each few-shot example prunes away a whole class of failure modes.
Why? Because LLMs are next-token predictors. Detailed prompts with examples give them better patterns to complete against.
Putting It Together¶
Here's what a robust agent architecture looks like:
```python
class TaskAgent:
    def __init__(self):
        self.system_prompt = load_detailed_instructions()
        self.tools = [
            # Core execution
            read_file, write_file, edit_file,
            run_command, search_code,
            # Planning & tracking
            track_plan, update_progress,
            # Delegation
            spawn_subagent, wait_for_agent,
        ]

    def run(self, task):
        # Agent decomposes task using track_plan
        # Uses file tools to manage large outputs
        # Spawns sub-agents for distinct subtasks
        # Each sub-agent has tailored prompt + tools
        pass
```
The loop is still simple: generate → call tools → repeat. But the tools enable:

- Persistence through file operations
- Decomposition through planning tools
- Focus through sub-agent delegation
- Reliability through detailed prompts
When to Add Complexity¶
Start with the simplest agent that could work. Add layers only when you hit limits:
- Agent losing track of multi-step tasks? → Add planning tools
- Context blowing up? → Add file system tools
- Single subtask consuming all the context? → Add sub-agent spawning
- Agent making preventable mistakes? → Extend system prompt with examples
Don't build a complex agent system for tasks that a single ReAct loop can handle. But when you need an agent that reliably executes research, coding, or analysis workflows—these four patterns are what transform shallow agents into deep agents.
The deep agents that impress you with complex reasoning aren't using secret models or exotic architectures. They're shallow agents with better scaffolding: planning tools, file systems, sub-agents, and detailed prompts around the same basic loop you're already running.