26.06.26
What Claude Code teaches us about building AI agents
We are no longer in the "can AI write code?" phase.
That phase was fun. Also loud. Also full of demos where an AI built a todo app and everyone pretended this was the same thing as maintaining a production system with six years of architectural decisions, three abandoned migrations, and one mysterious folder called legacy-final-v2.
Today, AI coding agents can handle real work. They read files, call tools, edit code, run tests, and sometimes even fix the thing they broke five minutes earlier. Progress.
But the interesting question has changed. It is no longer whether an AI agent can write code. It is how you build an AI agent that is useful, stable, efficient, and safe enough to trust inside a real repository.
Claude Code is a useful case study because its design exposes a set of agent-building patterns: persistent memory, on-demand capabilities, isolated sub-workflows, deterministic safety checks, permission modes, and external tool protocols. It implements those through things like CLAUDE.md, Skills, slash commands, subagents, hooks, plan mode, auto mode, plugins, and MCP.
Each one exists because, at some point, someone discovered that "just ask the model nicely" does not scale. Shocking, I know.
Prompting is not architecture
Early LLM apps were simple: prompt in, answer out. If the answer was wrong, you fixed it yourself, muttered something about hallucinations, and moved on.
Agents are different. A real agent does not just answer. It runs a loop: gather context, take action, check the result, adjust, and repeat.
That loop is where the actual engineering starts.
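To make the loop concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `LoopState`, `run_agent_loop`, and the toy `decide` and `act` functions stand in for a real model and real tools, and none of it reflects how Claude Code is implemented internally.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopState:
    goal: str
    observations: list[str] = field(default_factory=list)
    done: bool = False


def run_agent_loop(
    state: LoopState,
    decide: Callable[[LoopState], str],
    act: Callable[[str], str],
    is_done: Callable[[str], bool],
    max_steps: int = 5,
) -> LoopState:
    """Gather context, act, check the result, adjust, repeat (bounded)."""
    for _ in range(max_steps):
        action = decide(state)             # gather context and pick an action
        result = act(action)               # take the action
        state.observations.append(result)  # the result feeds the next decision
        if is_done(result):                # check the result
            state.done = True
            break
    return state


# Toy stand-ins for a model and a test runner (hypothetical).
def toy_decide(state: LoopState) -> str:
    return "run_tests" if not state.observations else "fix_and_run_tests"


def toy_act(action: str) -> str:
    return "tests failed" if action == "run_tests" else "tests passed"


final = run_agent_loop(
    LoopState(goal="make tests pass"),
    toy_decide,
    toy_act,
    is_done=lambda result: result == "tests passed",
)
```

Note the `max_steps` bound: even in a toy, the loop has an exit that does not depend on the model deciding it is finished.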
Good agents are not good because someone wrote a heroic 900-line prompt. They are good because the system around the model gives it the right tools, the right context, the right constraints, and the right recovery path when things go wrong.
That is the main lesson from Claude Code: agent quality is mostly a systems problem, not a prompting problem.
Prompts still matter. Of course they do. But prompts are not load-bearing architecture. If your whole safety model is "the prompt says don't do bad things," you do not have safety. You have a polite suggestion.
Give the agent narrow tools, loaded on demand
A common mistake when building agents is to give the model a giant toolbox and hope it figures things out.
Here are twenty tools. Here is a massive instruction prompt. Please behave.
This sounds flexible. In practice, it often creates tool confusion. The model picks the wrong tool, passes the wrong arguments, reads too much, edits too much, or tries to solve everything with the agentic equivalent of a hammer and loads of caffeine.
Claude Code takes a cleaner approach. It exposes a set of small, focused primitives. Read, Edit, Write, Grep, and Glob each have a clear job. Editing is editing. Reading is reading. This sounds boring, which is usually a good sign in architecture.
The broader pattern is on-demand capability loading. Instead of stuffing every instruction into the system prompt, keep specialized procedures outside the main context and load them only when the task needs them. Claude Code calls these Skills: small folders with a SKILL.md file and optional scripts or assets.
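A rough sketch of on-demand loading, under stated assumptions: the folder-per-skill layout with a SKILL.md file comes from the description above, but `index_skills`, `load_skill`, and the "first line is the description" convention are invented here for illustration and are not the real Skills format.

```python
import tempfile
from pathlib import Path


def index_skills(root: Path) -> dict[str, str]:
    """Return {skill_name: one-line description} without loading full bodies."""
    index: dict[str, str] = {}
    for skill_file in sorted(root.glob("*/SKILL.md")):
        with skill_file.open(encoding="utf-8") as fh:
            # Assumption: the first line is a short description.
            index[skill_file.parent.name] = fh.readline().strip()
    return index


def load_skill(root: Path, name: str) -> str:
    """Load the full instructions only when the task calls for this skill."""
    return (root / name / "SKILL.md").read_text(encoding="utf-8")


# Demo with a throwaway skills directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "release-notes").mkdir()
    (root / "release-notes" / "SKILL.md").write_text(
        "Draft release notes from merged PRs.\n"
        "Step 1: list merged PRs.\n"
        "Step 2: group them by label.\n",
        encoding="utf-8",
    )
    skill_index = index_skills(root)                 # cheap: always in context
    skill_body = load_skill(root, "release-notes")   # expensive: only on demand
```

The design choice is the split itself: the index is small enough to keep in every session, while the body costs context only when it is actually used.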
That matters because context is expensive. Not just in money, but in attention. The more irrelevant instructions you stuff into the model, the worse it gets at following the important ones.
Repeated workflows should become named workflows, not tribal prompt recipes. If your team often asks an agent to review a PR, explain a module, prepare release notes, or generate a migration plan, define that workflow once and make it reusable. Claude Code exposes this through slash commands; your own agent might use templates, commands, presets, or workflow definitions.
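If you are building your own agent, a named workflow can be as small as a lookup table. The sketch below is hypothetical: the `WORKFLOWS` table, the workflow names, and `expand_command` are assumptions, not the way Claude Code defines slash commands.

```python
from string import Template

# Hypothetical workflow definitions: one name, one reusable prompt template.
WORKFLOWS = {
    "review-pr": Template(
        "Review pull request $target. List bugs first, then style issues."
    ),
    "explain-module": Template(
        "Explain what $target does and how it is used."
    ),
}


def expand_command(line: str) -> str:
    """Turn '/name argument' into the full prompt for that workflow."""
    if not line.startswith("/"):
        raise ValueError("not a command")
    name, _, target = line[1:].partition(" ")
    if name not in WORKFLOWS:
        raise KeyError(f"unknown workflow: {name}")
    return WORKFLOWS[name].substitute(target=target.strip())


prompt = expand_command("/explain-module src/billing")
```

The point is less the template engine and more that the recipe now lives in version control instead of in someone's chat history.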
MCP solves the external-system problem. It gives agents a standard way to work with GitHub, Slack, databases, or internal tools. Without something like MCP, every integration becomes custom glue. And custom glue is how you turn one small integration into three Slack threads, two undocumented edge cases, and one person who must never go on vacation.
The common principle is simple: do not make the model guess the world. Describe the world clearly, then let the agent pull in narrow capabilities only when needed.
Treat context like a scarce resource
Context window size keeps growing, and somehow we keep finding ways to waste it.
The naive approach is simple: put the whole codebase in the prompt. This feels safe because the model has "all the information." But it often makes the model worse. Too much context creates noise. The model starts paying attention to stale files, irrelevant patterns, old decisions, and that one README nobody updated since 2021.
Every serious agent needs durable project memory: conventions, commands, constraints, dangerous paths, and team-specific rules. The important part is that this memory should be human-readable, short, and maintained like code. Claude Code implements this with CLAUDE.md, a Markdown file loaded at the start of a session. The best version of this file is short and rule-shaped:
# Project conventions
- Package manager: pnpm. Do not introduce npm or yarn lockfiles.
- Tests: pnpm test.
- Add a test for every new exported function.
- Never edit files in src/generated/.
- Branch names must start with the Linear issue ID.
This is useful because it is concrete. Bad memory looks like this: "We value quality and thoughtful engineering and strive to follow best practices." Wonderful. Very inspiring. Completely useless.
The agent does not need your engineering values poster. It needs rules that change behavior.
Subagents solve a different context problem. Some tasks create a lot of intermediate noise: searching files, reading logs, running tests, scanning for security issues, or reviewing a big change. You want the result of that work, not every detail dumped into the main context.
A subagent can go do the noisy work and return a summary. The main thread stays focused on the current goal, current decisions, and current state.
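The isolation idea fits in a few lines. This is a toy under obvious assumptions: `run_subagent` and `toy_log_scanner` are hypothetical names, and a real subagent would be a separate model call with its own context window rather than a Python list.

```python
from typing import Callable


def run_subagent(task: str, worker: Callable[[str, list[str]], str]) -> str:
    """Run a noisy task in a scratch context; return only its summary."""
    scratch: list[str] = []  # private to the subagent, discarded afterwards
    return worker(task, scratch)


def toy_log_scanner(task: str, scratch: list[str]) -> str:
    # Stand-in for reading a large log file (hypothetical data).
    log_lines = ["INFO boot", "ERROR db timeout", "INFO retry", "ERROR db timeout"]
    scratch.extend(log_lines)  # the intermediate noise stays here
    errors = [line for line in scratch if line.startswith("ERROR")]
    return f"{len(errors)} errors found; most recent: {errors[-1]}"


main_context = ["Goal: find out why the nightly job fails"]
main_context.append(run_subagent("scan logs", toy_log_scanner))
```

After the call, the main context has grown by one line, not four. Scale the log up to a few thousand lines and the difference stops being cosmetic.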
This is the useful mental model: context is not storage. Context is attention. Treat it carefully: every extra detail should earn its place.
Design safety and failure into the loop
Agents fail. They call the wrong tool. They misunderstand output. They edit the wrong file. They get stuck. They retry something that will obviously fail again, because apparently optimism is now an architectural concern.
The important question is not whether an agent fails. The question is what happens next.
Claude Code treats failure as part of the loop, not as an exception nobody planned for.
Plan Mode is one example. It lets Claude read and analyze, but not modify state. That creates a checkpoint: gather context, propose a plan, let the human review it, and only then mutate state. This prevents the classic enthusiastic-wrong-refactor problem, one of the most powerful forces in software.
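In your own agent, the same checkpoint can be approximated with a small gate in front of tool calls. This is a sketch, not Claude Code's implementation: the tool names are the primitives mentioned earlier, while `Mode`, `ToolGate`, and `approve_plan` are assumptions made up for the example.

```python
from enum import Enum


class Mode(Enum):
    PLAN = "plan"
    EXECUTE = "execute"


READ_ONLY_TOOLS = {"Read", "Grep", "Glob"}
MUTATING_TOOLS = {"Edit", "Write"}


class ToolGate:
    """Allow read-only tools always; allow mutating tools only after approval."""

    def __init__(self) -> None:
        self.mode = Mode.PLAN  # start in the safe mode

    def approve_plan(self) -> None:
        # Called when a human has reviewed and accepted the proposed plan.
        self.mode = Mode.EXECUTE

    def allowed(self, tool: str) -> bool:
        if tool in READ_ONLY_TOOLS:
            return True
        if tool in MUTATING_TOOLS:
            return self.mode is Mode.EXECUTE
        return False  # unknown tools are denied by default


gate = ToolGate()
before_approval = gate.allowed("Edit")
gate.approve_plan()
after_approval = gate.allowed("Edit")
```

The gate lives in code, outside the model, so a persuasive plan cannot talk its way past it.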
Auto-accept mode is the other side of the tradeoff. When you let the agent run without approving each edit, you reduce friction, but you also lose your last manual checkpoint, so risk has to be handled elsewhere. Claude Code adds a safety classifier around potentially dangerous actions such as shell commands, network calls, protected paths, and destructive operations.
The useful principle is not "always ask the user." The useful principle is: match the safety mechanism to the risk level of the action.
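One way to picture risk matching is a small policy table. To be clear about the assumptions: this is a hand-written rule list standing in for a real classifier, the `Shell` tool name is hypothetical, and the protected prefixes, destructive patterns, and mechanism names are invented for the example.

```python
# Hypothetical policy inputs; a real system would load these from config.
PROTECTED_PREFIXES = ("src/generated/", ".git/")
DESTRUCTIVE_PATTERNS = ("rm -rf", "drop table", "git push --force")


def risk_level(tool: str, argument: str) -> str:
    """Classify a tool call as low, medium, or high risk with simple rules."""
    if tool in {"Read", "Grep", "Glob"}:
        return "low"
    if tool in {"Edit", "Write"}:
        return "high" if argument.startswith(PROTECTED_PREFIXES) else "medium"
    if tool == "Shell":
        lowered = argument.lower()
        hit = any(pattern in lowered for pattern in DESTRUCTIVE_PATTERNS)
        return "high" if hit else "medium"
    return "high"  # anything unrecognized gets the strictest treatment


# Each risk level maps to a different safety mechanism.
MECHANISM = {"low": "allow", "medium": "allow_and_log", "high": "ask_human"}


def decide(tool: str, argument: str) -> str:
    return MECHANISM[risk_level(tool, argument)]


decisions = [
    decide("Read", "README.md"),
    decide("Edit", "src/app.py"),
    decide("Edit", "src/generated/api.py"),
    decide("Shell", "rm -rf build"),
]
```

Reads go through silently, ordinary edits are logged, and only the genuinely risky calls interrupt a human.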
The general pattern is deterministic enforcement. For rules that really matter, do not rely on the model remembering them. Put checks around the agent loop: before tool use, after tool use, before stopping, before compacting context, or before touching sensitive files. Claude Code implements this with hooks; other systems might use middleware, policy engines, CI checks, or permission gates.
Could you put these rules in a prompt? Sure. You could also put "please be careful" in a production deployment script. Let me know how that goes.
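Here is what deterministic enforcement might look like as a pre-tool check in a custom agent. It is a generic sketch and not Claude Code's hook configuration: `Hook`, `block_generated_files`, `call_tool`, and `fake_run` are all hypothetical, and the `src/generated/` rule is borrowed from the project memory example earlier.

```python
from typing import Callable, Optional

# A hook inspects a tool call and returns a block reason, or None to allow it.
Hook = Callable[[str, dict], Optional[str]]


def block_generated_files(tool: str, args: dict) -> Optional[str]:
    path = args.get("path", "")
    if tool in {"Edit", "Write"} and path.startswith("src/generated/"):
        return f"{path} is generated and must not be edited"
    return None


def call_tool(
    tool: str,
    args: dict,
    run: Callable[[str, dict], str],
    pre_hooks: list[Hook],
) -> str:
    """Run every pre-tool hook first; the tool runs only if none object."""
    for hook in pre_hooks:
        reason = hook(tool, args)
        if reason is not None:
            return f"BLOCKED: {reason}"
    return run(tool, args)


# Fake tool runner that records what actually executed.
calls: list[tuple[str, str]] = []


def fake_run(tool: str, args: dict) -> str:
    calls.append((tool, args["path"]))
    return "ok"


hooks = [block_generated_files]
blocked = call_tool("Edit", {"path": "src/generated/client.ts"}, fake_run, hooks)
allowed = call_tool("Edit", {"path": "src/app.ts"}, fake_run, hooks)
```

The blocked call never reaches the tool runner, whatever the model believed about the file. The block reason is returned as text so the agent can read it and adjust.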
A simple rule helps: if a command fails, read the error, explain the likely cause, try one alternative approach, and stop if the second attempt fails. Persistence without learning is how you get infinite loops with better branding.
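That rule is easy to encode. A minimal sketch, assuming the primary and alternative approaches can be expressed as plain callables; `run_with_one_fallback` and the toy install functions are hypothetical names, not part of any real tool.

```python
from typing import Callable


def run_with_one_fallback(
    primary: Callable[[], str],
    fallback: Callable[[], str],
) -> dict:
    """Try the primary approach, then exactly one alternative, then stop."""
    errors: list[str] = []
    for attempt in (primary, fallback):
        try:
            return {"ok": True, "result": attempt(), "errors": errors}
        except Exception as exc:
            # Keep the error text so the cause can be explained, not just retried.
            errors.append(f"{type(exc).__name__}: {exc}")
    return {"ok": False, "result": None, "errors": errors}


def install_with_cache() -> str:
    raise RuntimeError("cache is corrupted")


def install_without_cache() -> str:
    return "installed"


recovered = run_with_one_fallback(install_with_cache, install_without_cache)
gave_up = run_with_one_fallback(install_with_cache, install_with_cache)
```

The second case is the important one: after two failures the function returns both error messages and stops, which hands the decision back to a human instead of starting attempt three.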
What to copy when building your own agents
These ideas transfer beyond Claude Code. Whether you use the Claude Agent SDK, LangGraph, CrewAI, custom orchestration, or something held together with Python, Redis, and hope, the same principles apply.
The real lesson
The last two years of agent design have been a slow move from prompt engineering to systems engineering.
Claude Code shows what this shift looks like in practice: narrow tools, on-demand capabilities, reusable workflows, external integration protocols, persistent memory, isolated subagents, deterministic enforcement, risk-based permission modes, and packaging for scale. Its specific names are CLAUDE.md, Skills, slash commands, MCP, hooks, plan mode, auto mode, and plugins. The names matter less than the architecture behind them.
None of these ideas is magical on its own. Together, they create an agent that is easier to trust because it is easier to understand.
That is the real goal. Not an agent that feels impressive in a demo. An agent that behaves predictably inside real engineering work.
The takeaway is simple: treat agent design as an architecture problem, not a clever-prompt problem.
Once you do that, the system becomes more debuggable, more controllable, and much more useful.
Still not perfect. But in software, "not perfect, but predictable" is basically a love language.
If you want the long-form version, Alex Suprun and I are presenting "Inside Claude Code" at Architecture Next 26. We will walk through the lessons we learned from taking Claude Code apart. See you there!