
2026 / Free online book · Field Manuals
Agents That Actually Work
The narrow band where autonomy earns its keep
Access: Free
Chapters: 12
Read time: 140 min
Agent demos are intoxicating; agent production is sobering. Between the two there is a real but narrow band of tasks where an agent beats both a script and a human. This manual is about locating that band for your domain, instrumenting it, and resisting the pressure to widen it before the evidence does.
Between the demo and the disappointment is a thin set of tasks where agents are genuinely better. How to find it and stay inside it.
This edition is free to read onsite. Each chapter has its own URL, so readers can bookmark, share, and return to the exact section they need.
Table of contents
INT Introduction: The Demo That Lied (8 min)
An agent that aced the demo broke production because one tool had side effects, one response was ambiguous, and nobody could find where the plan changed.

01 What an Agent Is and Is Not (10 min)
Most systems people call agents are scripts, workflows, or copilots in costume, and the mislabel is where the cost comes from.

02 The Autonomy Ladder (10 min)
Six rungs from script to autonomous agent, with promotion criteria so you climb on evidence instead of enthusiasm.

03 The BOUND Test (10 min)
Five gates that decide whether a task deserves an agent at all, applied before a single line of orchestration code.

04 Bounded Goals and Task Contracts (10 min)
Turning a fuzzy objective into a machine-checkable contract with success conditions, scope boundaries, and a stopping rule.

05 Tools, Permissions, and Side Effects (10 min)
The Tool Trust Contract: the seven properties every tool must declare before an agent is allowed to call it.

06 State, Memory, and Context Control (10 min)
An agent's reliability is bounded by what it can see, what it remembers, and what it carries between steps, all of which you must control deliberately.

07 Planning Without Wandering (10 min)
The agent loop is where autonomy lives and where it runs away, so it needs budgets, convergence checks, and a forced landing.

08 Evaluating Agent Runs (10 min)
Task success is the metric everyone reports and the one that hides the most, so an agent eval measures the path, not just the destination.

09 Human Handoffs and Approvals (10 min)
Approval gates earn their keep only when they intercept the right actions, give the human enough to decide, and do not become a rubber stamp.

10 Observability and Replay (10 min)
If you cannot reconstruct exactly what the agent saw, decided, and did at every step, you do not have an agent in production, you have a liability you cannot debug.

11 Security Boundaries for Tool-Using Systems (12 min)
An agent processes instructions and untrusted data in the same channel, so its tools become an attacker's tools unless you draw the boundaries deliberately.

12 Incidents, Kill Switches, and Rollback (11 min)
Every agent fails eventually, so the question that decides your worst day is whether you can stop it, unwind it, and learn from it.

END Conclusion: The Most Accountable Agent Wins (9 min)
The best agents are not the most autonomous; they are the most accountable, and the BOUND Checklist is how you ship one.
