
2026 / Free online book · Technical Deep Dives
Guardrails, Not Walls
Building Useful AI Safety Constraints Without Blocking the Product
Access: Free · Chapters: 15 · Read time: 158 min
Most guardrails are built to pass an audit, not to prevent harm. This deep dive distinguishes the two: where input and output filtering earns its latency, where it is theater, and how to constrain a probabilistic system without making it useless.
Safety theater blocks the demo and misses the risk. Building constraints that protect users without strangling the product.
This edition is free to read on this site. Each chapter has its own URL, so readers can bookmark, share, and return to the exact section they need.
Table of contents
FM · Front Matter: Guardrails, Not Walls (5 min)
Building Useful AI Safety Constraints Without Blocking the Product

INT · Introduction: Two Failures in One Afternoon (10 min)
The details are composited from several real deployments, but the shape is exact. A retail company shipped a customer-support assistant.

01 · The Two-Sided Failure (11 min)
Working claim: Overblocking and underblocking are not two ends of a single dial you tune toward a happy medium.

02 · Five Words That Get Confused (10 min)
Working claim: "Safety" is used as a catch-all for five different concerns (safety, security, compliance, reliability, and product policy) that have different owners, different threat models, different controls, and different acceptable failure rates.

03 · The System Boundary and the ROAD Framework (9 min)
Working claim: You cannot place a guardrail until you can draw the system boundary and name every operation inside it where a risk can occur.

04 · From Prose to Policy (9 min)
Working claim: A guardrail without a policy is just an arbitrary model instruction.

05 · Risk Tiering and Intent (9 min)
Working claim: The strength of a guardrail should scale with the stakes of the action and the confidence in the user's intent, not with the scariness of a keyword.

06 · The Gatehouse (9 min)
Working claim: Input controls are a gatehouse with several lanes (identity, intent, moderation, injection detection, abuse limiting, and PII handling), not a single bouncer.

07 · The Retrieval Firewall (8 min)
Working claim: The most under-guarded operation in most AI systems is the hidden middle: retrieval and prompt assembly.

08 · Output Controls (8 min)
Working claim: A model output can be perfectly fluent and perfectly safe-sounding and still be invalid: the wrong shape for the consumer, leaking data, unsupported by evidence, or valid JSON that violates policy.

09 · Refusal Is the Last Resort (9 min)
Working claim: A bare refusal is the bluntest action a guardrail has, and reaching for it first is what makes products feel like walls.

10 · Tool Guardrails (8 min)
Working claim: Once a model can act, guardrails stop being about text and become about authorization.

11 · The Side-Effect Ladder (8 min)
Working claim: The right amount of autonomy for an agent is a function of one thing: how hard it is to undo what it does. Read actions can run freely; irreversible actions need a human.

12 · Testing Guardrails Like a Product (8 min)
Working claim: A guardrail you have not measured on both error rates is a guardrail you do not understand.

13 · Monitoring and the Bypass Incident (9 min)
Working claim: Guardrails decay in production even when the code does not change, because users, attackers, content, and policy all drift around a fixed control.

14 · Eight Playbooks (10 min)
This chapter turns eight playbooks into concrete operating problems.

15 · The Guardrail System (9 min)
Working claim: Guardrails are not a feature you add; they are a property of how the whole system is built.

A · Appendix A: Back Matter (9 min)
Glossary, implementation checklist, and source register for the book.
