Appendix A: Back Matter
Glossary, implementation checklist, and source register for the book.
Key Takeaways
- Back Matter consolidates the checklist, glossary, and source map readers need after the main argument.
- The useful artifact is not more prose; it is a way to turn back matter into an implementation review.
- Treat the appendix as the operating memory for the book: terms, gates, and references in one place.
Read this alongside the Guardrails book, the AI-Native thesis, and the full book library when you want the surrounding argument.
Glossary
**Action ladder:** The ordered set of dispositions a control can take, gentlest to bluntest: allow, log, redact/transform, degrade-safe, require approval, refuse, escalate. A wall knows only the two ends; a guardrail uses the middle.
**Bypass:** An adversary deliberately constructing an input to defeat a control. Distinct from ordinary underblocking because a control's adversarial false-negative rate is unrelated to its average-case rate.
**Capability manifest:** A least-privilege declaration of which tools a model may call, with argument bounds, deterministic authorization, side-effect classification, and limits. Excessive agency is prevented chiefly by the tools the manifest omits.
**Confused deputy:** A program with legitimate authority tricked by a less-privileged party into misusing it. An LLM agent holding the application's credentials is the canonical case.
**Degrade-safe:** Returning a reduced but still useful response (the general answer without the dangerous specific) instead of refusing entirely.
**Disposition:** The outcome a policy assigns to a category of request: allowed, disallowed, restricted, escalated, logged. Restricted and escalated are what a two-state allow/refuse policy cannot express.
**Excessive agency:** Giving a model more capability, autonomy, or permission than the task requires (OWASP LLM06), so a manipulated or mistaken model can cause real harm.
**Grounding check:** Verifying that an output's factual claims are supported by the cited evidence; unsupported claims are degraded rather than asserted.
**Indirect prompt injection:** An attack delivered through content the system retrieves (a document, a tool result) rather than the user's message, so a benign request pulls hostile instructions into context.
**Least privilege:** Granting the minimum capability required for the task. The most reliable agent control because it shrinks the blast radius before any specific call is considered.
**Overblock (false positive):** A control intervening on input or action that was actually safe. Paid for in invisible user churn.
**Permission-aware retrieval:** Constraining the retrieval candidate set to what the principal is authorized to see before the vector search runs, enforced at the data layer.
**Policy / mechanism separation:** Keeping the normative rule (what should happen, versioned and owned) distinct from how it is enforced (classifiers, checks, validators), so each can change independently.
**ROAD:** The book's framework for placing a control: Risk (the specific harm), Operation (where in the boundary it occurs), Action (what the control does), Detection (how you know it worked or failed).
**Retrieval firewall:** The set of controls at the retrieval and prompt-assembly operations: permission pre-filter, freshness/authority ranking, sanitization, and wrapping retrieved text as untrusted evidence.
**Safe completion:** Fulfilling the safe part of a risky request while declining only the specific risky element, with a useful alternative or resource. The opposite of the refusal reflex.
**Safety theater:** Controls that make a team feel safe while reducing no real risk: disclaimers, generic refusals, and green dashboards that do not affect behavior or the actual failure path.
**Side-effect ladder:** The classification of actions by reversibility (read, draft, bounded write, irreversible) that determines how much autonomy an agent may have; the most valuable move is climbing down it by making irreversible actions reversible.
**System boundary:** The full sequence of operations a request passes through (input through escalation), each a place a control can sit. Drawing it honestly reveals the operations that have no control at all.
**The five concerns:** Safety (harm to people), security (adversary against owner intent), compliance (external rules), reliability (correctness without an adversary), and product policy (discretionary company choices). "Safety" used as a catch-all for all five is the field's most expensive ambiguity.
**Underblock (false negative):** A control failing to intervene on input or action that was actually harmful. Paid for in rare, concentrated, visible harm.
**Usefulness floor:** A release-gate threshold capping the overblock rate, held alongside the safety floor so tuning that improves catch rate by refusing legitimate requests fails the gate.
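The ordering in the action ladder entry can be made explicit in code. What follows is a minimal Python sketch, not the book's implementation: the `Action` enum mirrors the seven rungs listed in the glossary, while the `bluntest` helper and its rule (when several controls disagree, the bluntest verdict wins) are assumptions added purely for illustration.

```python
from enum import IntEnum


class Action(IntEnum):
    """Rungs of the action ladder, ordered gentlest (0) to bluntest (6)."""
    ALLOW = 0
    LOG = 1
    REDACT_TRANSFORM = 2
    DEGRADE_SAFE = 3
    REQUIRE_APPROVAL = 4
    REFUSE = 5
    ESCALATE = 6


def bluntest(verdicts):
    """Combine verdicts from several controls by taking the bluntest rung.

    This combination rule is an assumption for illustration. An empty
    list means no control objected, so the result is ALLOW.
    """
    return max(verdicts, default=Action.ALLOW)
```

For example, `bluntest([Action.LOG, Action.DEGRADE_SAFE])` returns `Action.DEGRADE_SAFE`. Using an ordered enum rather than a boolean keeps the middle rungs available to every control that emits a verdict.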
Implementation Checklist
A guardrail system is approaching production-ready when the team can answer yes, with evidence, to each of these. The items are grouped by movement. This is an interrogation, not a list of controls to own; owning every control produces the wall.
Failure model and concerns (Movement I)
- Safety is treated as a set of per-control error rates, not a single permissive-to-restrictive slider.
- Every control has a measured (or measurable) overblock and underblock rate.
- Each guardrail is tagged with which of the five concerns (safety, security, compliance, reliability, product policy) it serves.
- Security-concern controls are deterministic (authorization, allow-lists, validation), not probabilistic classifiers.
- The full system boundary is drawn; operations with no control are identified, not assumed covered.
- Every control answers ROAD: named risk, owning operation, chosen action, defined detection.
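The Movement I items lend themselves to a mechanical review. The sketch below is one hedged way to do it in Python: `ControlRecord`, `road_gaps`, and `error_rates` are hypothetical names, and the rate definitions (overblock as the share of safe samples the control intervened on, underblock as the share of harmful samples it missed) are an assumed reading of the glossary entries.

```python
from dataclasses import dataclass

FIVE_CONCERNS = {"safety", "security", "compliance", "reliability", "product policy"}


@dataclass(frozen=True)
class ControlRecord:
    """One control's answers to ROAD, plus the concern it serves."""
    name: str
    concern: str
    risk: str
    operation: str
    action: str
    detection: str


def road_gaps(record):
    """Return the ROAD fields left blank, plus 'concern' if it is not one of the five."""
    gaps = [field for field in ("risk", "operation", "action", "detection")
            if not getattr(record, field).strip()]
    if record.concern not in FIVE_CONCERNS:
        gaps.append("concern")
    return gaps


def error_rates(samples):
    """Compute (overblock, underblock) from labeled samples.

    samples: iterable of (is_harmful, intervened) boolean pairs.
    A rate is None when there are no samples of the relevant kind.
    """
    samples = list(samples)
    on_safe = [intervened for is_harmful, intervened in samples if not is_harmful]
    on_harmful = [intervened for is_harmful, intervened in samples if is_harmful]
    overblock = sum(on_safe) / len(on_safe) if on_safe else None
    underblock = (sum(not hit for hit in on_harmful) / len(on_harmful)
                  if on_harmful else None)
    return overblock, underblock
```

A control whose `road_gaps` result is non-empty, or whose rates come back as `None`, has not yet answered the corresponding checklist item with evidence.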
Policy and tiering (Movement II)
- Policy is a versioned, owned, approved artifact, separate from the system prompt and the mechanisms.
- Every decision carries its policy version, so any response is traceable to the rule that governed it.
- The policy uses five composable dispositions; restricted and escalated are used where they belong, not collapsed into refuse.
- Guardrail strength scales with risk tier and intent confidence, not with keyword presence.
- Authorization is a deterministic fact lookup that gates tier-2/3 actions before any probabilistic reasoning.
- The system's overall tier is set deliberately and drives which operations get controls.
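Two of the items above, version-stamped decisions and deterministic authorization ahead of probabilistic reasoning, can be sketched together. This is an illustrative Python sketch under stated assumptions: the tier numbering 0 to 3, the confidence thresholds, the `POLICY_VERSION` label, and the `decide` function are all hypothetical and are not taken from the book.

```python
from dataclasses import dataclass

POLICY_VERSION = "example-v1"  # hypothetical version label

# Assumed thresholds: required intent confidence rises with risk tier.
REQUIRED_CONFIDENCE = {0: 0.0, 1: 0.5, 2: 0.8, 3: 0.95}


@dataclass(frozen=True)
class Decision:
    disposition: str      # allowed, disallowed, restricted, escalated, logged
    policy_version: str   # every decision is traceable to the rule that governed it
    reason: str


def decide(tier, grants, required_grant, intent_confidence_fn):
    """Gate tier-2/3 requests on a deterministic grant lookup before any classifier runs."""
    if tier >= 2 and required_grant not in grants:
        # Deterministic fact lookup; the probabilistic step is never reached.
        return Decision("disallowed", POLICY_VERSION, f"missing grant {required_grant!r}")
    confidence = intent_confidence_fn()  # probabilistic signal, called only after the gate
    if confidence >= REQUIRED_CONFIDENCE[tier]:
        return Decision("allowed", POLICY_VERSION, "confidence meets tier threshold")
    if tier >= 2:
        return Decision("escalated", POLICY_VERSION, "low confidence on a high-risk tier")
    return Decision("restricted", POLICY_VERSION, "low confidence on a low-risk tier")
```

Passing the classifier as a callable makes the ordering testable: when authorization fails, the callable is never invoked, so no probabilistic signal can override the deterministic fact.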
Control surfaces (Movements III-VI)
- The input gatehouse has distinct lanes (auth, rate, intent, moderation, injection, PII), each emitting an independent signal combined per policy.
- Pre-call detection does not rely solely on the same model that will answer.
- Retrieval filters by permission before search; tenant/ACL/classification/freshness are data-layer predicates.
- Retrieved content and tool results are wrapped as untrusted evidence with no instruction authority.
- Output passes a sieve chain (schema, policy, evidence, leakage, moderation) with revise tried before refuse.
- Refusal is the last resort; risky requests are decomposed and safe-completed; boundaries are explained without leaking policy.
- Tools are granted by a least-privilege capability manifest; calls are authorized deterministically against real facts.
- The agent's maximum autonomous side-effect rung is explicit and reviewed; irreversible actions are gated or made reversible; approval gates are triggered by exception, not volume.
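The two tool items in this group (a least-privilege capability manifest and an explicit maximum autonomous rung) can be illustrated with a small Python sketch. Everything named here is hypothetical: the tool names, the single `max_amount` argument bound, the `MAX_AUTONOMOUS_RUNG` setting, and the string verdicts are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rung(IntEnum):
    """Side-effect ladder, ordered by reversibility."""
    READ = 0
    DRAFT = 1
    BOUNDED_WRITE = 2
    IRREVERSIBLE = 3


@dataclass(frozen=True)
class ToolGrant:
    rung: Rung
    max_amount: float = 0.0  # example argument bound


# Tools absent from the manifest cannot be called at all.
MANIFEST = {
    "lookup_order": ToolGrant(Rung.READ),
    "issue_refund": ToolGrant(Rung.BOUNDED_WRITE, max_amount=50.0),
    "close_account": ToolGrant(Rung.IRREVERSIBLE),
}

# Highest rung the agent may act on without a human (assumed setting).
MAX_AUTONOMOUS_RUNG = Rung.BOUNDED_WRITE


def authorize(tool, amount=0.0):
    """Deterministic check of a proposed tool call against the manifest."""
    grant = MANIFEST.get(tool)
    if grant is None:
        return "refuse"            # omitted tool: no capability exists
    if amount > grant.max_amount:
        return "refuse"            # argument outside the declared bound
    if grant.rung > MAX_AUTONOMOUS_RUNG:
        return "require_approval"  # above the autonomous rung: human gate
    return "allow"
```

In this sketch, routine bounded refunds proceed without a human, and only the irreversible `close_account` call reaches an approval gate, which is one way to keep approvals driven by exception rather than volume.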
Evaluation, monitoring, operations (Movement VII)
- Four eval sets exist (harmful, benign near-boundary, adversarial/red-team, regression); overblocks count as failures.
- Four numbers are reported per control: overblock, underblock, bypass, correct rate.
- The release gate holds a safety floor, a usefulness floor, and a regression floor.
- The adversarial set grows from production and research; a held-out set exists; categories are tested, not just instances.
- Monitoring detects bypass by consequences (outcome anomalies, canaries), not only by control alerts.
- A written bypass runbook exists: detect, contain (the disciplined wall), scope, remediate, structurally fix, postmortem.
- The improvement loop is structural, not a growing pile of reactive keyword patches.
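The release-gate item can be sketched as a function that returns the floors a candidate fails. This is a hedged Python illustration: the `ControlReport` and `Floors` types, the field names, and the choice to fold both underblock and bypass into the safety floor are assumptions, and the threshold values in the usage note are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlReport:
    """The four numbers reported per control, as rates in [0, 1]."""
    overblock: float
    underblock: float
    bypass: float
    correct: float


@dataclass(frozen=True)
class Floors:
    max_underblock: float   # safety floor (average case)
    max_bypass: float       # safety floor (adversarial case)
    max_overblock: float    # usefulness floor
    max_regressions: int    # regression floor


def release_gate(report, regressions, floors):
    """Return the list of floors the candidate fails; an empty list means it may ship."""
    failures = []
    if report.underblock > floors.max_underblock or report.bypass > floors.max_bypass:
        failures.append("safety floor")
    if report.overblock > floors.max_overblock:
        failures.append("usefulness floor")
    if regressions > floors.max_regressions:
        failures.append("regression floor")
    return failures
```

With floors of 0.02 underblock, 0.05 bypass, 0.05 overblock, and zero regressions, a candidate that lowers underblock to 0.01 by raising overblock to 0.12 fails with `["usefulness floor"]`. That is the case the usefulness floor exists to catch: better catch rate bought by refusing legitimate requests.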
Per product (Movement VIII)
- ROAD has been re-run for this product's risks rather than copying another product's controls.
- The controls that are theater for this product have been identified and declined.
- A human escalation path exists for the cases the system must not decide alone.
- The residual risk is named and owned; no one claims the system is "perfectly safe."
Research and Source Register
Sources grouped by chapter. A source appears under a chapter only if that chapter actually uses it to support a claim.
**Introduction:** synthetic; draws on the book's own argument and the ROAD framework. No external citations.
Ch. 1, The Two-Sided Failure
- OWASP Top 10 for LLM Applications (2025)
- OWASP LLM01:2025 Prompt Injection
- NIST AI Risk Management Framework (AI RMF 1.0)
- OpenAI: Safety best practices
- Constitutional AI: Harmlessness from AI Feedback
Ch. 2, Five Words That Get Confused
- NIST AI Risk Management Framework (AI RMF 1.0)
- NIST AI RMF 1.0 (PDF)
- OWASP Top 10 for LLM Applications (2025)
- OWASP Top 10 for LLM Applications (project home)
- OpenAI: Safety best practices
Ch. 3, The System Boundary and the ROAD Framework
- NIST AI Risk Management Framework (AI RMF 1.0)
- OWASP Top 10 for LLM Applications (2025)
- OWASP Top 10 for LLM Applications (project home)
- OpenAI: Safety best practices
- Not what you've signed up for: indirect prompt injection (Greshake et al.)
Ch. 4, From Prose to Policy
- Constitutional AI: Harmlessness from AI Feedback
- NIST AI Risk Management Framework (AI RMF 1.0)
- OpenAI: Safety best practices
- OWASP Top 10 for LLM Applications (2025)
Ch. 5, Risk Tiering and Intent
- NIST AI Risk Management Framework (AI RMF 1.0)
- NIST AI RMF 1.0 (PDF)
- OWASP Top 10 for LLM Applications (2025)
- OpenAI: Safety best practices
- Constitutional AI: Harmlessness from AI Feedback
Ch. 6, The Gatehouse
- OWASP LLM01:2025 Prompt Injection
- OWASP Prompt Injection Prevention Cheat Sheet
- OpenAI: Safety best practices
- OWASP Top 10 for LLM Applications (2025)
- Rebuff prompt-injection detector (ProtectAI)
Ch. 7, The Retrieval Firewall
- Not what you've signed up for: indirect prompt injection (Greshake et al.)
- OWASP Prompt Injection Prevention Cheat Sheet
- OWASP Top 10 for LLM Applications (2025)
- How Microsoft defends against indirect prompt injection attacks
Ch. 8, Output Controls
- OpenAI: Structured outputs
- Guardrails AI: Validators
- NVIDIA NeMo Guardrails documentation
- OWASP Top 10 for LLM Applications (2025)
- OpenAI: Safety best practices
Ch. 9, Refusal Is the Last Resort
- Constitutional AI: Harmlessness from AI Feedback
- OpenAI: Safety best practices
- NVIDIA NeMo Guardrails documentation
- NIST AI Risk Management Framework (AI RMF 1.0)
- OWASP Top 10 for LLM Applications (2025)
Ch. 10, Tool Guardrails
- OWASP Top 10 for LLM Applications (2025): LLM06 Excessive Agency
- OWASP Top 10 for LLM Applications (project home)
- OWASP Prompt Injection Prevention Cheat Sheet
- NIST AI Risk Management Framework (AI RMF 1.0)
- OpenAI: Safety best practices
Ch. 11, The Side-Effect Ladder
- OWASP Top 10 for LLM Applications (2025): LLM06 Excessive Agency
- OWASP Top 10 for LLM Applications (project home)
- NIST AI Risk Management Framework (AI RMF 1.0)
- NIST AI RMF 1.0 (PDF)
- OpenAI: Safety best practices
Ch. 12, Testing Guardrails Like a Product
- OpenAI: Safety best practices
- OWASP Top 10 for LLM Applications (2025)
- OWASP Prompt Injection Prevention Cheat Sheet
- NIST AI Risk Management Framework (AI RMF 1.0)
- Constitutional AI: Harmlessness from AI Feedback
Ch. 13, Monitoring and the Bypass Incident
- NIST AI Risk Management Framework (AI RMF 1.0)
- NIST AI RMF 1.0 (PDF)
- OWASP Top 10 for LLM Applications (2025)
- OWASP Prompt Injection Prevention Cheat Sheet
- OpenAI: Safety best practices
Ch. 14, Eight Playbooks
- NIST AI Risk Management Framework (AI RMF 1.0)
- OWASP Top 10 for LLM Applications (2025)
- OWASP Prompt Injection Prevention Cheat Sheet
- OpenAI: Safety best practices
Ch. 15, The Guardrail System
- NIST AI Risk Management Framework (AI RMF 1.0)
- NIST AI RMF 1.0 (PDF)
- OWASP Top 10 for LLM Applications (2025)
- OpenAI: Safety best practices
- Constitutional AI: Harmlessness from AI Feedback
