Name: Prompt Injection Is Not a Joke
Availability: InStock

Appendix A: Back Matter is the operating appendix for the book: glossary, implementation checklist, and source register in one place.

Key Takeaways

The glossary defines the terms the book uses as engineering artifacts: boundary, blast radius, capability manifest, canary, and memory poisoning.

The implementation checklist groups production readiness by threat model, ingestion, tools, exfiltration, persistence, testing, and operations.

The source register ties each chapter to the references it actually uses, rather than adding decorative citations.

The appendix is meant to be copied into review habits: yes with evidence, or not ready.

Read this beside Prompt Injection Is Not a Joke, A Field Guide to Evals, and Devlyn's AI security and red-teaming work when converting the book into a launch review.

Glossary

**Active canary: ** A unique URL planted in a sensitive context so that fetching it (e.g., by an injected exfiltration attempt) pings your server, converting the attacker's exfiltration channel into your detection channel and naming the leaked source.

**Argument binding: ** Supplying a dangerous tool argument (recipient, destination, amount) from trusted application context rather than from model output, so the argument the model cannot set is the argument injection cannot bend. The strongest form of argument validation.

**Attack success rate (ASR): ** The fraction of attacks that achieve their goal despite all defenses. Must be measured end-to-end through the full stack (not model-only) and broken down by impact severity, because a benign-outcome ASR and a catastrophic-outcome ASR are different facts.

**Blast radius: ** The worst outcome achievable if untrusted text fully manipulates the model. Set by your system's authority exercised on the victim's behalf, not by the attacker's privileges. Replaces the unanswerable "is it safe?" with a computable, per-capability question.

**Boundary (security boundary): ** A point where a security decision is enforced such that no attacker-controlled input can flip it; defined by not depending on the attacker's cooperation. In LLM systems, all real boundaries are deterministic checks outside the model (data ACLs, tool gates, egress control, write gates, human approval).

**Canary token: ** A unique, otherwise-meaningless string planted in a sensitive location whose only purpose is to be detected if it appears where it shouldn't. Near-zero false positives, so it can be alerted on aggressively.

**Capability manifest: ** An explicit, version-controlled declaration of every tool the system may call, its arguments, risk level, and controls. Its strongest control is absence: a tool not in the manifest cannot be proposed by any injection.

**Confused deputy: ** A component tricked into misusing its own legitimate authority on an attacker's behalf (Hardy, 1988). Prompt injection is structurally this, with the model as the deputy; the fix is limiting the deputy's authority, not stopping the confusion.

**Defense-in-depth: ** Independent controls of different kinds at different points, so defeating one leaves a qualitatively different obstacle. Catastrophic outcomes are gated by deterministic boundaries, so beating the probabilistic model layer does not beat the system. Not "the same defense stacked."

**Direct prompt injection: ** Injection where the attacker is the user, supplying malicious instructions in the request channel. Usually bounded by the attacker's own permissions.

**Egress control: ** A deterministic, model-independent allowlist of destinations the system may contact, enforced at the network/proxy layer. The highest-return exfiltration defense (also blocks SSRF) because it constrains where output can go rather than detecting what it contains.

**Excessive agency: ** OWASP LLM06: a system with more functionality, permissions, or autonomy than its task warrants, so a manipulated model reaches further than it should. Its three knobs, functionality, permissions, autonomy, are design choices you control.

**Indirect prompt injection: ** Injection where the malicious instruction is embedded in content the system reads from elsewhere (webpage, email, document, retrieved chunk, tool output), authored by someone who is not the user and needs no account. The production problem; its blast radius is the victim's permissions.

**Ingestion trust label: ** A trust level stamped on external content at the moment it enters the system, set by the channel (never the content, which can lie) and carried through the whole pipeline.

**Instruction hierarchy: ** Model training that privileges system/developer instructions over user and tool content. Raises attacker cost; remains probabilistic, opaque, and version-unstable, a layer, never a boundary.

**Least privilege: ** Every component operates with the minimum privileges its task needs (Saltzer & Schroeder, 1975). For a probabilistic deputy, the primary control: every privilege you decline to grant is an attack you don't have to detect.

**Memory poisoning: ** Writing attacker-controlled text into a durable store (memory, summary, index, cache, skill library) so it is re-injected into future sessions. Persistent: a read error pollutes one answer, a poisoned write pollutes every relevant answer indefinitely.

**Memory write gate (adversarial): ** A gate on every durable write that rejects instruction-shaped "facts, " quarantines permission/consent claims, requires a higher trust bar to persist than to read, demands provenance, and forbids scope escalation. Default disposition: do not persist.

**Prompt leaking: ** Extracting the system prompt, developer instructions, or tool schemas. Defended by assuming the prompt will leak and keeping nothing sensitive in it.

**Provenance: ** The recorded origin of a span of content (source_ref, ingestion path, originating turn). The thread that makes an incident a timeline and makes poisoned-source cleanup complete.

**Rate-reducer: ** A probabilistic control (input framing, classifier, model robustness) that lowers how often attacks reach the boundaries and generates signal, but cannot bound impact. Contrast boundary.

**Structured output: ** Schema-constrained model output (JSON schema, native tool-calling). Necessary, kills parsing attacks, gives the validator typed fields, but not sufficient: it constrains form, not meaning.

**TRUST framework: ** The five questions to ask of every span of text: Text source (untrusted by default), Runtime role (injection is data promoted to instruction), User/tenant authority (scope to the task, not the user), System capability (the blast-radius upper bound), Tool/action limit (the deterministic gate that caps it).

**Untrusted interpreter: ** The model, viewed as a component placed inside the perimeter that nonetheless cannot be trusted to make security decisions, because its behavior is steered by the untrusted text it reads. The reason boundaries must sit between it and every capability.

Implementation Checklist

A team's LLM application that reads untrusted text is approaching production-ready when it can answer yes, with evidence to each of these. Grouped by movement.

Threat model and security posture (Movements I-II)

A per-feature threat-model worksheet exists in version control, reviewed like code, listing untrusted inputs, acting authority, capabilities, action limits, unacceptable residual risks, and monitored signals.
Success is framed as "blast radius is bounded, " not "the model resists attacks."
Every asset class (confidential data, integrity, credentials, tools, identity, system prompt, durable state, downstream systems) has a named deterministic control, not only a behavioral helper.
Every HIGH/CRITICAL capability has a deterministic cap; every irreversible CRITICAL action requires human approval, verified by a blast-radius assessment that fails the build otherwise.
The model is treated as an untrusted interpreter; security decisions live outside it.

Input, retrieval, and the supply chain (Movements III-IV)

All external content is stamped with a channel-based trust label at ingestion and the label travels through the pipeline; unknown channels and tool outputs default to most-untrusted.
Untrusted content is framed as data with an unpredictable per-request delimiter and never reaches the system role (enforced by a deterministic assertion).
Injection classifiers feed monitoring and capability downgrade, never act as a lone hard gate.
Retrieval filters by the user's permissions before ranking by relevance (ACL-first), tested with a cross-tenant fixture; ingestion scopes visibility so uploaded/public content can't reach unauthorized users.
Concealment (zero-width/bidi chars, hidden text, HTML comments, image text layers) is stripped to close the human-visible vs model-visible gap.

Tools, exfiltration, and persistence (Movements V-VII)

Read and write tools are separated; the narrowest tool is exposed; the most dangerous capabilities are absent from the manifest, not merely gated.
Every tool call passes a deterministic gate: manifest membership, argument form and meaning policy, cumulative budget, and human approval for high-impact effects.
Dangerous arguments are bound to application facts, not model output, wherever possible.
Secrets are never in the prompt; a server-side broker performs authenticated operations with non-secret tool arguments.
An egress allowlist constrains all outbound requests (and blocks SSRF); untrusted output is never auto-rendered in a context that auto-fetches URLs.
Canary tokens are planted in sensitive contexts (system prompt, sensitive docs/records) and alerted on; data is minimized and masked in context.
Every durable write passes an adversarial memory write gate (reject instruction-shaped, quarantine consent claims, require provenance, forbid scope escalation); a lineage log enables complete poisoned-source cleanup.

Architecture, testing, and operations (Movements VIII-X)

The defense matrix (attack path × prevent/bound/detect) is filled in, with a populated Bound column of deterministic boundaries; paths with only detection are known exposures.
A versioned red-team corpus spans every attack path and channel; it runs in CI with deterministic boundary assertions gating the build and probabilistic outcomes recorded as a tracked rate.
Malicious documents are loaded into a real retrieval index in tests; a tool-call simulation harness exercises the real gate without firing effects.
ASR is measured end-to-end by severity; adaptive red-teaming runs periodically; every real attack becomes a permanent regression fixture.
Boundaries emit telemetry into a monitoring schema; alerts fire on canary hits, egress blocks, gate-denial spikes, and memory-rejection spikes; behavioral baselines and scheduled store scans catch the slow, quiet attacks.
A prompt-injection incident runbook exists (contain → scope → eradicate including derivative cleanup → recover with a new fixture → notify → blameless postmortem), and postmortems root-cause to a missing boundary, never "the model was fooled."
A per-system launch checklist is completed before launch, with the hardest deterministic gate placed at the highest-blast-radius capability.

Research and Source Register

Sources grouped by chapter. A source appears under a chapter only if that chapter actually uses it to support a claim.

**Front matter & Introduction: ** draws on the book's own argument plus the spine sources below; primary citations used in the introduction:

Ch. 1, The Ticket That Tried to Email Itself

Ch. 2, A Prompt Is Not a Security Boundary

Ch. 3, Assets, Trust Boundaries, and the TRUST Framework

Ch. 4, The Confused Deputy, Least Privilege, and Blast Radius

Ch. 5, Talking the Model Out of Its Instructions

Ch. 6: Input Handling: What Classifiers and Boundaries Can and Cannot Do

Ch. 7, The Supply Chain of Untrusted Text

Ch. 8, RAG Is an Attack Surface

Ch. 9, Read Tools, Write Tools, and the Argument Nobody Validated

Ch. 10, Capability Manifests, Tool-Call Gates, and Approval Flows

Ch. 11, The Many Doors Data Leaves By

Ch. 12, Secrets, Minimization, Canaries, and the Limits of Output Filtering

Ch. 13, When the Attack Outlives the Session

Ch. 14, Defense-in-Depth: The Whole Architecture

Ch. 15, Red Teams, Fixtures, and Tests That Load Malicious Documents

Ch. 16, Monitoring, Forensics, and the Injection Incident

Ch. 17, Ten Systems, Ten Threat Models, Ten Launch Checklists