Prompt Injection Is Not a Joke

Prompt Injection Is Not a Joke is the operating premise of this book: any LLM system that reads untrusted text needs security boundaries outside the model.

Key Takeaways

Prompt injection becomes an application-security problem as soon as the model can reach tools, memory, workflows, or confidential data.

The recurring motif is to treat untrusted text like executable influence, even though it is not code in the classic sense.

The TRUST framework asks where text came from, what role it plays, whose authority it rides on, what it can reach, and which action limits hold.

This book is defense-first: adversarial examples exist to create fixtures and controls, not to publish payloads.

Read this alongside the full AI security book, Security Boundaries for Tool-Using Systems, and Devlyn's AI security and red-teaming work when you are ready to turn this front matter into a production checklist.

Book promise

A prompt is not a security boundary. The moment an LLM reads text it did not author and can call tools, write memory, or influence a workflow, prompt injection stops being a chat-box parlor trick and becomes an application-security problem with the same gravity as SQL injection, SSRF, or a confused-deputy privilege escalation.

This is a practical, defense-first field manual for builders of LLM applications, RAG systems, agents, and copilots. It is written for people who have already shipped something (a support assistant that reads tickets, a copilot that browses pages, a document assistant that runs OCR over uploads, an agent that can email or update a CRM) and have started to feel the unease of a system that treats every byte it reads as a candidate instruction. It teaches you to design so that when the model is confused (and it will be), the damage is bounded, observable, and recoverable.

This manuscript is not a list of jailbreak prompts, not a fear-mongering essay, and not a promise that one magic detector solves the problem. It is an application-security book that happens to be about LLMs. It assumes you are an authorized builder hardening your own system, and every adversarial example exists so you can write the test that defends you.

The recurring motif

**Treat untrusted text like executable influence.**

It is not executable code in the classic sense. It does not get a stack frame or a syscall. But it steers the component that chooses your outputs, your tool calls, your memory writes, and your actions. A string in a support ticket cannot run rm -rf, but it can convince the thing that can call your delete tool that deleting is what the user wanted. The influence is the payload. Design for influence you cannot prevent, and constrain what influence is able to reach.

The enemy

The belief this book exists to correct:

"Prompt injection is just people typing ignore previous instructions into a chat box. We told the model not to obey that. We're fine."

That framing is comfortable and wrong. It hides the entire indirect attack surface: instructions buried in webpages, emails, PDFs, tickets, calendar invites, code comments, OCR text, retrieved chunks, tool outputs, and memory candidates, written by people who never log into your app and never see your system prompt. It assumes the system prompt is a wall. It is not. It is a suggestion the model usually follows, evaluated by a probabilistic component that reads it in the same undifferentiated token stream as the attacker's text. A suggestion is not an access-control decision.
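The difference is easiest to see in code. The minimal Python sketch below assumes a hypothetical email tool and an allowlist of recipient domains. `ALLOWED_DOMAINS` and `may_send_email` are illustrative names, not a real API. The system-prompt sentence is something the model may or may not honor. The function runs in ordinary application code after the model proposes a call, so it holds no matter what text the model read.

```python
# Sketch: a prompt "rule" versus an access-control check.
# ALLOWED_DOMAINS and may_send_email are hypothetical names, not a real API.

# This string reaches the model as tokens. The model usually follows it,
# but nothing in the application enforces it.
SYSTEM_PROMPT = "Never send customer data to addresses outside example.com."

# This allowlist is consulted by ordinary code the model cannot argue with.
ALLOWED_DOMAINS = frozenset({"example.com"})


def may_send_email(recipient: str, allowed_domains: frozenset[str] = ALLOWED_DOMAINS) -> bool:
    """Return True only if the recipient is a well-formed address on an allowed domain."""
    parts = recipient.strip().split("@")
    if len(parts) != 2 or not all(parts):
        return False  # malformed address: deny by default
    return parts[1].lower() in allowed_domains


if __name__ == "__main__":
    # Suppose injected ticket text convinced the model to propose this call.
    proposed = {"tool": "send_email", "to": "drop@attacker.test"}
    print(may_send_email(proposed["to"]))  # prints False, whatever the prompt said
```

The design choice is deny-by-default. Anything the check does not positively recognize is refused, so a cleverly phrased payload gains nothing by confusing the parser.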

The central thesis

Prompt injection exists because LLM applications blur data and instructions. Security must assume the model can be confused and reduce what confusion can damage.

You will not be able to prove the model immune to manipulation. Stop trying to. The engineering question is not "can we prompt the model to refuse attacks?" It is "what can untrusted text influence, and how do we cap the blast radius when influence succeeds?"
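One way to turn that question into code is to derive the reachable capabilities from what is actually in the context window. The sketch below assumes a hypothetical tool registry split into read and write tools. It also assumes a deliberately simple rule: once any third-party text (a retrieved chunk, a tool output, or a memory) enters the context, write tools leave the model's reach and must go through approval outside it. The names and the rule are illustrative, not a standard. The rule addresses only indirect injection, since text the user typed can still carry direct injection.

```python
from enum import Enum


class Source(Enum):
    """Where a span of text in the context window came from (hypothetical taxonomy)."""
    SYSTEM = "system"            # authored by the application developer
    USER = "user"                # typed by the authenticated principal
    RETRIEVED = "retrieved"      # RAG chunk, webpage, uploaded document
    TOOL_OUTPUT = "tool_output"  # result returned by a tool call
    MEMORY = "memory"            # previously persisted memory


# Third-party sources: text written by someone other than the developer or principal.
THIRD_PARTY = frozenset({Source.RETRIEVED, Source.TOOL_OUTPUT, Source.MEMORY})

# Hypothetical tool names, grouped by whether they cause external effects.
READ_TOOLS = frozenset({"search_kb", "read_ticket"})
WRITE_TOOLS = frozenset({"send_email", "update_crm", "delete_ticket"})


def reachable_tools(context_sources: list[Source]) -> frozenset[str]:
    """Cap the blast radius: a context tainted by third-party text gets read tools only.

    Write tools remain possible, but only through an approval path that
    runs outside the model (not shown here).
    """
    if any(src in THIRD_PARTY for src in context_sources):
        return READ_TOOLS
    return READ_TOOLS | WRITE_TOOLS


if __name__ == "__main__":
    tainted = [Source.SYSTEM, Source.USER, Source.RETRIEVED]
    print(sorted(reachable_tools(tainted)))  # ['read_ticket', 'search_kb']
```

The point is not this particular rule. The point is that the answer to "what can untrusted text influence?" is computed by code you can test, rather than hoped for from the model.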

Primary research and standards references

These anchor the book. Individual chapters use their own chapter-specific sources; this is the shared spine.

The TRUST Boundary Framework

One framework recurs through the book. Whenever text enters your system, ask five questions about it. They are a security design tool, not a chapter template. TRUST will not appear as a forced subsection in every chapter, but a mature LLM application can answer all five questions for any span of text it processes.

**T: Text source.** Where did this text come from, and is the source trusted? The user typed it, a tenant's document held it, a public webpage served it, a tool returned it, memory stored it. Source determines default trust, and almost nothing should default to trusted.

**R: Runtime role.** What role is this text playing right now: system instruction, user request, retrieved evidence, tool result, or memory candidate? The same bytes are safe as evidence and dangerous as instruction. Confusing roles is the root of injection.

**U: User/tenant authority.** On whose authority is this text being processed, and who is allowed to see what it can surface? Identity is not authorization; a request authenticated as Alice must not let attacker-controlled text act with Alice's full permissions on data Alice happens to be able to reach.

**S: System capability.** What could the model do if this text manipulates it? Read more data, draft a message, call a write tool, persist a memory, trigger a workflow. Capability is the upper bound on damage.

**T: Tool/action limit.** What external effects are impossible without policy approval outside the model? This is the real boundary: the deterministic gate, the allowlist, the human confirmation, the egress control, the things that hold even when the model is fully convinced.
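The five questions can be carried as data instead of living only in a design document. The following Python sketch shows one possible shape, and everything in it is hypothetical: `TrustRecord`, `ToolCall`, `gate`, the `MANIFEST` entries, and the `approved_by_human` flag. Each span of text is labeled with its source, runtime role, and tenant. A proposed tool call is then checked by deterministic code that never consults the model's reasoning. It illustrates the idea and is not a complete policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustRecord:
    """TRUST labels for one span of text in the context window."""
    source: str   # T: e.g. "system", "user", "webpage", "tool_output"
    role: str     # R: "instruction", "request", "evidence", "tool_result", "memory_candidate"
    tenant: str   # U: whose authority this span is processed under


@dataclass(frozen=True)
class ToolCall:
    """A tool call proposed by the model (hypothetical shape)."""
    tool: str
    tenant: str                      # tenant whose data the call would touch
    approved_by_human: bool = False  # set only by an approval flow outside the model


# S: the capability manifest bounds what any manipulation could reach.
MANIFEST = {
    "search_kb": {"writes": False},
    "draft_reply": {"writes": False},
    "send_email": {"writes": True},
}


def gate(call: ToolCall, request_tenant: str, context: list[TrustRecord]) -> tuple[bool, str]:
    """T: the deterministic action limit that holds even when the model is convinced."""
    spec = MANIFEST.get(call.tool)
    if spec is None:
        return False, "tool not in manifest"
    if call.tenant != request_tenant:
        return False, "cross-tenant access"
    if any(r.tenant != request_tenant for r in context):
        return False, "context holds another tenant's text"
    # Fail closed on role confusion: only system text may sit in the instruction role.
    if any(r.role == "instruction" and r.source != "system" for r in context):
        return False, "non-system text in instruction role"
    if spec["writes"] and not call.approved_by_human:
        return False, "write requires human approval"
    return True, "allowed"


if __name__ == "__main__":
    ctx = [
        TrustRecord(source="user", role="request", tenant="acme"),
        TrustRecord(source="webpage", role="evidence", tenant="acme"),
    ]
    # Steered by the webpage, the model proposes an email with no approval.
    print(gate(ToolCall("send_email", tenant="acme"), "acme", ctx))
    # prints (False, 'write requires human approval')
```

The gate's order mirrors the framework. It first bounds capability, then authority, then role. Only after those checks does it ask whether an external effect has passed an approval that the model cannot grant itself.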

Movement I: The Joke That Became an Incident

The Ticket That Tried to Email Itself
A Prompt Is Not a Security Boundary

Movement II: The Security Model LLM Apps Need

Assets, Trust Boundaries, and the TRUST Framework
The Confused Deputy, Least Privilege, and Blast Radius

Movement III: Direct Prompt Injection and Jailbreaks

Talking the Model Out of Its Instructions
Input Handling: What Classifiers and Boundaries Can and Cannot Do

Movement IV: Indirect Prompt Injection: The Real Production Problem

The Supply Chain of Untrusted Text
RAG Is an Attack Surface: Ingestion and Retrieval Defenses

Movement V: Tools Turn Injection Into Impact

Read Tools, Write Tools, and the Argument Nobody Validated
Capability Manifests, Tool-Call Gates, and Approval Flows

Movement VI: Data Exfiltration and Prompt Leakage

The Many Doors Data Leaves By
Secrets, Minimization, Canaries, and the Limits of Output Filtering

Movement VII: Memory Poisoning and Persistent Compromise

When the Attack Outlives the Session

Movement VIII: Defense Patterns That Reduce Blast Radius

Defense-in-Depth: The Whole Architecture

Movement IX: Testing, Monitoring, and Incident Response

Red Teams, Fixtures, and Tests That Load Malicious Documents
Monitoring, Forensics, and the Injection Incident

Movement X: Use Case Threat Models

Ten Systems, Ten Threat Models, Ten Launch Checklists

Back matter

Glossary
Implementation Checklist
Research and Source Register
