Front Matter: Prompt Injection Is Not a Joke
Security for AI Systems That Read Untrusted Text
"Prompt Injection Is Not a Joke" is the operating premise of this book: any LLM system that reads untrusted text needs security boundaries outside the model.
Key Takeaways
- Prompt injection becomes an application-security problem as soon as the model can reach tools, memory, workflows, or confidential data.
- The recurring motif is to treat untrusted text like executable influence, even though it is not code in the classic sense.
- The TRUST framework asks where text came from, what role it plays, whose authority it rides on, what it can reach, and which action limits hold.
- This book is defense-first: adversarial examples exist to create fixtures and controls, not to publish payloads.
Read this beside the full AI security book, Security Boundaries for Tool-Using Systems, and Devlyn's AI security and red-teaming work when the front matter becomes a production checklist.
Book promise
A prompt is not a security boundary. The moment an LLM reads text it did not author and can call tools, write memory, or influence a workflow, prompt injection stops being a chat-box parlor trick and becomes an application-security problem with the same gravity as SQL injection, SSRF, or a confused-deputy privilege escalation.
This is a practical, defense-first field manual for builders of LLM applications, RAG systems, agents, and copilots. It is written for people who have already shipped something (a support assistant that reads tickets, a copilot that browses pages, a document assistant that runs OCR over uploads, an agent that can email or update a CRM) and have started to feel the unease of a system that treats every byte it reads as a candidate instruction. It teaches you to design so that when the model is confused, and it will be, the damage is bounded, observable, and recoverable.
This manuscript is not a list of jailbreak prompts, not a fear-mongering essay, and not a promise that one magic detector solves the problem. It is an application-security book that happens to be about LLMs. It assumes you are an authorized builder hardening your own system, and every adversarial example exists so you can write the test that defends you.
The recurring motif
**Treat untrusted text like executable influence.**
It is not executable code in the classic sense. It does not get a stack frame or a syscall. But it steers the component that chooses your outputs, your tool calls, your memory writes, and your actions. A string in a support ticket cannot run rm -rf, but it can convince the thing that can call your delete tool that deleting is what the user wanted. The influence is the payload. Design for influence you cannot prevent, and constrain what influence is able to reach.
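To make "constrain what influence is able to reach" concrete, here is a minimal sketch of a deterministic gate that sits between the model and its tools. The tool names (`read_ticket`, `search_tickets`, `delete_ticket`), the `ProposedCall` type, and the `context_has_untrusted_text` flag are all hypothetical, invented for illustration; a real system would derive them from its own tool manifest and provenance tracking.

```python
from dataclasses import dataclass

# Hypothetical tool names; a real system would load these from its own manifest.
READ_ONLY_TOOLS = {"search_tickets", "read_ticket"}
DESTRUCTIVE_TOOLS = {"delete_ticket"}


@dataclass(frozen=True)
class ProposedCall:
    """A tool call the model has asked for. It is a request, not a decision."""
    tool: str
    args: dict
    context_has_untrusted_text: bool  # assumption: the app tracks this itself


def gate(call: ProposedCall, human_approved: bool = False) -> str:
    """Decide, outside the model, whether a proposed tool call may run."""
    if call.tool in READ_ONLY_TOOLS:
        return "allow"
    if call.tool in DESTRUCTIVE_TOOLS:
        # The model may be fully convinced that deletion is what the user
        # wanted; the gate does not care what the model believes.
        if call.context_has_untrusted_text and not human_approved:
            return "needs_approval"
        return "allow"
    return "deny"  # unknown tools are denied by default
```

The design choice worth noticing is that the gate never inspects the text for malicious intent. It only looks at what the call can do and whether untrusted text was present when the model chose it.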
The enemy
The belief this book exists to correct:
"Prompt injection is just people typing ignore previous instructions into a chat box. We told the model not to obey that. We're fine."
That framing is comfortable and wrong. It hides the entire indirect attack surface: instructions buried in webpages, emails, PDFs, tickets, calendar invites, code comments, OCR text, retrieved chunks, tool outputs, and memory candidates, written by people who never log into your app and never see your system prompt. It assumes the system prompt is a wall. It is not. It is a suggestion the model usually follows, evaluated by a probabilistic component in the same undifferentiated token stream as the attacker's text. A suggestion is not an access-control decision.
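A deliberately naive sketch of context assembly shows why the system prompt is not a wall. The function name `build_context`, the prompt wording, and the placeholder ticket text are assumptions made up for this illustration, and the bracketed placeholder stands in for attacker-written text rather than reproducing a payload.

```python
SYSTEM_PROMPT = "You are a support assistant. Never email customer data."


def build_context(system_prompt: str, retrieved_text: str, user_request: str) -> str:
    """Naive assembly: every part ends up in one string the model reads."""
    return "\n\n".join([system_prompt, retrieved_text, user_request])


# Hypothetical ticket body written by someone who never logs into the app.
ticket = "Printer is broken. [embedded instruction aimed at the assistant]"

context = build_context(SYSTEM_PROMPT, ticket, "Summarize this ticket.")

# Nothing in the result enforces which part is allowed to give orders.
# The system prompt and the ticket are both just substrings of one input.
print(SYSTEM_PROMPT in context, ticket in context)
```

Role-tagged chat messages label the parts more clearly, but the model still consumes them as one sequence of tokens, so the labels remain hints to a probabilistic component and not enforcement.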
The central thesis
Prompt injection exists because LLM applications blur data and instructions. Security must assume the model can be confused and reduce what confusion can damage.
You will not be able to prove the model immune to manipulation. Stop trying to. The engineering question is not "can we prompt the model to refuse attacks?" It is "what can untrusted text influence, and how do we cap the blast radius when influence succeeds?"
Primary research and standards references
These anchor the book. Individual chapters use their own chapter-specific sources; this is the shared spine.
- Ignore Previous Prompt: Attack Techniques For Language Models (PromptInject)
- Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- OWASP Top 10 for LLM Applications & Generative AI (2025)
- OWASP LLM01: Prompt Injection
- OWASP LLM Prompt Injection Prevention Cheat Sheet
- NIST AI Risk Management Framework
- Microsoft: How Microsoft defends against indirect prompt injection attacks
- OpenAI: Prompt injections (explainer)
- Simon Willison: Prompt injection writing (series index)
The TRUST Boundary Framework
One framework recurs through the book. Whenever text enters your system, ask five questions about it. They are a security design tool, not a chapter template: TRUST will not appear as a forced subsection in every chapter, but a mature LLM application can answer all five for any span of text it processes.
- **T: Text source.** Where did this text come from, and is the source trusted? The user typed it, a tenant's document held it, a public webpage served it, a tool returned it, memory stored it. Source determines default trust, and almost nothing should default to trusted.
- **R: Runtime role.** What role is this text playing right now: system instruction, user request, retrieved evidence, tool result, or memory candidate? The same bytes are safe as evidence and dangerous as instruction. Confusing roles is the root of injection.
- **U: User/tenant authority.** On whose authority is this text being processed, and who is allowed to see what it can surface? Identity is not authorization; a request authenticated as Alice must not let attacker-controlled text act with Alice's full permissions on data Alice happens to be able to reach.
- **S: System capability.** What could the model do if this text manipulates it? Read more data, draft a message, call a write tool, persist a memory, trigger a workflow. Capability is the upper bound on damage.
- **T: Tool/action limit.** What external effects are impossible without policy approval outside the model? This is the real boundary: the deterministic gate, the allowlist, the human confirmation, the egress control, the things that hold even when the model is fully convinced.
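One way to keep the five questions answerable is to attach the answers to each span of text as data. The sketch below is an illustration under stated assumptions, not the book's reference implementation: the `TrustRecord` type, the source and role labels, the `ALLOWED_ROLES` table, and the action names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical policy: which runtime roles each source may play.
ALLOWED_ROLES = {
    "system_prompt": {"system_instruction"},
    "user": {"user_request"},
    "tenant_document": {"retrieved_evidence"},
    "public_webpage": {"retrieved_evidence"},
    "tool": {"tool_result"},
    "memory": {"memory_candidate"},
}

# Hypothetical actions with external effects.
SIDE_EFFECT_ACTIONS = frozenset({"send_email", "write_memory", "update_crm"})


@dataclass(frozen=True)
class TrustRecord:
    source: str               # T: where the text came from
    role: str                 # R: the role it plays right now
    authority: str            # U: principal on whose behalf it is processed
    capabilities: frozenset   # S: what the model could do while reading it
    gated_actions: frozenset  # T: actions that need approval outside the model


def findings(rec: TrustRecord) -> list:
    """Return design problems for one span; an empty list means none found."""
    problems = []
    if rec.role not in ALLOWED_ROLES.get(rec.source, set()):
        problems.append(f"{rec.source} text may not play role {rec.role}")
    ungated = (rec.capabilities & SIDE_EFFECT_ACTIONS) - rec.gated_actions
    if ungated:
        problems.append("ungated side effects: " + ", ".join(sorted(ungated)))
    return problems
```

A webpage chunk processed for Alice with `send_email` reachable but gated produces no findings; the same chunk with no gate, or promoted to the `system_instruction` role, produces one finding each. The `authority` field is recorded but not checked here, because authorization rules depend on the application's own tenancy model.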
Table of contents
Movement I: The Joke That Became an Incident
- The Ticket That Tried to Email Itself
- A Prompt Is Not a Security Boundary
Movement II: The Security Model LLM Apps Need
- Assets, Trust Boundaries, and the TRUST Framework
- The Confused Deputy, Least Privilege, and Blast Radius
Movement III: Direct Prompt Injection and Jailbreaks
- Talking the Model Out of Its Instructions
- Input Handling: What Classifiers and Boundaries Can and Cannot Do
Movement IV: Indirect Prompt Injection: The Real Production Problem
- The Supply Chain of Untrusted Text
- RAG Is an Attack Surface: Ingestion and Retrieval Defenses
Movement V: Tools Turn Injection Into Impact
- Read Tools, Write Tools, and the Argument Nobody Validated
- Capability Manifests, Tool-Call Gates, and Approval Flows
Movement VI: Data Exfiltration and Prompt Leakage
- The Many Doors Data Leaves By
- Secrets, Minimization, Canaries, and the Limits of Output Filtering
Movement VII: Memory Poisoning and Persistent Compromise
- When the Attack Outlives the Session
Movement VIII: Defense Patterns That Reduce Blast Radius
- Defense-in-Depth: The Whole Architecture
Movement IX: Testing, Monitoring, and Incident Response
- Red Teams, Fixtures, and Tests That Load Malicious Documents
- Monitoring, Forensics, and the Injection Incident
Movement X: Use Case Threat Models
- Ten Systems, Ten Threat Models, Ten Launch Checklists
Back matter
- Glossary
- Implementation Checklist
- Research and Source Register
