Alpesh Nakrani
2025 / Free online book · Technical Deep Dives

Long Context Is Not Memory

Why Bigger Windows Don't Remember You

Access: Free
Chapters: 14
Read time: 149 min

A bigger context window feels like a memory upgrade and is not one. This deep dive separates three things people conflate (context, retrieval, and state) and shows what each can and cannot be asked to do in a system that has to remember across sessions.

Stuffing the window is a tactic, not an architecture. What retrieval, state, and context each owe a serious system.

This edition is free to read onsite. Each chapter has its own URL, so readers can bookmark, share, and return to the exact section they need.

Table of contents
FM  Front Matter: Long Context Is Not Memory (4 min)
    Why Bigger Windows Don't Remember You

INT  Introduction: The Sentence in the Middle (8 min)
    A team I will call the policy team (the details are composited from several real projects, but the shape is exact) had a problem that sounded like a solved problem. They ran compliance support for a mid-sized insurer.

01  The Million-Token Mirage (9 min)
    A large context window is a bigger desk, not an archive. It increases how much a model can be shown at once. It does nothing, by itself, to decide what should be kept, trusted, updated, permissioned, or recalled tomorrow.

02  Six Words That Are Not Synonyms (12 min)
    **Working claim:** Most long-context failures are a quiet slippage between two words that get treated as one.

03  The Desk and the Archive (10 min)
    **Working claim:** The first honest long-context decision is not "what fits?" but "what belongs on the desk at all?" A prompt is a curated working set assembled for one task, not a dumping ground for everything that might be relevant.

04  Tokens, Windows, and the Shape of Attention (9 min)
    **Working claim:** You cannot reason about long-context behavior without a working model of three things: what a token is, what the window actually bounds, and why attention is a finite resource spread unevenly across a sequence.

05  Lost in the Middle (8 min)
    **Working claim:** Where you place information in a long prompt changes whether the model uses it. Relevant content at the beginning and end of a context is used reliably; the same content buried in the middle is used poorly.

06  Why Passing Needle-in-a-Haystack Is Not Enough (8 min)
    **Working claim:** The most-cited long-context benchmark (find one planted sentence in a huge document) measures the easiest thing a long context can do and predicts almost nothing about the hard things.

07  The Operational Bill: Cost, Latency, Caching, and Pollution (9 min)
    **Working claim:** A large context is not a free capability you switch on; it is a recurring operational cost paid on every request in three currencies: money, latency, and accuracy.

08  A Working Taxonomy of Memory (11 min)
    **Working claim:** "Memory" is not one thing, and the word's vagueness is itself a source of bugs. A production system has at least ten distinct kinds of state, each with a different lifetime, owner, trust level, and governance requirement.

09  The Memory Write Gate (9 min)
    **Working claim:** The most dangerous moment in a memory system is the *write*. Reading the wrong memory produces a bad answer; writing the wrong memory produces bad answers indefinitely, for everyone, until someone finds and deletes it.

10  Reading Memory: Recency, Relevance, and Policy (9 min)
    **Working claim:** Recall is not a lookup; it is a ranked, filtered, policy-gated query that decides which durable facts deserve a place on the desk for *this* task.

11  The Context Assembler (8 min)
    **Working claim:** A production prompt should be *assembled*, not concatenated. The assembler is a real component with a budget, a priority order, and a policy; it decides what goes on the desk, in what order, and what gets cut when the budget is exceeded.

12  Conflicts, Recency, and Knowing Which Tool to Reach For (10 min)
    **Working claim:** Two hard problems sit at the center of context engineering, and both are usually solved by accident, badly. The first is *conflict*: when the user, a document, and a stored memory disagree, the system must not silently pick one.

13  Measuring What the System Actually Remembers (9 min)
    **Working claim:** A long-context or memory system that cannot be measured cannot be trusted, and demos systematically overestimate reliability.

14  Operating Memory in Production (9 min)
    **Working claim:** A memory system is not shipped when it works; it is shipped when it is *observable, governed, and recoverable*: when you can see what it remembers, prove you can delete it, and respond to the incident where it acts on a false belief.

A  Appendix A: Back Matter (7 min)
    Glossary, implementation checklist, and source register for the book.
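
To make the Chapter 11 working claim concrete, here is a minimal sketch of an assembler with a budget and a priority order. It is an illustration only, not the book's implementation: the `Item` type, the `assemble` function, the "lower number means more important" convention, and the precomputed token counts are all assumptions made for this example, and the policy layer the claim mentions is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    priority: int  # hypothetical convention: lower number = more important
    tokens: int    # assumed to be precomputed by whatever tokenizer is in use


def assemble(items, budget):
    """Greedy assembly: walk items in priority order, keep what fits,
    and record what was cut so the decision is visible afterwards."""
    kept, cut, used = [], [], 0
    for item in sorted(items, key=lambda i: i.priority):
        if used + item.tokens <= budget:
            kept.append(item)
            used += item.tokens
        else:
            cut.append(item)
    return kept, cut, used


# Illustrative inputs; names and sizes are invented for the example.
items = [
    Item("old_chat_history", priority=3, tokens=600),
    Item("system_policy", priority=0, tokens=200),
    Item("retrieved_clause", priority=2, tokens=500),
    Item("user_question", priority=1, tokens=100),
]

kept, cut, used = assemble(items, budget=1000)
print("kept:", [i.name for i in kept])
print("cut:", [i.name for i in cut])
print("tokens used:", used)
```

With a budget of 1000 tokens, the three highest-priority items fit (800 tokens) and the old chat history is cut. Returning the cut list alongside the kept list is a deliberate choice in this sketch: concatenation drops nothing and explains nothing, whereas an assembler can report what it left off the desk.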