Name: Memory Systems for Agents

> **Working claim:** A memory system is defined as much by what it forgets as by what it keeps. Forgetting is not failure; it is the mechanism that keeps memory true, relevant, small, and safe.

Key Takeaways

Forgetting, Decay, and Conflict Resolution is a chapter about agent memory systems, not a generic AI adoption note.

The operating rule is to treat every memory as a sourced, scoped, revisable claim instead of an ambient fact.

The failure mode to watch is polished output without evidence, owner, cost line, or rollback path.

The useful next step is an artifact a future teammate can replay without folklore.

Agent memory is useful only when every stored claim has source, scope, decay, and deletion rules.

Each memory type decays along a different axis: episodes by volume, facts by re-confirmation, preferences by behavioral contradiction, skills by failure, tasks by status, environment by TTL. When two memories collide, the system must resolve the conflict explicitly rather than letting recency or luck decide. An agent that cannot forget accumulates a growing pile of stale, contradictory, confidently wrong beliefs.

Forgetting is a feature

The cultural default treats forgetting as a bug, a limitation to be engineered away now that storage is cheap and recall is a vector search. For agent memory, this is exactly backwards. A system that never forgets does not become wiser; it becomes a hoard of claims that were once true, now contradict one another, and grow more dangerous as they age, because the agent recalls them with the same confidence it had the day they were written. The most reliable memory systems forget aggressively and deliberately: they treat forgetting as a first-class operation with its own policies, not as an afterthought bolted on when the store gets too big.

There is also a human-memory analogy worth taking seriously without over-reading it. MemoryBank explicitly modeled forgetting on the Ebbinghaus curve: memory strength decays over time and is refreshed on access, so frequently-recalled memories stay strong while unused ones fade. The principle that transfers is not the specific curve but the shape: memories should not all persist at full strength forever; their influence should reflect how recently and how often they have been re-confirmed. A memory recalled and re-verified ten times this month should outweigh one written once a year ago and never touched since. Forgetting, in this framing, is just the low tail of a decay process that also keeps live memories strong.
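The shape is easy to see in a few lines of Python. This is a minimal sketch, assuming an Ebbinghaus-style retention R = exp(-t / S), where t is days since the last recall and S is a strength that grows by one on every recall. The `Trace` class, the unit increment, and the day-based clock are illustrative assumptions in the spirit of MemoryBank's design, not its actual code.

```python
import math
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical memory trace: strength grows with each recall."""
    strength: float = 1.0           # S: larger means slower forgetting
    last_recalled_day: float = 0.0  # t is measured from this point

    def retention(self, today: float) -> float:
        # Ebbinghaus-style retention R = exp(-t / S).
        t = max(0.0, today - self.last_recalled_day)
        return math.exp(-t / self.strength)

    def recall(self, today: float) -> None:
        # Recalling refreshes the clock and strengthens the trace.
        self.strength += 1.0
        self.last_recalled_day = today


# Written a year ago (day 0), never touched since.
stale = Trace(strength=1.0, last_recalled_day=0.0)

# Written on day 335, then recalled ten times over the last month.
busy = Trace(strength=1.0, last_recalled_day=335.0)
for day in range(336, 366, 3):
    busy.recall(day)

today = 365.0
print(stale.retention(today), busy.retention(today))
```

The memory written a year ago and never touched has effectively zero retention, while the one recalled ten times this month keeps most of its strength. The size of the increment and the time unit decide how forgiving the curve is, and both would need tuning in a real system.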

Figure: Different memory types decay on different axes, lowering influence before policy or staleness forces deletion.

Six decay axes, one per type

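Chapter 2 asserted that the memory types decay along incompatible axes; this is where we make each one operational. The single biggest error in memory operations is applying one decay rule, usually a flat TTL, to everything. A flat TTL fits environmental memory and almost nothing else: set it short and it deletes a birthday that never changes; set it long and it keeps a current project that changed months ago.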

Episodic memory decays by volume and age, but its derivatives persist. Raw episodes are the most truthful and the most voluminous memory; you cannot keep every action forever. The policy: age out or down-sample raw episodes after their derived memories and summaries are settled and verified, while keeping the derived facts, the summaries, and the audit log. The episode is the receipt; you keep it long enough to verify the derived facts depend on real evidence, then you may shred the receipt, except where retention is legally required or where the episode is itself the audit record.

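```sql
-- Episodic aging: drop raw episodes older than the retention window whose
-- derived memories are settled, EXCEPT those flagged for legal hold or audit.
-- NOT EXISTS is used instead of NOT IN: a single NULL in a NOT IN subquery
-- makes the predicate NULL for every row, so the job would silently delete nothing.
DELETE FROM episodic_memory AS e
WHERE e.occurred_at < now() - interval '90 days'
  AND NOT EXISTS (SELECT 1 FROM legal_holds h WHERE h.scope = e.owner_scope)
  AND NOT EXISTS (SELECT 1 FROM unsettled_derivations u WHERE u.episode_id = e.episode_id)
  AND e.kind <> 'audit';
```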

Semantic facts decay by re-confirmation. A fact does not expire on a clock so much as it grows uncertain the longer it goes unconfirmed. The mechanism is a confidence that falls with staleness and rises when the fact is re-observed. This is the MemoryBank shape applied to facts:

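```python
from datetime import datetime, timezone

RECONFIRM_BOOST = 0.1  # illustrative value; tune per deployment


def effective_confidence(m: "Memory", now: datetime) -> float:
    """Stored confidence decayed by time since last confirmation."""
    anchor = m.last_confirmed_at or m.created_at
    days = max(0.0, (now - anchor).total_seconds() / 86400)
    # Category-specific half-life: confidence halves every half_life_days.
    # float("inf") gives 0.5 ** 0.0 == 1.0, i.e. a fact that never goes stale.
    return m.confidence * 0.5 ** (days / m.half_life_days)


def on_reconfirmation(m: "Memory", ctx: "MemoryContext") -> None:
    """A fresh observation of the same fact refreshes it; no new row."""
    m.last_confirmed_at = datetime.now(timezone.utc)
    m.confidence = min(1.0, m.confidence + RECONFIRM_BOOST)
    ctx.update(m)
```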

The half-life is category-specific. "User's birthday" has an effectively infinite half-life (it does not change). "User's current project" has a short one (it changes every few months). "User's timezone" is in between. Setting half-lives per category is how you encode, as data, how fast each kind of fact goes stale, and it means recall (Chapter 5) using effective_confidence automatically prefers fresh facts over stale ones without any special-casing.
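A minimal sketch of per-category half-lives driving recall order. It assumes a true half-life (confidence halves every half-life), and the category names, half-life values, and `Fact` records are hypothetical examples, not values from any real deployment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical per-category half-lives, in days; float("inf") never decays.
HALF_LIFE_DAYS = {
    "birthday": float("inf"),
    "timezone": 180.0,
    "current_project": 45.0,
}


@dataclass
class Fact:
    claim: str
    category: str
    confidence: float
    created_at: datetime
    last_confirmed_at: Optional[datetime] = None


def effective_confidence(f: Fact, now: datetime) -> float:
    anchor = f.last_confirmed_at or f.created_at
    days = max(0.0, (now - anchor).total_seconds() / 86400)
    # Confidence halves every HALF_LIFE_DAYS[category] days.
    return f.confidence * 0.5 ** (days / HALF_LIFE_DAYS[f.category])


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
old = Fact("project: Atlas", "current_project", 0.9, now - timedelta(days=90))
new = Fact("project: Borealis", "current_project", 0.6, now - timedelta(days=5))
bday = Fact("birthday: May 2", "birthday", 0.95, now - timedelta(days=3000))

ranked = sorted([old, new], key=lambda f: effective_confidence(f, now), reverse=True)
print([f.claim for f in ranked])
print(effective_confidence(bday, now))
```

The 90-day-old project claim starts with higher stored confidence but ends below the 5-day-old one, so sorting by effective confidence surfaces the fresh fact with no special case, while the birthday keeps its full confidence however old it is.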

Preferences decay by behavioral contradiction. A preference is undermined not by time but by the user repeatedly doing the opposite of it. A "prefers mornings" default should weaken every time the user reschedules to the afternoon, regardless of how long ago it was set. This is the decay axis the 7 a.m. incident most needed: even without confirmation at write time, accumulating contradictory behavior should have eroded the false preference until it stopped driving decisions.

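```python
CONTRADICTION_DECAY = 0.7  # illustrative: keep 70% of confidence per contradiction
CONTRADICTION_LIMIT = 3    # illustrative: revoke after this many contradictions
CONFIRM_BOOST = 0.05       # illustrative


def on_behavior(pref: "Memory", observed_action: "Action", ctx: "MemoryContext") -> None:
    if observed_action.contradicts(pref):
        pref.contradiction_count += 1
        pref.confidence *= CONTRADICTION_DECAY  # weaken on each contradiction
        if pref.contradiction_count >= CONTRADICTION_LIMIT:
            ctx.revoke(pref, reason="repeatedly contradicted by behavior")
    elif observed_action.confirms(pref):
        pref.confidence = min(1.0, pref.confidence + CONFIRM_BOOST)
    ctx.update(pref)
```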
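To see the rule play out, here is a self-contained simulation of a false "prefers mornings" default being eroded as the user keeps choosing afternoons. The `Preference` record, the string comparison standing in for `Action.contradicts` and `Action.confirms`, and the constant values are hypothetical simplifications of on_behavior.

```python
from dataclasses import dataclass

CONTRADICTION_DECAY = 0.7  # hypothetical: keep 70% of confidence per contradiction
CONTRADICTION_LIMIT = 3    # hypothetical: revoke after three contradictions
CONFIRM_BOOST = 0.05       # hypothetical


@dataclass
class Preference:
    key: str
    value: str
    confidence: float
    contradiction_count: int = 0
    revoked: bool = False


def on_behavior(pref: Preference, observed_value: str) -> None:
    """Simplified stand-in: compare what the user chose with the preference."""
    if pref.revoked:
        return
    if observed_value != pref.value:
        pref.contradiction_count += 1
        pref.confidence *= CONTRADICTION_DECAY
        if pref.contradiction_count >= CONTRADICTION_LIMIT:
            pref.revoked = True  # stops driving decisions
    else:
        pref.confidence = min(1.0, pref.confidence + CONFIRM_BOOST)


# A false "prefers mornings" default, then the user keeps picking afternoons.
pref = Preference("meeting_time", "morning", confidence=0.8)
for choice in ["afternoon", "afternoon", "afternoon"]:
    on_behavior(pref, choice)
    print(pref.contradiction_count, round(pref.confidence, 3), pref.revoked)
```

Each afternoon choice cuts confidence by 30%; on the third, the preference is revoked and stops driving scheduling decisions, without anyone having to notice the original misreading.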

Skills decay by failure, not time (Chapter 7): a skill unused for a month is idle, not stale, but a skill whose recent failure rate climbs is drifting and gets demoted. Tasks decay by status transition (Chapter 8): they live until done or abandoned, then stop being recalled as active, with a sweep for zombies. Environmental memory decays by short TTL (Chapter 2): it is stale almost immediately and must be re-verified before consequential action. Six types, six axes. A single decay policy cannot express them, which is the final operational argument for the taxonomy.
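One way to make "six types, six axes" concrete is a dispatch table with one decay policy per memory type, each reading only the signal its axis cares about. This is a sketch under stated assumptions: the `Signals` record, the verdict strings, and every threshold (90 days, 0.2, three contradictions, 30% failure rate, 30 idle days, one hour) are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Signals:
    """Hypothetical per-memory signals; each policy reads only its own axis."""
    age_days: float = 0.0
    idle_days: float = 0.0
    derivations_settled: bool = False
    effective_confidence: float = 1.0
    contradiction_count: int = 0
    recent_failure_rate: float = 0.0
    status: Optional[str] = None


# Illustrative thresholds only; each would be tuned per deployment.
DECAY_POLICIES: Dict[str, Callable[[Signals], str]] = {
    # Episodic: volume and age, but only once derived memories are settled.
    "episodic": lambda s: "age_out" if s.age_days > 90 and s.derivations_settled else "keep",
    # Semantic: re-confirmation; stale facts lose influence rather than vanish.
    "semantic": lambda s: "low_influence" if s.effective_confidence < 0.2 else "keep",
    # Preference: behavioral contradiction, regardless of age.
    "preference": lambda s: "revoke" if s.contradiction_count >= 3 else "keep",
    # Skill: failure, not time; an idle skill is not stale.
    "skill": lambda s: "demote" if s.recent_failure_rate > 0.3 else "keep",
    # Task: status transition, plus a sweep for zombies that never closed.
    "task": lambda s: ("inactive" if s.status in ("done", "abandoned")
                       else "flag_zombie" if s.idle_days > 30 else "keep"),
    # Environment: short TTL (one hour here); stale almost immediately.
    "environment": lambda s: "reverify" if s.age_days > 1 / 24 else "keep",
}


def decay_verdict(memory_type: str, signals: Signals) -> str:
    return DECAY_POLICIES[memory_type](signals)


print(decay_verdict("skill", Signals(age_days=365, idle_days=365)))
print(decay_verdict("environment", Signals(age_days=1)))
```

A flat 30-day TTL would have deleted the year-old idle skill and kept the day-old environment reading; the per-type table does the opposite, which is the point of the taxonomy.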

The forgetting curve as a recall input, not just a cleanup job

A subtlety teams miss: decay should influence recall, not just deletion (see MemGPT for a treatment of tiered memory where recency and access patterns shape what stays active). Treating forgetting as a periodic DELETE job is the weak form. The strong form folds effective_confidence directly into the recall score (Chapter 5), so a decaying memory's influence falls smoothly long before it is ever deleted. A six-month-old unconfirmed fact does not need to be deleted to stop dominating decisions; it needs its effective confidence to have decayed enough that fresher, re-confirmed memories outrank it. Deletion is for memories that have decayed past usefulness or that policy requires gone; graceful loss of influence is the everyday mechanism, and it is gentler and safer than a binary keep-or-delete, because a memory that is probably-stale-but-maybe-useful can sit at low influence rather than being prematurely destroyed.
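A minimal sketch of the strong form, assuming the recall score is relevance multiplied by decayed confidence. The multiplicative combination, the `Candidate` fields, and the example numbers are assumptions; a weighted sum or a learned ranker would serve the same purpose as long as staleness lowers the score.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    claim: str
    similarity: float            # relevance to the query, e.g. cosine similarity in [0, 1]
    confidence: float            # stored confidence
    days_since_confirmed: float
    half_life_days: float


def recall_score(c: Candidate) -> float:
    # Relevance times decayed confidence: influence fades smoothly with staleness.
    effective = c.confidence * 0.5 ** (c.days_since_confirmed / c.half_life_days)
    return c.similarity * effective


def recall(candidates: List[Candidate], k: int) -> List[Candidate]:
    return sorted(candidates, key=recall_score, reverse=True)[:k]


stale = Candidate("user works on Atlas", 0.92, 0.9, 180, 60)
fresh = Candidate("user works on Borealis", 0.88, 0.7, 3, 60)
print([c.claim for c in recall([stale, fresh], k=1)])
```

The stale claim is slightly more similar to the query, yet it loses to the fresh one. It is not deleted: it still comes back when k is large enough, just at low influence.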

Conflict resolution: never let recency or luck decide

Conflicts are inevitable: the user changes timezones, an agent observes the opposite of a stored fact, two sources assert incompatible things. The cardinal rule is the one from Chapter 4, now elaborated: never silently overwrite, and never let last-write-wins decide a conflict by accident. Silent overwrite destroys history and makes a wrong correction unrecoverable; last-write-wins lets a stale or low-trust write clobber a fresh, correct one. Conflict resolution must be explicit, evidence-weighted, and reversible.

There are three structurally different kinds of conflict, and each resolves differently:

Temporal conflict: the fact changed over time. The user was in Pacific time; now they are in Eastern. Resolution is supersession: write the new fact, set the old one's superseded_by, and keep the chain. Both remain in the record; recall surfaces only the current one (the superseded_by IS NULL filter from Chapter 5). This preserves "the user moved on March 4," which is itself useful history.

```sql
-- Supersede a fact: the new memory replaces the old, chain preserved, reversible.
BEGIN;
INSERT INTO semantic_memory (memory_id, subject, claim, category, confidence,
                             created_at, last_confirmed_at, owner_scope, consent_basis)
VALUES (:new_id, :subject, :new_claim, :category, :confidence,
        now(), now(), :owner_scope, :consent_basis);
UPDATE semantic_memory SET superseded_by = :new_id WHERE memory_id = :old_id;
INSERT INTO memory_audit (op, memory_id, actor, detail)
VALUES ('supersede', :old_id, :actor,
        jsonb_build_object('superseded_by', :new_id, 'reason', :reason));
COMMIT;
```
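The same supersession, modeled in memory in Python to show the three properties the transaction is meant to guarantee: recall sees only the current fact, history keeps both, and a mistaken correction can be undone. The `Store` and `Row` classes are hypothetical, and `undo_supersede` (revoke the replacement, clear the pointer) is an illustrative assumption about how reversal could work, not a procedure from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Row:
    memory_id: str
    subject: str
    claim: str
    superseded_by: Optional[str] = None
    revoked: bool = False


class Store:
    """Hypothetical in-memory stand-in for semantic_memory plus memory_audit."""

    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}
        self.audit: List[Tuple[str, str, str]] = []

    def add(self, row: Row) -> None:
        self.rows[row.memory_id] = row

    def supersede(self, old_id: str, new_row: Row, reason: str) -> None:
        # Same three steps as the SQL: insert new, point old at it, audit.
        self.add(new_row)
        self.rows[old_id].superseded_by = new_row.memory_id
        self.audit.append(("supersede", old_id, reason))

    def undo_supersede(self, old_id: str, reason: str) -> None:
        # Reversal: revoke the replacement and make the old fact current again.
        new_id = self.rows[old_id].superseded_by
        if new_id is None:
            return
        self.rows[new_id].revoked = True
        self.rows[old_id].superseded_by = None
        self.audit.append(("undo_supersede", old_id, reason))

    def recall(self, subject: str) -> List[str]:
        # Mirrors the "superseded_by IS NULL" and "revoked_at IS NULL" filters.
        return [r.claim for r in self.rows.values()
                if r.subject == subject and r.superseded_by is None and not r.revoked]

    def history(self, subject: str) -> List[str]:
        return [r.claim for r in self.rows.values() if r.subject == subject]


store = Store()
store.add(Row("m1", "user.timezone", "Pacific"))
store.supersede("m1", Row("m2", "user.timezone", "Eastern"), reason="user moved")
print(store.recall("user.timezone"), store.history("user.timezone"))
store.undo_supersede("m1", reason="the move was entered by mistake")
print(store.recall("user.timezone"))
```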

Source conflict: two sources assert incompatible facts at the same time (the multi-agent case from Chapter 9, but it also happens single-agent: the user said one thing, a document said another). Resolution weighs source trust and directness: a direct user statement beats an inference; a verified observation beats a guess; for verifiable consequential facts, go check the world rather than arbitrate. Crucially, an unresolved source conflict is itself a memory, "there are conflicting claims about X", and the agent should often surface the conflict to the user ("I have you in two timezones, which is right?") rather than silently pick a side. A conflict the agent resolves by guessing is a future wrong action; a conflict it surfaces is a thirty-second clarification.
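A minimal sketch of that decision order: check the world when the fact is verifiable, let the more direct source win when the trust gap is clear, and otherwise surface the conflict. The trust ranks, the margin of 2, and the `Claim` record are hypothetical; the text fixes only the relative orderings (direct statement over inference, verified observation over guess).

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical trust ranks. The text fixes only relative orderings:
# a direct user statement beats an inference; a verified observation beats a guess.
SOURCE_TRUST = {
    "user_statement": 3,
    "verified_observation": 3,
    "document": 2,
    "inference": 1,
    "guess": 0,
}


@dataclass
class Claim:
    value: str
    source: str  # key into SOURCE_TRUST


def resolve(a: Claim, b: Claim,
            check_world: Optional[Callable[[], str]] = None,
            margin: int = 2) -> Tuple[str, str]:
    """Return (how, result) where how is 'world', 'winner', or 'surface'."""
    # 1. Verifiable, consequential fact: go check the world instead of arbitrating.
    if check_world is not None:
        return ("world", check_world())
    # 2. Clear trust gap: the more direct source wins.
    gap = SOURCE_TRUST[a.source] - SOURCE_TRUST[b.source]
    if abs(gap) >= margin:
        return ("winner", a.value if gap > 0 else b.value)
    # 3. Too close to call: ask the user (the caller should also record the conflict).
    return ("surface", f"I have two values, {a.value!r} and {b.value!r}. Which is right?")


print(resolve(Claim("Eastern", "user_statement"), Claim("Pacific", "inference")))
print(resolve(Claim("Eastern", "user_statement"), Claim("Pacific", "document")))
```

A user statement against an inference resolves cleanly; a user statement against a document is too close to call under these ranks, so the agent asks.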

Memory-versus-live-instruction conflict: the stored default versus what the user just said (Chapter 5). The live instruction wins; the stored default yields. This is not really resolved in the store at all; it is resolved at recall time by the yields_to_live_intent semantics. It is listed here because teams confuse it with the other two and try to "fix the memory" when the right fix is to let recall defer to the present.
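Because this conflict is resolved at recall time, the code that handles it reads the store and never writes to it. A minimal sketch, assuming yields_to_live_intent is a boolean flag on the stored default; the `StoredDefault` record and `effective_setting` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: resolution cannot modify the stored default
class StoredDefault:
    key: str
    value: str
    yields_to_live_intent: bool = True  # assumed boolean flag


def effective_setting(default: StoredDefault, live_value: Optional[str]) -> str:
    """Resolve at recall time: a live instruction overrides a yielding default."""
    if live_value is not None and default.yields_to_live_intent:
        return live_value
    return default.value


meeting = StoredDefault("meeting_time", "morning")
print(effective_setting(meeting, "3 pm"))  # live instruction applies
print(effective_setting(meeting, None))    # falls back to the stored default
```

The live request wins for this call only; the next call without a live instruction falls back to the unchanged default.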

The conflict resolution matrix

| Conflict type | Signal | Resolution | What persists |
|---|---|---|---|
| Temporal (fact changed) | New observation contradicts old, later in time | Supersede: new replaces old, chain kept | Both; recall sees current only |
| Source (concurrent disagreement) | Two live claims, incompatible | Weigh trust and directness; verify against the world if verifiable; else surface to user | Both as a conflict, or the world-verified truth |
| Memory vs. live instruction | Stored default vs. explicit current request | Live instruction wins; default yields (at recall time) | Default unchanged; just not applied |
| Self-reinforcement (rumination) | Agent re-reading its own intra-task writes | Reject as evidence; require external confirmation | Nothing new (hypothesis stays in working memory) |

The fourth row is the agentic special case from Chapter 3. An agent that writes a hypothesis and reads it back can manufacture a false "confirmation" from self-reference. The resolution is to refuse to count the agent's own prior write as corroborating evidence unless something external has verified it: a memory's observation_count should not increment because the agent restated its own guess. Self-agreement is not corroboration, and a conflict-resolution layer that treats it as such will let an agent talk itself into anything (see Reflexion on using externally verified trial outcomes, not self-reference, as the signal for self-improvement; and Generative Agents on grounding reflection in retrieved observation streams).
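A minimal sketch of the rejection rule, assuming each observation carries its origin and whether something outside the agent verified it. The `Observation` and `Belief` records and the origin labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    claim: str
    origin: str  # hypothetical labels: "agent_write", "tool_result", "user"
    externally_verified: bool = False


@dataclass
class Belief:
    claim: str
    observation_count: int = 0


def corroborate(belief: Belief, obs: Observation) -> bool:
    """Increment observation_count only for external evidence; return whether it counted."""
    if obs.claim != belief.claim:
        return False
    if obs.origin == "agent_write" and not obs.externally_verified:
        return False  # the agent re-reading its own guess is not corroboration
    belief.observation_count += 1
    return True


hypothesis = Belief("the timeout comes from the cache layer")
for _ in range(5):
    corroborate(hypothesis, Observation(hypothesis.claim, "agent_write"))
corroborate(hypothesis, Observation(hypothesis.claim, "tool_result", externally_verified=True))
print(hypothesis.observation_count)
```

Five rounds of the agent re-reading its own hypothesis leave observation_count at zero; one externally verified tool result raises it to one.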

Deletion versus revocation versus supersession

Three operations are routinely conflated, and the distinction is operationally vital: it determines what the agent recalls, what the audit log retains, and whether you can recover from a mistake.

- Supersession replaces a memory with a corrected version, keeping the chain. The old version is not recalled (it is superseded) but is recoverable and visible in the audit trail. Use for temporal change and corrections.
- Revocation (soft delete) marks a memory as no longer valid by setting revoked_at, so recall excludes it (the revoked_at IS NULL filter), but the row and its provenance remain for audit. Use when a memory is wrong and should stop influencing behavior but its history matters.
- Hard deletion physically removes the memory and is reserved for honoring erasure obligations (Chapter 11). Even then, the audit log records that a deletion occurred ("memory X for user Y was hard-deleted on date Z"), because the fact of deletion is itself a record you must keep.

```python
from datetime import datetime, timezone


def forget(memory_id, mode, ctx, reason, new_id=None):
    if mode == "supersede":
        ctx.set_superseded_by(memory_id, new_id)  # recoverable, chained
    elif mode == "revoke":
        ctx.set_revoked_at(memory_id, datetime.now(timezone.utc), reason)  # soft: excluded from recall
    elif mode == "hard_delete":
        ctx.audit("hard_delete", memory_id, reason)  # record the deletion FIRST
        ctx.cascade_to_derived(memory_id)  # invalidate summaries/reflections while links still exist
        ctx.physically_delete(memory_id)  # then erase the data
    else:
        raise ValueError(f"unknown forget mode: {mode!r}")
    # In every mode: recall must immediately stop surfacing the memory.
```

The ordering in hard_delete is deliberate: audit before erase, so that even if the deletion fails partway, you have a record that it was attempted, and you never have erased data with no record that it existed. And cascade_to_derived is the Chapter 6 invalidation: deleting a source memory must invalidate the summaries and reflections built on it, or the deleted fact lives on, laundered, inside a derivation that no longer cites a source you can see. Deletion that does not cascade is not deletion; it is hiding the original while keeping the copy.
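A minimal sketch of cascade_to_derived, assuming each derived memory records the ids it was built from. Invalidation is transitive, so a reflection built on a summary built on the deleted memory is caught too. The `MemoryStore` and `Derived` classes are hypothetical, and this sketch invalidates derivations before erasing the source so that the links are still readable when the cascade runs.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Derived:
    derived_id: str
    text: str
    source_ids: Set[str]  # ids this summary/reflection was built from
    valid: bool = True


class MemoryStore:
    """Hypothetical store that tracks which derived memories cite which sources."""

    def __init__(self) -> None:
        self.sources: Dict[str, str] = {}
        self.derived: Dict[str, Derived] = {}
        self.audit_log: List[Tuple[str, str, str]] = []

    def cascade_to_derived(self, memory_id: str) -> None:
        # Walk the derivation graph so second-order derivations are caught too.
        frontier = [memory_id]
        while frontier:
            current = frontier.pop()
            for d in self.derived.values():
                if current in d.source_ids and d.valid:
                    d.valid = False
                    frontier.append(d.derived_id)

    def hard_delete(self, memory_id: str, reason: str) -> None:
        self.audit_log.append(("hard_delete", memory_id, reason))  # record first
        self.cascade_to_derived(memory_id)  # invalidate derivations while links exist
        self.sources.pop(memory_id, None)  # then erase the data


store = MemoryStore()
store.sources.update({"m1": "user's home address", "m2": "user's employer", "m3": "user's timezone"})
store.derived["s1"] = Derived("s1", "weekly summary", {"m1", "m2"})
store.derived["r1"] = Derived("r1", "reflection on the weekly summary", {"s1"})
store.derived["s2"] = Derived("s2", "timezone summary", {"m3"})
store.hard_delete("m1", reason="user erasure request")
print({k: d.valid for k, d in store.derived.items()})
```

The summary that cited the deleted memory and the reflection built on that summary are both invalidated; the unrelated summary survives.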

What this chapter sets up

Forgetting is the half of memory engineering that the "just add memory" reflex ignores entirely, and it is what keeps the other half honest: six decay axes that match the six memory types, decay folded into recall as graceful loss of influence rather than binary deletion, conflict resolution that supersedes rather than overwrites and verifies rather than guesses, the refusal to let an agent's self-reinforcement count as corroboration, and the three-way distinction between supersession, revocation, and deletion with cascade to derived memory.

Two of those operations, revocation and hard deletion, are not only quality mechanisms; they are legal and ethical obligations. A user has a right to see what an agent remembers about them, to correct it, and to demand it be forgotten, and the system has duties about consent, sensitivity, and isolation that go beyond making the agent work well. That is governance, and it is the subject of the next chapter, the one where bad memory stops being a quality bug and becomes a liability with your name on it.
