Name: Hallucination, Mechanically

The hallucination checklist turns the book into implementation gates: terms, controls, and source references a team can carry into design review.

Key Takeaways

The glossary standardizes the vocabulary: atomic claim, citation laundering, evidence-bound citation, faithfulness, factuality, and abstention.

The checklist turns each movement into implementation gates for naming, retrieval, citation, summarization, agents, detection, and operations.

The source register lists only references used by the book, grouped by chapter.

The appendix is the handoff artifact for turning the book into an internal review rubric.

Read this with domain playbooks, the CLAIM Framework, and LLM evaluation.

Glossary

**Abstention:** The system's decision to decline to answer (or to ask, or escalate) when the evidence chain for the requested claim cannot be completed. A calibrated refusal; often the highest-value output. Tuned by a threshold on a support signal against an asymmetric cost matrix.

**Atomic claim:** A minimal, independently checkable factual statement extracted from an answer. The unit of verification; a paragraph is a mixture of supported and unsupported atomic claims (FActScore).

**Authority (CLAIM "A"):** The check that a linked source is allowed for the user, current, and authoritative for the claim type. The bridge from faithfulness back to factuality; a real but stale source still produces a wrong answer.

**Calibration:** The property that claims made with stated confidence p are correct a fraction p of the time. Degraded by alignment tuning; must be measured (ECE, reliability diagrams) before a confidence is thresholded.
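As an illustration of the ECE measurement named above, here is a minimal sketch using equal-width confidence bins. The function name, the default of 10 bins, and the input layout (one confidence and one correctness label per claim) are assumptions made for this example; other binning schemes exist and give somewhat different numbers.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed default; tune to the size of the eval set
) -> float:
    """Equal-width-bin ECE: the size-weighted mean of |accuracy - mean confidence|."""
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece
```

A system that says 0.95 on four claims and gets two right scores an ECE of 0.45 here; one that says 0.9 and is right nine times in ten scores roughly zero.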

**Citation laundering:** Attaching a real, existing source to a claim the source does not support. Passes an existence check; caught only by an entailment check (span ⊨ claim). The dominant citation failure in well-built systems.

**CLAIM:** The book's framework: Claim (decompose), Link (bind to a span), Authority (allowed/current/authoritative), Inference (entailment, not similarity), Mitigation (answer/revise/ask/abstain/escalate).

**Completion guard:** An agent control that blocks or rewrites any "I did X" claim not backed by a SUCCESS action-log entry. The agent analogue of evidence-bound citation.

**Context precision / recall (RAGAS):** Precision: fraction of retrieved passages that are relevant. Recall: fraction of needed spans that were retrieved. Measured upstream of generation to separate retrieval failures from generation failures.
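The two fractions in this entry can be computed directly from labeled retrieval results. The sketch below is a plain set-based reading of the definitions given here, not the RAGAS library's API; the function name is hypothetical, and it assumes passages and needed spans are labeled with IDs at the same granularity.

```python
from typing import Sequence, Set, Tuple


def context_precision_recall(
    retrieved: Sequence[str],   # IDs of the passages the retriever returned
    relevant: Set[str],         # IDs judged relevant to the query (assumed labels)
    needed: Set[str],           # IDs required to answer the query (assumed labels)
) -> Tuple[float, float]:
    """Return (precision, recall) for a single query."""
    if not needed:
        raise ValueError("recall is undefined when no spans are needed")
    retrieved_set = set(retrieved)
    # Precision: share of what was retrieved that is relevant.
    precision = (
        len(retrieved_set & relevant) / len(retrieved_set) if retrieved_set else 0.0
    )
    # Recall: share of what was needed that was actually retrieved.
    recall = len(retrieved_set & needed) / len(needed)
    return precision, recall
```

Low recall with a wrong answer points at retrieval; high recall with a wrong answer points at generation.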

**Entailment:** The relation "if the span is true, the claim must be true." Distinct from similarity ("same topic"). The correct verification signal; similarity is not.

**Evidence-bound citation:** A citation derived from a verified supporting span (evidence → claim → citation), not generated alongside the claim. The structural fix that makes a citation proof rather than decoration.
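One way to picture the evidence → claim → citation ordering is a constructor that refuses to emit a citation until both checks pass. This is a minimal sketch: `Citation`, `bind_citation`, and the injected `entails` callable are hypothetical names, and `entails` stands in for whatever entailment classifier a team actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Citation:
    source_id: str
    quote: str  # verbatim text copied from the source


def bind_citation(
    claim: str,
    source_id: str,
    source_text: str,
    span: str,
    entails: Callable[[str, str], bool],  # hypothetical entailment check: (span, claim) -> bool
) -> Optional[Citation]:
    """Return a citation only when the span exists verbatim and entails the claim."""
    # Existence check: the quoted span must appear verbatim in the source.
    if span not in source_text:
        return None
    # Entailment check: a real span that does not support the claim is citation laundering.
    if not entails(span, claim):
        return None
    return Citation(source_id=source_id, quote=span)
```

Because the citation object can only be built from a verified span, a caller cannot attach a source to a claim by assertion alone.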

**Extrinsic hallucination:** Output that adds content the source neither supports nor contradicts. Invisible to contradiction-only checks; caught only by a coverage check requiring every claim to be entailed.

**Faithfulness:** Consistency of the output with the provided input (context/instruction). Checkable with inputs you already have; the primary optimization target for grounded systems. Contrast factuality.

**Factuality:** Consistency of the output with the real world. Requires an external knowledge source; achieved indirectly by being faithful to a governed, authoritative, current corpus.

**Intrinsic hallucination:** Output that contradicts the provided source. Catchable by a contradiction/NLI check against the source.

**Misplaced certainty:** A guess delivered in the linguistic register of an established fact. A calibration failure; the surface signature that makes all other hallucinations dangerous.

**Risk-coverage curve:** The trade between coverage (fraction of inputs answered) and risk (error rate on answered inputs) as the abstention threshold moves. The operating point is chosen from this curve and the cost matrix (selective prediction).
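To make the curve concrete, the sketch below sweeps the threshold over a labeled set and records coverage and risk at each step. It assumes the system answers when its support score is at or above the threshold; the function name and the input layout are illustrative choices, not taken from the book.

```python
from typing import List, Sequence, Tuple


def risk_coverage_curve(
    scores: Sequence[float],   # support score per input (assumed: higher = better supported)
    correct: Sequence[bool],   # whether the answer would have been correct
) -> List[Tuple[float, float, float]]:
    """Return (threshold, coverage, risk) points, from strictest to loosest threshold."""
    if len(scores) != len(correct) or len(scores) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    pairs = sorted(zip(scores, correct), key=lambda p: p[0], reverse=True)
    n = len(pairs)
    curve: List[Tuple[float, float, float]] = []
    answered = 0
    errors = 0
    for i, (score, ok) in enumerate(pairs):
        answered += 1
        if not ok:
            errors += 1
        # Inputs with tied scores share a threshold, so emit one point per distinct score.
        if i + 1 < n and pairs[i + 1][0] == score:
            continue
        curve.append((score, answered / n, errors / answered))
    return curve
```

Each point reads as: "if we answer everything scoring at least this much, we cover this fraction of inputs and are wrong on this fraction of what we answer."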

**Selective prediction:** A predictor paired with a gating function that decides per input whether to predict or abstain. The formal frame for abstention.

**Self-consistency:** Sampling a prompt multiple times and flagging claims the samples disagree on; divergence proxies the model's uncertainty (SelfCheckGPT). Blind to confident, consistent errors; must operate over meanings, not strings (semantic uncertainty).

**Stale fact:** A claim that was true once and is not now. The model's parametric memory is undated; defended by data-layer currency filtering and the Authority check.

**Unsupported synthesis (over-reach):** A claim inferred beyond what the evidence supports. The failure retrieval cannot fix; caught by claim-to-span entailment, not similarity.

**Verification loop:** Generate → verify (Chapter 10 pipeline) → revise/abstain/escalate. The densest-fit intervention, because it acts on the claim regardless of why it was unsupported; bounded by the verifier's own reliability.
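The control flow of the loop can be sketched in a few lines. Everything named here is hypothetical: `generate`, `verify`, and `revise` are injected callables standing in for the real generator and verifier, the revision budget of two rounds is an arbitrary choice, and the fallback shown is abstention (a real system might escalate instead).

```python
from typing import Callable, List, Optional, Tuple, TypeVar

Draft = TypeVar("Draft")


def verification_loop(
    question: str,
    generate: Callable[[str], Draft],
    verify: Callable[[Draft], List[str]],         # returns the unsupported claims; empty = all supported
    revise: Callable[[Draft, List[str]], Draft],  # rewrites the draft given the unsupported claims
    max_rounds: int = 2,                          # assumed revision budget
) -> Tuple[str, Optional[Draft]]:
    """Return ("answer", draft) once every claim verifies, else ("abstain", None)."""
    draft = generate(question)
    for round_index in range(max_rounds + 1):
        unsupported = verify(draft)
        if not unsupported:
            return ("answer", draft)
        if round_index < max_rounds:
            draft = revise(draft, unsupported)
    # The budget is spent and unsupported claims remain: decline rather than ship them.
    return ("abstain", None)
```

The loop never asks why a claim was unsupported; it only acts on the verifier's verdict, which is also why it is only as good as that verifier.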

Implementation Checklist

A team's anti-hallucination system is approaching production-ready when it can answer yes, with evidence, to each of these. Items are grouped by movement.

Naming and framing (Movements I-II)

Failures are localized to a station (query / retrieval / synthesis / attribution / verification / corpus), not filed as "hallucination."
Outputs are verified at the atomic-claim level, not the answer level.
The team distinguishes intrinsic (contradiction) from extrinsic (added) hallucination and knows a contradiction-only check is blind to the latter.
"Be truthful" and "lower temperature" are understood as near-inert mitigations, not relied on.
Any confidence score thresholded for trust has had its calibration measured (ECE) and recalibrated; raw model confidence is never the gate.

Retrieval, citation, summarization (Movements III-IV)

Context recall and precision are measured upstream of generation, so retrieval failures are distinguished from generation failures.
Retrieval is hybrid (lexical + dense); reranking is in place; sources carry validity metadata for currency filtering.
Citations are evidence-bound, derived from a verified span, showing verbatim source text, not emitted by the model alongside the claim.
Citation precision and recall are tracked separately.
Summaries are verified with claim-to-span faithfulness and a separate numeric gate; named output slots license emptiness explicitly; decision/action modality is checked.
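Tracking citation precision and recall separately might look like the sketch below. The definitions used are an assumption for this example, since the checklist does not spell them out: precision is the share of emitted citations that support their claim, and recall is the share of claims with at least one supporting citation. The function name and input layout are hypothetical.

```python
from typing import Sequence, Tuple


def citation_precision_recall(
    judgments: Sequence[Sequence[bool]],
) -> Tuple[float, float]:
    """judgments[i] holds one bool per citation on claim i: True if it supports the claim.

    A claim with no citations contributes an empty sequence.
    """
    if not judgments:
        raise ValueError("need at least one claim")
    total_citations = sum(len(per_claim) for per_claim in judgments)
    supporting = sum(1 for per_claim in judgments for ok in per_claim if ok)
    covered_claims = sum(1 for per_claim in judgments if any(per_claim))

    precision = supporting / total_citations if total_citations else 0.0
    recall = covered_claims / len(judgments)
    return precision, recall
```

The two numbers fail differently: laundered citations drag precision down, while uncited claims drag recall down, so a single blended score would hide which problem a system has.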

Agents (Movement V)

Tool results flow through a schema with a status field set by the runtime, never the model.
A completion guard blocks "I did X" claims without a SUCCESS log entry.
The control loop observes each action's result before reasoning about the next.
Irreversible actions require human confirmation before execution.
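The first two agent controls above can be sketched together: a runtime wrapper that sets the status field itself, and a guard that compares claimed actions against the log. All names (`Status`, `ToolResult`, `run_tool`, `completion_guard`) are hypothetical, and treating "the call raised an exception" as the failure signal is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Sequence


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"


@dataclass(frozen=True)
class ToolResult:
    action: str
    status: Status
    detail: str


def run_tool(action: str, fn: Callable[[], object]) -> ToolResult:
    """Runtime wrapper: status comes from what the call did, never from model text."""
    try:
        detail = str(fn())
    except Exception as exc:  # assumption: any exception means the action failed
        return ToolResult(action, Status.FAILURE, repr(exc))
    return ToolResult(action, Status.SUCCESS, detail)


def completion_guard(claimed: Sequence[str], log: Sequence[ToolResult]) -> List[str]:
    """Return the claimed actions that have no SUCCESS entry in the action log."""
    succeeded = {entry.action for entry in log if entry.status is Status.SUCCESS}
    return [action for action in claimed if action not in succeeded]
```

Any action the guard returns is an "I did X" claim to block or rewrite before the response reaches the user, whether the tool failed or was never called.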

Detection and mitigation (Movements VI-VII)

The verifier is built as four measurable stages (extract / link / classify / decide), each tuned separately; "supported" is treated as a calibrated signal, not proof.
Classification uses entailment, not similarity, at claim-to-span granularity.
The generator never judges itself; any LLM judge must quote a real span that is then verified programmatically.
Each intervention is matched to the failure mode it addresses; verification loops and abstention carry the load.
Abstention has a threshold tuned against an explicit, asymmetric cost matrix; refusals are specific and actionable, not generic.
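Tuning the abstention threshold against an asymmetric cost matrix can be as simple as a sweep over a labeled set. In this sketch a correct answer costs nothing, a wrong answer costs `cost_wrong`, and an abstention costs `cost_abstain`; those parameter names, the three-cell cost matrix, and the function name are assumptions for illustration.

```python
from typing import Sequence, Tuple


def pick_abstention_threshold(
    scores: Sequence[float],   # calibrated support score per item (higher = better supported)
    correct: Sequence[bool],   # whether answering would have been correct
    cost_wrong: float,         # cost of a wrong answer (assumed much larger than abstaining)
    cost_abstain: float,       # cost of declining to answer
) -> Tuple[float, float]:
    """Return (threshold, mean cost), answering when score >= threshold."""
    if len(scores) != len(correct) or len(scores) == 0:
        raise ValueError("need equal-length, non-empty inputs")
    # Candidate thresholds: every observed score, plus "always abstain".
    candidates = sorted(set(scores)) + [float("inf")]
    best_threshold = candidates[0]
    best_cost = float("inf")
    for threshold in candidates:
        total = 0.0
        for score, ok in zip(scores, correct):
            if score >= threshold:
                total += 0.0 if ok else cost_wrong
            else:
                total += cost_abstain
        mean_cost = total / len(scores)
        if mean_cost < best_cost:
            best_threshold, best_cost = threshold, mean_cost
    return best_threshold, best_cost
```

Raising `cost_wrong` relative to `cost_abstain` pushes the chosen threshold up, which is the asymmetry doing its job. The sweep is only meaningful if the scores were calibrated first.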

Evaluation and production (Movement VIII)

The golden set is claim-level and span-grounded, includes answerable = false items to measure abstention, and ties items to a corpus version.
An annotation guide enforces "judge support, not plausibility"; inter-annotator kappa is measured.
Metrics are reproducible queries, reported with confidence intervals and sliced worst-slice-first; abstention metrics are reported in pairs.
The eval is wired into CI as a gate; every production hallucination becomes a regression item.
Production logs at the claim grain; the four rates (unsupported-claim, abstention, citation precision, user-correction) are monitored by slice.
Corpus drift is a first-class, separately-monitored failure; supersession invalidates caches and re-runs affected evals.
A contain-first incident runbook exists; every incident is treated as a detection failure to fix, and the system's residual hallucination rate is known and bounded (see Playbooks by Domain for the per-domain runbook templates).
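Reporting a rate with a confidence interval and sorting worst-slice-first can be sketched as follows. The Wilson score interval is one standard choice for a binomial proportion and is swapped in here as an assumption; the checklist does not name a method. The function names and the `(unsupported, total)` count layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def wilson_interval(errors: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    if total <= 0 or errors < 0 or errors > total:
        raise ValueError("need 0 <= errors <= total and total > 0")
    p = errors / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def worst_slice_first(
    counts: Dict[str, Tuple[int, int]],  # slice name -> (unsupported claims, total claims)
) -> List[Tuple[str, float, float, float]]:
    """Return (slice, rate, lower, upper) rows, sorted by upper bound, worst first."""
    rows = []
    for name, (errors, total) in counts.items():
        lower, upper = wilson_interval(errors, total)
        rows.append((name, errors / total, lower, upper))
    # Sorting on the upper bound surfaces small, noisy slices that may be hiding a problem.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

Sorting by the upper bound rather than the point estimate is a deliberate design choice: a slice with three unsupported claims in ten deserves attention before one with five in fifty, even though both need more data.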

Research and Source Register

Sources grouped by chapter. A source appears under a chapter only if that chapter actually uses it to support a claim.

**Front matter:** lists the book's shared spine; no per-claim citation.

**Introduction:** synthetic; draws on the book's own argument. No external citations.

Ch. 1, The Confident Wrong Answer

Ch. 2, A Working Taxonomy of Hallucination

Ch. 3, The CLAIM Framework

Ch. 4, Fluency Is Not Evidence

Ch. 5, What Models Know About What They Know

Ch. 6, When Retrieval Fails Before Generation Begins

Ch. 7, A Citation Is Not Proof

Ch. 8, The Compression Press

Ch. 9, Hallucinated Actions

Ch. 10, Claim Extraction and Source-Span Verification

Ch. 11, Self-Consistency and the Limits of the Judge

Ch. 12, Interventions and Their Limits

Ch. 13, Teaching a System to Say "I Don't Know"

Ch. 14, Measuring Unsupported Claims

Ch. 15, Operating Against Hallucination in Production

Ch. 16, Playbooks by Domain

Appendix A: Back Matter