
2025 / Free online book · Field Manuals
The Economics of Inference
Shipping models you can afford to run
Access: Free
Chapters: 6
Read time: 36 min
The model that wins is the one you can afford to run at scale and explain to a customer. This manual works through quantization, routing, caching, and the plain arithmetic that decides whether an AI feature has a margin or is just a demo.
Revenue rarely rewards the biggest model. A working account of latency, cost, and the smaller model that wins the P&L.
This edition is free to read on the site. Each chapter has its own URL, so readers can bookmark, share, and return to the exact section they need.
Table of contents
01 The Production Problem: Why the work fails after the demo and what must be made explicit first. (6 min)
02 The Reference System: The smallest system design that can be owned, tested, and improved. (6 min)
03 Measurement That Changes Decisions: How to measure quality, cost, risk, and user impact without vanity metrics. (6 min)
04 Failure Modes and Recovery: Where the system breaks, what early signals matter, and how to recover. (6 min)
05 Operating Cadence: Ownership, review cycles, release gates, and the rituals that keep quality honest. (6 min)
06 The First Ninety Days: A practical rollout plan from pilot to production without losing control. (6 min)
