AN Alpesh Nakrani


2026 / Free online book · Points of View

In Defense of Small Models

Points of View, Volume VI


Access: Free

Chapters: 11

Read time: 113 min

Alpesh Nakrani

CRO at Devlyn · former CTO & COO

The frontier gets the headlines; the small model gets the margin. Why most production value lives below the state of the art.

This edition is free to read onsite. Each chapter has its own URL, so readers can bookmark, share, and return to the exact section they need.

Table of contents

INT · Introduction: The Demo Was a Lie · 9 min
Why the largest model that wins the demo so often loses the production it was supposed to win, and what this book asks you to measure instead.

01 · What Small Actually Means · 9 min
Smallness is not a parameter count; it is a property of cost, latency, memory, and control, and the families that have it look nothing alike.

02 · Capability Waste in Production AI · 9 min
You are paying for intelligence your tasks never use; here is how to measure the waste before you decide what to do about it.

03 · Making the Task Smaller · 9 min
The highest-use move is not picking a smaller model but shrinking the problem until a smaller model is obviously enough.

04 · The Workhorse Tasks: Classify, Extract, Route, Search · 9 min
The bulk of production AI is four boring jobs, and a small model or no model at all does each of them better than the frontier.

05 · Adapting Small Models: Fine-Tuning and Distillation · 9 min
How to teach a small model your domain cheaply, and how to compress a strong teacher into a student that ships.

06 · Quantization and the Cost of Compression · 8 min
How to shrink a model's footprint and bill without quietly shrinking its quality, and how to prove you did not.

07 · Latency as a Product Feature · 9 min
Speed is not a constraint you tolerate; it is something users feel, and small models let you design it on purpose.

08 · Privacy, Local Inference, and Data Residency · 8 min
Sometimes the constraint that decides the whole architecture is not cost or speed but where the data is allowed to go.

09 · Routing: Small First, Large When Needed · 9 min
The mature architecture is not one model but a cascade that handles the easy majority cheaply and escalates only what earns it.

10 · Proving Good Enough With Evals · 9 min
You cannot defend choosing a small model without measuring it, and measuring it well is a discipline most teams skip.

11 · When Small Models Are Wrong · 8 min
The honest chapter: where small models fail, where the compression reflex becomes its own mistake, and how to know the difference.

END · Conclusion: A Sign of Maturity, Not a Lack of Ambition · 8 min
Building a right-sized model portfolio, the production checklist, and why choosing the smallest sufficient system is the senior move.