Front Matter: Fine-Tune, or Don't
A Practical Decision Process for Customizing AI Models
Key Takeaways
- Fine-tuning is a behavior intervention, not a way to store current facts.
- TRAIN gates every run through task stability, required knowledge, examples, impact, and evaluation.
- The book treats fine-tuning as surgery: diagnose, prepare, monitor, and be ready to roll back.
Book promise
Fine-tuning changes model behavior. It does not replace retrieval, tools, policy, evaluation, or product judgment.
This is a decision-first field guide. It teaches builders when fine-tuning is the right intervention, when prompting, retrieval, structured outputs, tools, routing, or caching are better, and how to fine-tune without laundering bad data into model behavior. It is written for engineers who have already shipped something with a foundation model and have watched a team reach for fine-tuning as a reflex: the support bot that "needs to know the product," the extraction pipeline that "fails sometimes," the assistant that is "too verbose," the workload that "must be cheaper." Each of those sentences names a symptom. None of them names a diagnosis. This book is about the difference. It sits beside "AI-Native means the machine does the job" and "evals that predict production, not vanity" in the larger site argument.
This manuscript is not a short brief, not a topic outline, and not a marketing summary. It is designed for AI engineers, product engineers, MLOps engineers, ML engineers, backend engineers, and technical founders who need a production decision process for customizing models: one they can carry into an architecture review and defend with evidence.
The recurring motif
Fine-tuning is surgery, not seasoning.
Seasoning is what you add casually, taste, and adjust. Surgery is what you reach for after a diagnosis, with consent, preparation, a plan for what could go wrong, and monitoring during recovery. Fine-tuning is the second kind of intervention. It changes the body of the system, the weights themselves, and the change cannot be undone with a quick edit at inference time the way a prompt change can. Treating training like seasoning is the single most expensive mistake in this field, and most of this book is an elaboration of why.
The enemy
The belief this book exists to correct:
"We have a model problem, so we should fine-tune."
Fine-tuning is genuinely powerful. It can stabilize output format, encode repeated task behavior, capture house style and domain phrasing, sharpen classification and extraction patterns, instill tool-use conventions, shorten prompts for high-volume workflows, and specialize a small model so it is cheaper and faster than a large one. It is usually the wrong way to add current facts, enforce permissions, track business state, follow rapidly changing policy, or hold private knowledge that must be deletable on request. Most teams that "need to fine-tune" need one of those other things instead, and the ones that genuinely need a fine-tune still need everything else around it.
Core thesis, stated plainly
Behavior lives in weights. Facts live in sources. State lives in databases. Permissions live in policy. Fine-tuning edits the first box. It does nothing for the other three, and it can quietly make them harder to operate. Every chapter returns to this split.
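The split above can be written down as a small lookup. This is a minimal sketch, not a prescribed implementation: the names `Home`, `HOME_FOR`, and `fine_tuning_can_address` are hypothetical and exist only to illustrate the four boxes.

```python
from enum import Enum


class Home(Enum):
    """Where each kind of need lives, per the core thesis (illustrative names)."""
    WEIGHTS = "weights"
    SOURCES = "sources"
    DATABASE = "database"
    POLICY = "policy"


# Hypothetical mapping from the kind of need to the box that owns it.
HOME_FOR = {
    "behavior": Home.WEIGHTS,
    "facts": Home.SOURCES,
    "state": Home.DATABASE,
    "permissions": Home.POLICY,
}


def fine_tuning_can_address(kind: str) -> bool:
    """Return True only when the need lives in the weights."""
    return HOME_FOR[kind] is Home.WEIGHTS
```

Under these assumptions, only `"behavior"` passes; a request to "teach the model the new pricing" classifies as `"facts"` and is sent elsewhere.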
Primary research references
These anchor the book. Individual chapters use their own chapter-specific sources; this is the shared spine.
- OpenAI: Model optimization guide
- OpenAI: Supervised fine-tuning
- OpenAI: Fine-tuning best practices
- OpenAI: Evals guide
- Retrieval-Augmented Generation (Lewis et al.)
- LoRA: Low-Rank Adaptation of Large Language Models
- QLoRA: Efficient Finetuning of Quantized LLMs
- Direct Preference Optimization
- Training language models to follow instructions with human feedback (InstructGPT)
- Self-Instruct
- Hugging Face PEFT documentation
The TRAIN Decision Framework
One framework recurs through the book. Before any fine-tuning run, answer five questions. If any one fails, you are probably not ready to train, and often you have discovered that you should not train at all.
- T: Task stability. Is the behavior stable enough to encode into weights? A target that changes weekly does not belong in a model that takes days to retrain and validate.
- R: Required knowledge. Does the task need current facts (externalize them) or durable behavior (a candidate for training)? Confusing these is the most common false diagnosis in the book.
- A: Available examples. Do you have enough clean, consistent demonstrations or preference pairs? The test is not whether you have enough rows but whether you have enough trustworthy ones.
- I: Impact of mistakes. What breaks if the model internalizes the wrong behavior? A bad prompt is fixed in a deploy; a bad fine-tune ships a confident error into every relevant request until you catch it.
- N: Necessary evaluation. Can you prove the fine-tune beats the cheaper alternatives on the metrics that matter, before and after, including regressions?
TRAIN is used as a lens, not a template. It will not appear as a forced subsection in every chapter. It is the question set a mature team can answer before spending money and risk on a training run.
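One way to carry the five questions into a review is as an explicit gate. The sketch below is illustrative only: the `TrainAnswers` fields, `failed_gates`, and `ready_to_train` are hypothetical names, and reducing each question to a yes or no is a simplifying assumption, since real answers come with evidence attached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainAnswers:
    """Yes/no answers to the five TRAIN questions (hypothetical field names)."""
    task_is_stable: bool            # T: behavior is stable enough to encode in weights
    needs_durable_behavior: bool    # R: the gap is durable behavior, not current facts
    has_trustworthy_examples: bool  # A: enough clean, consistent examples
    mistake_impact_acceptable: bool # I: a wrong internalized behavior is survivable
    can_prove_improvement: bool     # N: evaluation can show a win over cheaper options


def failed_gates(answers: TrainAnswers) -> list[str]:
    """Return the TRAIN questions that failed, in T-R-A-I-N order."""
    checks = [
        ("T: task stability", answers.task_is_stable),
        ("R: required knowledge", answers.needs_durable_behavior),
        ("A: available examples", answers.has_trustworthy_examples),
        ("I: impact of mistakes", answers.mistake_impact_acceptable),
        ("N: necessary evaluation", answers.can_prove_improvement),
    ]
    return [name for name, passed in checks if not passed]


def ready_to_train(answers: TrainAnswers) -> bool:
    """A run is gated on all five questions: any single failure blocks it."""
    return not failed_gates(answers)
```

Returning the list of failed questions, rather than a bare boolean, keeps the conversation on the diagnosis: a team that fails only R has usually learned it needs retrieval, not a training run.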
Table of contents
Movement I: The Diagnosis Before the Training Run
- The Support Bot That Knew the Old Product
- What Fine-Tuning Actually Changes
- Five False Diagnoses
- The Customization Menu and a Decision Tree
Movement II: The Problems Fine-Tuning Is Actually Good At
- Format, Behavior, and the Shape of a Repeated Task
- House Style, Domain Phrasing, and Tool Discipline
- Specializing Small Models and Distilling Down
Movement III: The Data Is the Model Update
- Demonstrations, Corrections, and Preferences
- Labels, Disagreement, and Coverage
- Contamination, Leakage, and the Splits That Save You
- Synthetic Data: When It Helps, When It Poisons
Movement IV: Methods Without Mystery
- SFT, LoRA, and QLoRA in Practical Terms
- Preference Tuning, DPO, and Distillation
- What to Fine-Tune: Generators, Retrievers, and Routers
Movement V: Evaluation Before, During, and After
- Baselines, Regression Walls, and the Release Gate
Movement VI: Operating Fine-Tuned Models
- Versioning, Lineage, Drift, and Retirement
Movement VII: Use Case Playbooks
- Ten Playbooks for the Decision Meeting
Back matter
- Glossary
- Implementation Checklist
- Research and Source Register
