Name: A Field Guide to Evals
Availability: InStock

An eval that only passes is not telling you anything. This manual builds an evaluation harness from production traffic up: what to sample, how to label without lying to yourself, and how to read a number you can defend in a review. Code, not philosophy.

Most eval suites measure the wrong thing and pass right up until launch. This is the harness I trust before I ship.

This edition is free to read on this site. Each chapter has its own URL, so readers can bookmark, share, and return to the exact section they need.