The blog
Long-form thinking on the engineering and economics of AI-Native systems, published when I have something worth saying, not on a schedule.
Principles of Building AI Agents That Hold in Production
The principles of building AI agents do not live in any framework: bound the autonomy, name what you never delegate, evaluate continuously, and design honest memory.
How to Build an AI Agent (the Loop That Holds)
How to build AI agents that hold: spec the task, give it bounded tools, add guardrails in code, wire evals, and ship behind a human gate.
Agentic AI Frameworks Compared (From Production)
There is no single best agentic AI framework. Compare LangGraph, CrewAI, and the OpenAI Agents SDK by what each costs you in control, observability, and lock-in - not by the feature list.
Agentic AI Examples: What's Genuinely Shipping
The agentic AI shipping in 2026 clusters in four categories: coding, research, customer-ops, and data/ops automation. Concrete, dated examples here.
Offline vs Online LLM Evaluation: Why You Need Both
Offline evaluation gates a deploy against a frozen set; online evaluation measures real behavior after release. You need both.
Memory Systems for AI Agents: Remember Without Inventing
AI agent memory is what an agent retains across steps and sessions. The hard part is honesty: a system that misremembers helps no one and harms plenty.
LLM Evaluation: Measuring What Will Break
LLM evaluation is the harness that gates a real deploy. Learn what to measure, which metrics lie, when to trust an LLM judge, and who should own it.
Human-in-the-Loop Evaluation That Scales
Human-in-the-loop evaluation scales only when people review the flagged tail - the low-confidence, high-stakes, adversarial slice - not every output.
The Best AI Agents in 2026 (An Honest Roundup)
The best AI agents in 2026 are coding agents, deep-research agents, customer-ops agents, and orchestration frameworks - each strong in a narrow band.
Agentic RAG: When Your Agent Needs to Retrieve
Agentic RAG lets the agent decide when and what to retrieve, iterate, and verify. It wins on multi-hop and ambiguous queries, and it costs you.
Agentic Coding: What Changes When the Machine Writes Code
Agentic coding is the AI-Native SDLC in practice: the machine writes the implementation, the engineer specifies intent and evaluates the diff.
Agentic AI Use Cases and the Constraint That Picks One
The best agentic AI use cases are repetitive, tool-bounded, and high-volume with a checkable outcome. Match the use case to the constraint, not the hype.
How to Build an LLM Evaluation Framework
A good LLM evaluation framework tests what will break in production: a golden set from real traffic, task metrics, blinded rubrics, and a drift cadence.
AI-Native means the machine does the job
Not assisted. Not augmented. The model does the whole job, and our role narrows to a single thing: judgment.
AI Agents and Agentic Workflows: An Honest Field Guide
Agentic workflows let AI agents take actions toward a goal in a loop. They earn their keep in a narrow band - here is exactly where, and where they fail.
Agentic Design Patterns That Actually Work
The agentic design patterns that survive production are the bounded ones: tool-use with guardrails, plan-then-execute, reflection, and HITL at named decisions.
Agentic AI vs Generative AI: What's Actually Different
Generative AI produces content from a prompt. Agentic AI plans and acts toward a goal - and actions carry consequences generation never does.
LLM Evaluation Metrics That Matter (and the Ones That Lie)
The LLM evaluation metrics that matter measure what breaks in production. The ones that lie measure what looks good in a deck. Here is how to tell them apart.
Evals that predict production, not vanity
Most eval suites measure the wrong thing and pass right up until launch. Here is the harness I actually trust before I ship.
The CRO's case for shipping smaller models
Revenue rarely rewards the biggest model. It rewards the one you can afford to run, ship, and explain to a customer.
LLM-as-a-Judge: When to Trust It
LLM-as-a-judge is reliable for cheap, scaled, relative grading on tight rubrics. It breaks wherever its own biases contaminate the call. Here is how to tell which case you are in.
RAG Evaluation: Measuring Retrieval Before It Collapses
RAG evaluation works only when you score retrieval and generation separately on a frozen golden set. Here is how to catch recall decay before it ships.
When doing is cheap, deciding is everything
If generation costs approach zero, value migrates to whoever can tell good output from bad. What that does to a company.
LLM Evaluation Tools Compared (From Production)
The right LLM evaluation tool depends on whether you need offline suites, online monitoring, or human labeling. Most teams need a thin layer they control.
'A human reviews it' is not a plan
Putting a person in the loop feels safe and scales terribly. The reviewer becomes a bottleneck, then a rubber stamp, then a liability.
Why most RAG pipelines fail in month three
The demo retrieves perfectly. Then the corpus grows, the queries drift, and recall quietly collapses. Here is the gap, and how I close it.
How to Evaluate an AI Agent (Evals for Agents)
AI agent evals score the whole trajectory: tool calls, step efficiency, recovery, and goal state, not just the final answer. The harness that gates a deploy.
How to Measure (and Reduce) Hallucination
Measure hallucination as faithfulness against a source on a frozen set, then reduce it with grounding, constrained decoding, and calibrated abstention.
An honest accounting of what agents can do today
Between the demos and the disappointment lies a narrow band of tasks where agents genuinely earn their keep.
The spec is the program now
When the model writes the implementation, the specification becomes the artifact you actually version and defend.
Eval-Driven Development: The Test Suite Leads
Eval-driven development is TDD for probabilistic systems: write the eval first, gate every deploy on a frozen eval set, and treat the suite as the spec.
Selling AI to people who have been burned by AI
Three years of inflated claims left buyers skeptical. That skepticism is an asset if you sell to it honestly.
How to Build a Golden Eval Set From Production
A golden dataset for LLM evaluation is a frozen, versioned slice of real traffic with trusted reference answers, over-weighted toward the adversarial tail.
What a team is for after the machine does the work
When generation is cheap, the org chart built for production is the wrong shape. Re-drawing it around judgment.
How to Reduce LLM Inference Cost Without Wrecking Quality
Reduce LLM inference cost by right-sizing the model, caching what repeats, quantizing, trimming tokens, and batching. Here is the order to pull those levers, and what each one actually saves.
RAG vs Fine-Tuning: When Each Wins in 2026
RAG vs fine-tuning is the wrong fight. RAG handles knowledge that changes; fine-tuning shapes behavior that persists. Here is when each wins, and why most teams end up shipping both.
Prompt Caching: What It Is and When It Saves Money
Prompt caching reuses the already-computed prefix of a prompt so repeated tokens get billed at a deep discount. Here is when it saves money, and when it does not.
LLM Model Routing: Cheapest Model That Can Do the Job
LLM model routing sends each request to the cheapest model that can handle it, escalating only when needed. Here is how it cuts cost without cutting quality.
LLM Quantization: When 4-Bit Pays (and When It Bites)
LLM quantization stores a model at fewer bits per weight, cutting memory and cost. The trade-off: quality holds on most tasks and quietly breaks on a few.
Semantic Caching for LLMs: When It Saves Money
Semantic caching reuses a past LLM answer for a question that means the same thing, even when the words differ. Here is when it saves money, and how it differs from exact prompt caching.
LLM Token Optimization: Cut Token Cost, Keep Quality
LLM token optimization means cutting the tokens you generate and send, in that order of payoff. Start with output, because output is priced 5x to 6x higher than input.
Hiring AI Engineers: The Definitive 2026 Guide
AI engineers are the hardest role on the market to fill. Here is what good actually looks like, what it costs, and how the bad hires fail.
AI Engineer Skills: What Actually Separates the Good Ones
The AI engineer skills that matter in 2026 are LLM and RAG work, eval design, prompt and context engineering, and solid software fundamentals. The one that separates the good hires is judgment.
AI Engineer Interview Questions That Reveal the Real Ones
The AI engineer interview questions that work test judgment, not trivia: RAG failure modes, eval design, and how a candidate handles being wrong.
AI Engineer Cost: What It Really Takes to Hire One
AI engineer cost is far more than salary. Here are the real 2026 ranges, the loaded number nobody quotes you, and how to choose between in-house, staff aug, and an agency.
AI Engineer Job Description: What to Put In It
A good AI engineer job description names the production problem, separates required from nice-to-have, and avoids the keyword pile that repels your best builders.
How to Vet AI Engineers: The Process That Predicts
How to vet AI engineers in a way that predicts on-the-job performance: the work-sample that mirrors real work, the judgment probe, references, and a paid trial.
Senior vs Junior AI Engineer: The Real Difference
Senior vs junior AI engineer is no longer a question of years. It is whether they can evaluate what the model generated, not just generate it. AI widened that gap.
In-House vs Outsourced AI Development: The Decision
I have built in-house AI teams and delivered as the outsourced partner. Here is the framework, not the sales pitch, for choosing between them.
Staff Augmentation vs Consulting: Who Owns the Outcome
Staff augmentation vs consulting comes down to one question: who owns the outcome. Here is when each fits, what it really costs, and how to choose for AI work.
AI Team Structure: The Roles You Need in 2026
The roles an AI team needs have not changed much. What changed is the shape: fewer people, more senior, and a real evaluation function at the center.
When to Hire an AI Engineer (and When to Wait)
When to hire an AI engineer: the signals that mean it is time for your first AI hire, the signals that mean wait, and what hiring too early actually costs.
AI Engineer Red Flags: How to Spot a Bad Hire
The AI engineer red flags that predict a bad hire: no evals, a demo that never shipped, a resume of buzzwords. Here is how to surface each one before you sign.
AI Hiring Mistakes That Cost the Most (and the Fixes)
The most expensive AI hiring mistakes are not bad luck. They are predictable: hiring for hype, never testing evaluation skill, and the wrong role for your stage.
Building an AI Team: The Order You Actually Build It In
Building an AI team is a sequencing problem, not a headcount problem. Here is the order I build them in, first hire to scaling, without the bloat.
What Is an AI Engineer? The Role, Explained by a Hirer
What is an AI engineer? Someone who builds production AI features on foundation models. Here is the role, what they do, and when you need one.
AI Engineer vs ML Engineer: What Actually Differs
An AI engineer wires existing models into a product; an ML engineer builds and trains the model. Here is the real difference, and who to hire when.
AI Engineer vs Data Scientist: Who to Hire When
An AI engineer ships AI features into your product; a data scientist extracts insight and builds the models behind decisions. Here is which one to hire when.
AI Engineer vs Software Engineer: The Real Difference
AI engineer vs software engineer: a software engineer builds deterministic systems you can test; an AI engineer builds probabilistic systems you have to evaluate. Who to hire when.
What Is an LLM Engineer? The Role, Explained for Hirers
What is an LLM engineer? The specialist who turns foundation models into reliable production features. Here is the role, what they do, and when to hire.
How to Hire an LLM Engineer (and What to Look For)
How and where to hire an LLM engineer, the signals to screen for, what it costs, and when to hire through a partner instead of building the loop yourself.
How to Hire an ML Engineer (and What to Look For)
How and where to hire an ML engineer, the skills and signals to screen for, what it costs, and when to hire through a partner instead of building in-house.
How to Hire an MLOps Engineer (Without Getting Burned)
Hiring an MLOps engineer is a reliability bet, not a tooling checklist. Here is what the role owns, how to vet for it, what it costs, and when you actually need one.
How to Hire a RAG Engineer Who Survives Production
Most RAG engineers can demo retrieval. Few can keep recall from collapsing in production. Here is how to hire the second kind, what they own, and what it costs.
How to Hire an AI Agent Developer (and Vet One)
Hire an AI agent developer who owns planning, tools, memory, evals, and guardrails, not someone who demos a flashy agent that dies in production.
How to Hire a Generative AI Engineer (What to Screen For)
How and where to hire a generative AI engineer, the production signals to screen for, what it costs, and when to hire through a partner instead.
How to Hire a Computer Vision Engineer: What to Look For
How to hire a computer vision engineer who survives your real-world images: the skills and signals to screen for, where to find them, what it costs, and when you actually need one.
How to Hire an NLP Engineer (and What to Look For)
How and where to hire an NLP engineer, the signals to screen for, what it costs, and why the role still matters in the LLM era, from an operator who hires them.
Hire a Prompt Engineer? When You Actually Need One
Hire a prompt engineer only when the skill cannot live inside an AI engineer. Here is what the role really is in 2026, how to screen for it, and what it costs.
How to Hire an AI Solutions Architect (Without Regret)
Hire an AI solutions architect to own system design, integration, build-vs-buy, governance, and cost. Here is what the role really owns, how to screen for it, and when you actually need one.
How to Hire an AI Product Manager (What to Look For)
How and where to hire an AI product manager, the signals to screen for, what an AI PM actually owns, and what it costs in 2026.
How to Hire a Python Developer for AI (What to Look For)
How to hire a Python developer for AI: the skills and signals to screen for, the generalist-versus-specialist trap, what it costs, and when to hire through a partner.
How to Hire a React Developer for AI Products
Hire a React developer who can build AI-product frontends: streaming chat, agent interfaces, and state that survives token-by-token output, not just generic React.
How to Hire a Node Developer for AI Products
Hire a Node developer who can build AI-product backends: streaming APIs, agent orchestration, and tool servers under real load, not just a generic CRUD API.
How to Hire a Full-Stack AI Developer (Without Guessing)
Hire a full-stack AI developer who owns the AI feature end to end: frontend AI UX, model integration, and the eval loop, not a generic full-stack dev who has never shipped against a model.
How to Hire a DevOps Engineer for AI Workloads
Hiring a DevOps engineer for AI is a GPU-cost and reliability bet, not a generic ops hire. Here is what the role owns, how to vet it, and what it costs.
How to Hire a Data Engineer (the AI Foundation)
How and where to hire a data engineer for AI, the skills and signals to screen for, what it costs, and when to hire through a partner instead of building in-house.
How to Hire a Forward Deployed Engineer
A forward deployed engineer embeds with your customer and turns an unclear AI business case into a shipped solution. Here is when you need one, how to vet, and what it costs.
How to Choose an AI Development Company
I run an AI development company, so read me with that bias. Here is what good actually looks like, the questions that expose a slideware shop, and when to skip a vendor entirely.
AI Consulting Services: What You Get and How to Choose
Real AI consulting delivers a shipped, evaluated system, not a deck. Here is what it includes, what it costs, and how to pick a consultant without getting burned.
Staff Augmentation: When It Beats Hiring (and When Not)
Staff augmentation embeds outside engineers in your team while you keep the roadmap and own the outcome. Here is what it is, the models, the real cost, and when it fits.
What Is a Fractional CTO? A 2026 Operator's Guide
A fractional CTO is senior technical leadership on a part-time retainer. Here is what they do, when a startup or SME needs one, and what it costs.
Dedicated Developers vs Freelancers: How to Choose
Dedicated developers vs freelancers comes down to continuity versus flexibility. Here is the honest tradeoff, the hidden costs of each, and how to choose.
The Toptal Alternative That Fits AI Work
Toptal is a strong freelance network. For AI product work that needs an engineer who owns the outcome, a senior, AI-native team is the better Toptal alternative.
Turing Alternative: An Honest 2026 Comparison
Turing is a fast, large-pool talent cloud. If you are shipping AI features, the fit problem is depth, not quality. Here are the real alternatives, compared fairly.
Offshore AI Development: When It Works, When It Burns
I run an offshore AI development shop and I have been the buyer too. Here is the honest version of when it works, what it costs, and where it burns you.
Nearshore vs Offshore: Which Fits AI Development
Nearshore vs offshore comes down to timezone and total cost, not the hourly rate. For AI work, the bigger question is who owns the outcome.
Do You Need an AI Engineer? An Honest Decision Rule
Do you need an AI engineer? Only when AI work is recurring, core, and failing in ways your team cannot diagnose. Here is the honest rule and the alternatives.
The AI Skills Gap: What It Is and How to Fix It
The AI skills gap is real, but the fix is not more training. Here is what the gap actually is, why it persists, and what leaders should do this quarter.
The Cost of a Bad AI Hire (It Is Not the Salary)
The cost of a bad AI hire is not the salary you wasted. It is the un-evaluated system they shipped, the roadmap that stalled, and the trust your team lost.
How AI Changed Software Hiring
How AI changed software hiring comes down to one move: it changed what you screen for. Generation got cheap, so the job is judgment now, not throughput.