Agents That Actually Work

Most systems people call agents are scripts, workflows, or copilots in costume, and the mislabel is where the cost comes from.

What an agent is, and what it is not, comes down to one control question: does the model choose the next step at runtime, or did a human author the path in advance?

Key Takeaways

The axis that separates the five system types (script, workflow, copilot, RPA, agent) is who decides the next step: a human at authoring time, or the model at runtime.

An agent is the only system whose control flow is generated at runtime, which is the single source of both its flexibility and its danger.

Use an agent only when the path is non-enumerable, the input is genuinely ambiguous, or the work is multi-step over changing external state. Otherwise prefer a workflow, copilot, or script.

Avoid agents for deterministic processes, high-cost irreversible actions without gates, and fragile or unobservable tool stacks.

The "mislabel tax" is real: calling a workflow an agent grants unsafe autonomy; calling an agent a workflow strips required controls. Decide which you have before you write code.

Read this beside the BOUND Test, the autonomy ladder, and agentic workflows when you turn the chapter into a production design.

A finance team I worked with had what they proudly called an "AP agent." It read invoices, matched them to purchase orders, and queued payments. When I asked to see the agent, they showed me a flowchart. Every box was a fixed step. Every branch was a hardcoded rule. The only thing the language model did was extract fields from a PDF. There was no goal the system pursued, no plan it formed, no tool it chose. It was a data-extraction step bolted onto a workflow that had existed since 2014.

That is not a criticism of the system. The system was excellent: deterministic, auditable, cheap, and boring in the way good infrastructure is boring. It is a criticism of the word. They called it an agent because "agent" was the word that got the project funded, and the word then started writing checks the architecture could not cash. When leadership heard "agent," they imagined it could handle exceptions autonomously, so they cut the human review queue. The system, which had no autonomy at all, then routed every exception straight to "approved" because that was the default branch. The mislabel created the failure.

So before we design anything, we need vocabulary precise enough to argue with. The cost of confusing these categories is not pedantic. It is operational, and it shows up on the ledger.

Five things people call agents

There are at least five distinct system types hiding under "AI agent." They differ on one axis that matters more than any other: who decides the next step.

A script decides nothing. You wrote the steps; it runs them in order. A cron job that pulls a report, transforms it, and emails it is a script. The control flow is yours, fixed at authoring time. Scripts are wonderful. They fail loudly, they are trivial to test, and a junior engineer can read one and know exactly what it does. The LLM era did not repeal scripts. Most automation should still be scripts.

A workflow decides between predefined paths. It is a script with branches and state, usually drawn as a state machine or a directed graph. Business process management and the BPMN standard formalized this decades ago, and the workflow orchestration world (think durable execution systems like Temporal) made it reliable at scale. The branches are known in advance. A model may inform a branch ("is this invoice fraudulent, yes or no?"), but the model does not invent new branches. The set of reachable states is finite and was designed by a human.
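A minimal Python sketch of that boundary, assuming hypothetical state names and a toy classifier standing in for the model call (none of these names come from a real system). The transition table is authored up front, and the classifier can only select among its rows.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    RECEIVED = auto()
    REVIEW = auto()
    QUEUED = auto()


# Every reachable transition is written down here, at authoring time.
# Keys are (current state, classifier verdict); values are the next state.
TRANSITIONS = {
    (State.RECEIVED, True): State.REVIEW,   # verdict: looks fraudulent
    (State.RECEIVED, False): State.QUEUED,  # verdict: looks clean
}


def route_invoice(invoice: dict, looks_fraudulent: Callable[[dict], bool]) -> State:
    """Advance one invoice. The classifier informs the branch but cannot add one."""
    verdict = bool(looks_fraudulent(invoice))
    return TRANSITIONS[(State.RECEIVED, verdict)]


def toy_classifier(invoice: dict) -> bool:
    # Hypothetical stand-in for a model call that answers yes or no.
    return invoice["amount"] > 10_000


print(route_invoice({"amount": 50_000}, toy_classifier))
print(route_invoice({"amount": 120}, toy_classifier))
```

However the classifier behaves, the set of states the invoice can reach is exactly the set of values in the table, which is what makes the design reviewable before it runs.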

A copilot decides nothing irreversible, because a human is in the loop on every consequential action. It suggests, drafts, autocompletes, and proposes. GitHub Copilot suggesting a function is the canonical example: the model produces output, the human accepts, edits, or rejects, and only the human's keystroke commits. The autonomy is bounded to proposal. The human is the actuator.

RPA (robotic process automation) decides nothing either, but it is worth its own category because it fails differently. RPA scripts the user interface: click here, type there, read this cell. It is a script that pretends to be a person operating an app. It is famously brittle, because a UI change that a human would shrug off (a button moved twelve pixels) breaks the whole thing. RPA is what many "agents" are quietly replacing, and understanding why RPA is brittle tells you exactly which failure modes an agent must avoid inheriting.

An agent, finally, decides its own next step toward a goal, using tools, in a loop, where the path is not fully known in advance. This is the only one of the five where the control flow is generated at runtime by the model rather than authored ahead of time by a human. That single property is the source of every benefit and every danger in this book. An agent can handle a path you did not anticipate. An agent can also take a path you did not anticipate.

Here is the distinction in a table you can hand to a product manager.

System	Who decides next step	Control flow	Reversibility default	When it shines
Script	Author, fully	Fixed, authored	N/A (deterministic)	Stable, repeating tasks
Workflow	Author, among known branches	Finite state machine	Designed per transition	Variable but enumerable paths
Copilot	Human, per action	Suggestion only	Human commits	Skilled human, high judgment
RPA	Author, via UI	Fixed, UI-coupled	Whatever the UI allows	Legacy systems without APIs
Agent	Model, at runtime	Generated per run	Must be designed in	Ambiguous, multi-step, open paths

The column that ends careers is "reversibility default." Scripts and workflows have reversibility designed at authoring time, when a human is thinking carefully. Agents generate their actions at runtime, when no human is thinking at all. If you do not design reversibility in, the agent's default reversibility is whatever the underlying tool happens to allow, which in the case of applyCredit was "none."

Figure: Circular agent control loop of observe, decide, act, and evaluate, with a branch to real-world tools. The honest minimal agent loop: the model chooses the next action at runtime, and the act step touches real tools with side effects.

What an agent actually is, structurally

Strip away the marketing and an agent is a control loop with four parts. The academic framing that most production systems descend from is ReAct (Yao et al.), which showed that interleaving reasoning traces with actions outperforms either alone: the reasoning helps the model track and revise a plan, while the actions let it gather information from the world. The loop, in its honest minimal form, is:

Observe. Read the current state: the goal, the history so far, the latest tool result.
Decide. Reason about what to do next and choose an action, usually a tool call or a final answer.
Act. Execute the chosen tool against the real world.
Evaluate. Incorporate the result, check whether the goal is met, and decide whether to loop again or stop.

Everything else (planning subroutines, memory, multi-agent choreography, reflection) is an elaboration on this loop. The elaborations are real and sometimes valuable, but they do not change the fundamental fact: somewhere in the loop, the model chooses the next action, and that choice is not enumerated in advance.
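The four-part loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production design: `decide` is a hypothetical stand-in for the model call, `tools` is a plain dictionary of functions, and the `"finish"` action name and the step budget are conventions invented for this example.

```python
from typing import Callable, Optional

# One (tool_name, argument, result) entry per action taken so far.
History = list[tuple[str, str, str]]


def run_agent(
    goal: str,
    decide: Callable[[str, History], tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> tuple[Optional[str], History]:
    """Run the loop until the policy finishes or the step budget runs out."""
    history: History = []
    for _ in range(max_steps):
        # Observe and decide: the policy sees the goal and all prior results.
        action, argument = decide(goal, history)
        # Evaluate: the policy signals that the goal is met.
        if action == "finish":
            return argument, history
        # Act: call the chosen tool; an unknown tool becomes an error result.
        tool = tools.get(action)
        result = tool(argument) if tool else f"error: unknown tool {action!r}"
        history.append((action, argument, result))
    return None, history  # budget exhausted without a final answer


def toy_policy(goal: str, history: History) -> tuple[str, str]:
    # Hypothetical stand-in for a model: look the goal up once, then finish.
    if not history:
        return ("lookup", goal)
    return ("finish", history[-1][2])


toy_tools = {"lookup": lambda query: f"result for {query}"}
answer, steps = run_agent("invoice 42", toy_policy, toy_tools)
print(answer, steps)
```

Nothing in `run_agent` says which tool runs next or how many times. That choice lives entirely in `decide`, which is where a real model would sit, and the step budget is the only bound the loop itself imposes.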

This is why an agent is categorically different from a workflow even when they look similar in a diagram. A workflow's graph is the territory; you can print it, review it, and prove properties about it. An agent's graph is generated fresh on every run and you do not get to see it until afterward, if you instrumented it, which brings us back to the black hole from the introduction.

The three honest reasons to use an agent

Autonomy must earn its place, so the burden of proof is on the agent. There are exactly three conditions where it meets that burden, and they tend to appear together.

The path is variable and not enumerable. If you can draw the full set of branches a task can take, draw it, and build a workflow. You only need an agent when the branches are genuinely open: a research task where the next query depends on what the last one returned, a debugging task where the next file to read depends on the last stack trace. SWE-agent and the broader coding-agent line exist because fixing a real bug is not enumerable; you cannot pre-draw the graph of "which file to open next."

The input is ambiguous and requires interpretation, not just extraction. Extracting a total from an invoice is not ambiguity; it is field extraction, and a workflow with a model step handles it. Deciding what a customer actually wants from a rambling, contradictory three-paragraph complaint, then choosing among genuinely different resolution paths, is ambiguity. Note that ambiguity raises the value of autonomy and the cost of error simultaneously, which is exactly why the BOUND Test exists.

The work is multi-step with external state that changes between steps. If step three depends on what a real system returned at step two, and that system can change underneath you, a static plan is wrong before it finishes printing. Agents earn their keep when the plan must adapt to a world that does not hold still.

If a task has none of these, you are looking at a workflow, a copilot, or a script, and you should be relieved, because those are cheaper, more testable, and far less likely to issue a surprise credit.
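The three conditions can be written down as a checklist. The sketch below is one hedged way to encode it in Python. The `Task` fields and the two example tasks are illustrative assumptions, and the booleans still have to come from human judgment about the task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    path_enumerable: bool         # can you draw every branch in advance?
    needs_interpretation: bool    # ambiguity, not mere field extraction
    external_state_changes: bool  # the world moves between steps


def reasons_for_agent(task: Task) -> list[str]:
    """Return the conditions that justify an agent; an empty list means none do."""
    reasons = []
    if not task.path_enumerable:
        reasons.append("path is not enumerable")
    if task.needs_interpretation:
        reasons.append("input is ambiguous")
    if task.external_state_changes:
        reasons.append("external state changes between steps")
    return reasons


# Illustrative tasks, classified by hand.
invoice_extraction = Task(path_enumerable=True, needs_interpretation=False,
                          external_state_changes=False)
bug_fixing = Task(path_enumerable=False, needs_interpretation=True,
                  external_state_changes=True)

print(reasons_for_agent(invoice_extraction))
print(reasons_for_agent(bug_fixing))
```

An empty list is the good outcome: it means a workflow, copilot, or script will do.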

The three honest reasons not to

Symmetry matters, so here are the conditions where an agent is the wrong tool even if it could technically do the job.

The process is deterministic. If the same input should always produce the same output via the same steps, a non-deterministic system is a downgrade dressed as an upgrade. You are adding variance, cost, and latency to buy flexibility you do not need. Worse, you are making the system harder to audit. A deterministic process deserves a deterministic implementation.

The error cost is high and the action is irreversible. Money moved, data deleted, emails sent to customers, production configs changed. The more irreversible the action, the more the burden of proof on autonomy rises, because the agent's runtime-generated plan is the one thing you did not review. This does not mean agents can never touch irreversible actions. It means those actions must sit behind approval gates and rollback paths, which is a later chapter, not a default.
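As a rough illustration of an approval gate, the Python sketch below wraps tool calls so that anything flagged irreversible needs an explicit yes before it runs. The tool names, the `approve` callback, and the `ApprovalDenied` exception are hypothetical. A real gate would route to a human review queue, and rollback paths are not shown.

```python
from typing import Callable

# Hypothetical tool names, classified as irreversible at design time.
IRREVERSIBLE = frozenset({"send_payment", "delete_records"})


class ApprovalDenied(Exception):
    """Raised when an irreversible action is not approved."""


def gated_call(
    tool_name: str,
    tool: Callable[[str], str],
    argument: str,
    approve: Callable[[str, str], bool],
) -> str:
    """Run the tool, asking for approval first if it is flagged irreversible."""
    if tool_name in IRREVERSIBLE and not approve(tool_name, argument):
        raise ApprovalDenied(f"{tool_name}({argument!r}) was not approved")
    return tool(argument)


def deny_all(tool_name: str, argument: str) -> bool:
    # Stand-in reviewer that rejects everything; a real one would ask a human.
    return False


# Reversible reads pass straight through, even with a reviewer that denies.
print(gated_call("read_invoice", lambda arg: f"read {arg}", "inv-1", deny_all))

# Irreversible actions stop at the gate.
try:
    gated_call("send_payment", lambda arg: f"paid {arg}", "inv-1", deny_all)
except ApprovalDenied as exc:
    print("blocked:", exc)
```

The design choice worth noting is that the irreversible set is authored at design time, by a human, even though the calls themselves are chosen at runtime.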

The tools are fragile or the system is unobservable. An agent is only as reliable as the tools it calls and only as debuggable as the traces it leaves. If your tools are flaky, undocumented, or have unclassified side effects (recall applyCredit), the agent will amplify their fragility, because it calls them more often, in combinations you did not test, in response to inputs you did not anticipate. And if you cannot observe what it did, you cannot fix what went wrong. Observability is not a nice-to-have for agents. It is a precondition, which is why "Observable state" is the O in the BOUND Test.
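One modest way to make tool calls observable is to wrap each tool so that every invocation leaves a record, whether it succeeds or fails. The Python sketch below assumes an in-memory list as the trace store and a record layout invented for this example. A production system would more likely emit to a tracing backend.

```python
import time
from typing import Any, Callable


def traced(name: str, tool: Callable[[Any], Any], trace: list) -> Callable[[Any], Any]:
    """Wrap a tool so each call appends one record to the trace list."""

    def wrapper(argument: Any) -> Any:
        record = {"tool": name, "argument": argument, "started": time.time()}
        try:
            record["result"] = tool(argument)
            record["ok"] = True
            return record["result"]
        except Exception as exc:
            # Record the failure, then re-raise so the caller still sees it.
            record["ok"] = False
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on both success and failure, so no call goes unrecorded.
            trace.append(record)

    return wrapper


def flaky(argument: Any) -> Any:
    # Hypothetical tool that always fails, to show the failure record.
    raise RuntimeError("upstream timeout")


trace: list = []
double = traced("double", lambda x: x * 2, trace)
broken = traced("flaky", flaky, trace)

print(double(3))
try:
    broken("inv-1")
except RuntimeError:
    pass
print([(r["tool"], r["ok"]) for r in trace])
```

Because the record is appended in `finally`, a tool that raises still shows up in the trace, which is the case you most need to see afterward.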

The mislabel tax

Return to the AP "agent" that approved every exception. The technical system was fine. The damage came from the gap between what the word promised and what the architecture delivered. Leadership heard "agent," assumed autonomy that did not exist, removed a human control that the real system depended on, and the deterministic default did the rest.

This is the mislabel tax, and it runs in both directions. Call a workflow an agent and people grant it autonomy it cannot safely use. Call an agent a workflow and people skip the trace, the kill switch, and the approval gate that an agent actually needs. Either way, the failure is wired in before the first line of code, because the wrong mental model selected the wrong controls.

So the most useful thing you can do at the start of any project is refuse the word "agent" until you have answered one question: at runtime, does the model choose the next step, or does it merely inform a step a human already enumerated? If the latter, you have a workflow with a model in it, and you should build it like one and sleep well. If the former, you have an agent, autonomy is now a liability on your balance sheet, and the rest of this book is about making it earn its place.

We make it earn its place by climbing one rung at a time. The next chapter builds the ladder.