What a team is for after the machine does the work

When generation is cheap, the org chart built for production is the wrong shape. Re-drawing it around judgment.

For most of my career, the org chart made intuitive sense. You needed people to produce things. You hired producers. You structured them into layers that could coordinate production at scale. Engineers wrote code. Designers made screens. Analysts pulled reports. Managers made sure the producers were producing. The shape of the organization followed the shape of the work, and the work was, fundamentally, about generation. Creating the artifact. Shipping the thing.

That assumption is cracking. Not dramatically, not all at once, but quietly, in the numbers that matter. When I look at what Devlyn's teams can actually output now versus two years ago, the ratio has shifted in ways that should force a serious conversation about what we are actually hiring people to do, and what the right shape of an organization looks like when generation is no longer the hard part.

The hard part, increasingly, is judgment.

The constraint moved. Org charts were built for a production bottleneck; when generation is cheap, the bottleneck shifts from "can we produce this?" to "is this the right thing?"
Roles split. Artifact-generation roles contract; judgment, specification, and decision-ownership roles expand. The senior-to-junior ratio tilts sharply senior.
The org gets flatter. The coordination middle thins, spans of control change, and the scarce skill becomes confident evaluation, not throughput you can count.

The org chart was built for a production constraint that no longer exists

Here is the old logic: you had more ideas than you had capacity to execute. The bottleneck was production. So you hired to the bottleneck. You hired engineers to write software because software had to be written by humans, one line at a time. You hired writers because copy had to be drafted by humans, one sentence at a time. You hired analysts because reports had to be assembled by humans, one query at a time. Your span of control, your team ratios, your hiring velocity, all of it was calibrated to the assumption that humans were the primary production unit.

The org chart you drew in 2019 was not wrong. It was correct for the constraint you were optimizing against. The problem is that the constraint has changed and most org charts have not.

When generation becomes cheap, when a capable model can produce a first draft, a working prototype, a data summary, a test suite, in seconds, the bottleneck moves. It moves from "can we produce this?" to "is this the right thing?" It moves from generation to evaluation. From throughput to direction. The machine can fill the canvas. The question is whether anyone in your organization actually knows what a good painting looks like, and whether they can specify it clearly enough that the machine makes the right one.

What contracts and what expands

Let me be concrete, because this conversation tends to get abstract in ways that obscure what is actually happening inside organizations.

What contracts: roles whose primary output is artifact generation. Junior engineers whose job was to implement clearly-specified tickets. Content producers whose job was volume. Analysts whose job was to run the same query in a new configuration. QA testers whose job was to manually execute test scripts. Not because these people are not valuable, but because the leverage available to a skilled senior person with good tooling now covers what previously required several people beneath them. The work gets done; fewer bodies touch it.

What expands: roles whose primary output is judgment, specification, and decision ownership. Senior engineers who can architect a system and then evaluate whether the model-generated implementation is actually sound. Product thinkers who can write a precise spec that constrains the output space. Editors who can tell the difference between a generated paragraph that passes and one that damages brand. Domain experts who can catch the confident wrong answer. People who own an outcome end-to-end, not a slice of the process.

At Devlyn, we have operationalized this as a hiring posture: Senior engineers only. No juniors hidden behind AI. That is not a statement about junior engineers being bad. It is a statement about what we actually need right now. We need people who can read model output and know immediately whether it is correct, not people who are still developing that calibration. The gap between a plausible-looking wrong answer and a correct one is invisible without deep expertise. Hiring people who cannot see that gap does not reduce the risk; it just buries it.

This is explored at length in Building an AI-Native Team: Hiring for judgment, not throughput, if you want the full framework for evaluating candidates in this environment. The short version is that the interview process needs to change entirely. You are not testing for production speed. You are testing for the ability to specify, evaluate, and own.

Spans of control have to change, and most managers are not ready for why

The traditional argument for span of control limits was coordination cost. A manager with twelve direct reports cannot give each of them the attention needed to keep work aligned and quality high. So you kept spans at six or seven, added layers, and scaled that way.
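To make the coordination arithmetic concrete, here is a minimal sketch in Python. It assumes an idealized pyramid in which managers are themselves managed at the same span; the function names and the headcounts (84 and 28) are illustrative assumptions, not figures from Devlyn.

```python
import math


def management_layers(headcount: int, span: int) -> int:
    """Count management layers above `headcount` people when no manager
    has more than `span` direct reports (idealized pyramid, an assumption)."""
    if headcount < 1 or span < 2:
        raise ValueError("headcount must be >= 1 and span must be >= 2")
    layers = 0
    group = headcount
    # Each pass groups the current level under the fewest possible managers.
    while group > 1:
        group = math.ceil(group / span)
        layers += 1
    return layers


def manager_count(headcount: int, span: int) -> int:
    """Total managers across all layers under the same assumptions."""
    if headcount < 1 or span < 2:
        raise ValueError("headcount must be >= 1 and span must be >= 2")
    total = 0
    group = headcount
    while group > 1:
        group = math.ceil(group / span)
        total += group
    return total


if __name__ == "__main__":
    # Hypothetical comparison: a large team versus a smaller, higher-leverage one.
    for people in (84, 28):
        print(people, management_layers(people, 7), manager_count(people, 7))
```

Under these assumptions, 84 people at a span of seven need three management layers and 15 managers, while 28 people at the same span need two layers and 5 managers. The sketch only shows how headcount drives layering; it says nothing about how much output each person produces.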

Two things happen when the production ratio shifts. First, a smaller team can produce more output, which means a manager coordinating a smaller headcount is now accountable for the output volume that previously required a much larger group. That changes the nature of the management job dramatically. You cannot manage an AI-augmented team the way you managed a headcount-equivalent team. The leverage is different. The failure modes are different. The thing you need to watch is different.

Second, the manager's job itself changes in character. The traditional manager spent a significant portion of time coordinating production: who is working on what, whether it is moving, what the blocker is. When generation is cheap and fast, that coordination function shrinks. The manager's actual job becomes setting clear specifications, establishing quality evals, reviewing output against intent, and making the judgment calls that cannot be delegated. That is a different skill set from classic people management, and most managers promoted in the last decade were promoted for the old one.

The manager who thrives in this environment is not the one who is best at running standups and unblocking tickets. It is the one who can write a crisp spec, recognize when the output does not match it, and make a decision about what to do next without waiting for consensus. It is the one who can tell you, precisely, what good looks like, because if they cannot tell you, the model cannot be correctly steered, and the team will generate plausible work that is not quite the right work.

Judgment you can observe, not throughput you can count

Here is the hiring problem this creates: throughput is easy to measure. Lines of code, tickets closed, articles published, reports shipped. You can see throughput. You can count it. Judgment is much harder. You cannot post a job description that says "must demonstrate excellent judgment" and then test for it in a standard interview loop.

What I have learned, both at Devlyn and in conversations across the companies I advise, is that you have to engineer the interview process specifically to surface judgment, and you have to be willing to slow down and pay for it.

Some things that actually surface judgment: give candidates real work from your domain, with real ambiguity, and watch how they make sense of the constraint space before they produce anything. Ask them to evaluate output, not produce it. Show them a generated artifact and ask: what is wrong with this? What decision would you change? What would you need to know before you shipped this? People who have judgment can answer those questions. People who have been trained for throughput often cannot; they will pivot immediately to how they would produce something better rather than analyzing what is actually wrong with what is in front of them.

We have also learned to be explicit about ownership expectations. Our internal shorthand is: Ownership over hours. Outcomes over velocity. We are not measuring presence or pace. We are measuring whether the outcome was good and whether this person drove it. That shifts accountability in a way that selects for people who actually want to own things, which is a different population than people who are good at looking busy.

The broader framework here, how economies restructure when judgment becomes the scarce input, is something I think about through the lens of The Judgment Economy, which lays out where value concentrates when execution commoditizes. The shift happening inside companies is a micro version of the macro pattern: the humans who remain valuable are the ones who are doing the thing that is hardest to automate, which is not production. It is taste, intent, evaluation, and decision.

What the new org chart actually looks like

I want to resist drawing the definitive org chart because it varies by industry and company stage. But I can describe the shape. It is flatter. Fewer layers between the person setting intent and the output. Each person in the chain owns more surface area but has more leverage per hour of work. The "layer of people who receive clear specs from above and write clean code below" is thinner. The "people who write the specs" layer is thicker.

The ratio of senior to junior tilts sharply senior. That is not because junior roles disappear entirely (there are still places where someone needs to develop craft) but because the leverage math changes. One senior engineer who can architect and evaluate is now worth three or four production-oriented juniors in terms of output quality you can trust. If you are trying to move fast and cannot afford to have a senior reviewing every line of junior output, you are better off with fewer people and higher floor-level judgment.

The evaluator role, someone whose job is specifically to review model output against quality and brand standards, becomes a real function rather than something bolted on informally. At Devlyn, we have found that the bottleneck on speed is rarely generation; it is confident evaluation. When you know the output is good enough to ship, you can move. When you are not sure, you loop. Building teams with strong evaluative capacity directly reduces that loop friction.

Cross-functional fluency matters more than it did. When a single person with AI tooling can produce what used to require a team, the question of whether that person understands the adjacent domain becomes critical. An engineer who cannot evaluate UX will generate technically correct implementations that miss the user. A designer who cannot evaluate technical feasibility will specify things that look right but cost three times as much to build. The traditional handoff model (produce here, hand off there, evaluate in another department) is too slow and too lossy when the pace of generation increases. You need people who can hold more of the stack in their heads.

The leadership implication nobody is saying out loud

I will say it: the people most at risk in this transition are mid-level managers who built their careers on coordination and throughput management, and senior leaders who mistake activity for output.

The coordination layer, the manager whose primary value was making sure the team was moving and blockers were removed, is thinner in an AI-augmented team because the production pace is faster and the work is more self-directing. You do not need as many people managing the pipeline when the pipeline runs faster. What you need is people who can set the intent clearly at the top and evaluate correctly at the bottom. The middle thins.

For leaders, the risk is of a different kind. Leaders who have been rewarded for building headcount, for scaling teams, for managing complexity through organizational structure may resist the logic here because it runs counter to their intuitions about what scaling looks like. Scaling used to mean hiring. In an AI-augmented organization, scaling may mean keeping headcount flat while dramatically increasing the judgment density of the team you have. That is a different mental model of growth, and it requires leaders to stop treating headcount as the primary proxy for organizational capability.

The questions I bring to every leadership conversation now: What is the evaluation loop for AI-generated output in your organization? Who owns quality, and do they have enough seniority and domain knowledge to actually see problems? When you imagine the org chart two years from now, what assumptions about production cost are you baking in, and are those assumptions still correct?

I have covered this in more detail, including specific frameworks for evaluating team shape and hiring posture, in Org Charts After Automation: Points of View, Volume III. The patterns I am seeing across the companies I work with suggest we are still early: most teams have added AI tooling without rethinking the structure, which means they are getting productivity gains today while building the organizational equivalent of technical debt into their org design, debt that will be painful to unwind later.

The machine doing the work is not the disruption. The disruption is the implication for what you need the humans to do. Production is no longer the constraint. Judgment is. The org chart should reflect that, and most of them do not yet.

Frequently asked questions

How does org structure change after AI automation? The org chart flattens. Layers built to coordinate human production thin out, spans of control change, and the senior-to-junior ratio tilts sharply senior. Fewer people own more surface area, with more leverage per hour, and the work shifts from generating artifacts to specifying intent and evaluating output.

Should you stop hiring junior engineers when AI handles generation? Not universally, but the leverage math changes. The posture I run at Devlyn is senior engineers only: people who can read model output and know immediately whether it is correct. The gap between a plausible wrong answer and a right one is invisible without deep expertise, so hiring people who cannot see that gap buries risk rather than reducing it. If you are building a team along these lines, this is the work we do at Devlyn.

Who is most at risk in this transition? Mid-level managers whose value was coordination and throughput management, and senior leaders who mistake activity for output. When the pipeline runs faster and the work is more self-directing, the coordination middle thins. What survives is the ability to set intent clearly at the top and evaluate correctly at the bottom.
