Alpesh Nakrani
Chapter 4 / The AI-Native Canon

Hiring for Judgment You Can Observe

Key Takeaways

  • AI-native interviews should test supervision of generated work, not blank-page speed alone.
  • The strongest signal is calibrated doubt: evidence use, named assumptions, risk classification, and a verification plan.
  • Junior candidates can show coachable judgment even when they do not yet have senior judgment.
  • Flawed AI artifacts make better interview material than polished portfolio demos.

Hiring for observable judgment means giving candidates plausible AI output and watching how they verify, reject, revise, or escalate it.

The candidate's portfolio looked spectacular. Every example used the latest tools. Every demo had smooth narration. Every screen showed speed. The hiring panel was impressed until the staff engineer asked a simple question: "Show us a time you rejected an AI-generated answer that looked good." The candidate paused. The answer was vague. They knew how to produce; they could not yet demonstrate judgment.

AI-native hiring must make judgment observable.

This chapter gives hiring managers a practical method for evaluating judgment without falling back on intuition. The key is to test candidates on ambiguous artifacts, not blank-page production. Give them a flawed AI output. Give them incomplete context. Give them a risk boundary. Ask them what they would trust, what they would reject, what they would verify, and how they would turn the lesson into a repeatable practice.

Research spine

This chapter uses: Forsgren et al., The SPACE of Developer Productivity; Edmondson, The Fearless Organization / psychological safety research; Brynjolfsson, Li, Raymond, Generative AI at Work, NBER Working Paper 31161; GitHub Research, Quantifying GitHub Copilot's impact on developer productivity and happiness.

The hiring signal changes

In older workflows, a take-home exercise often tested whether a person could produce a complete artifact under time pressure. That signal is weaker now. A candidate can use tools to produce something coherent quickly, and banning tools in the interview gives you less information about how they will actually work. The better signal is how they supervise, constrain, evaluate, and revise tool output.

This requires interview design. Ask for critique, not only creation. Ask for failure diagnosis, not only success demonstration. Ask for assumptions, trade-offs, and risk classification. A strong candidate will name what they do not know and create a verification plan. A weak candidate will polish the artifact and call it done.

The judgment interview

A judgment interview has four parts. First, give the candidate an artifact generated by a model: a product spec, code patch, customer response, data analysis, policy summary, or sales proposal. Second, provide partial context and make the missing context realistic. Third, ask the candidate to review the artifact and produce a decision memo: ship, revise, reject, escalate, or test. Fourth, debrief their reasoning.

The interviewer should score the candidate on evidence use, assumption naming, risk awareness, domain reasoning, evaluation design, and communication. This is not a trick. It is a simulation of actual AI-native work.
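
As a minimal sketch, the decision memo from the third step could be captured in a small structure so that interviewers compare the same sections across candidates. The five decisions come from the text above. The field names mirror the required candidate output in this chapter's rubric, while `DecisionMemo`, `memo_gaps`, and the sample memo contents are hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    """The five outcomes a candidate may choose in the decision memo."""
    SHIP = "ship"
    REVISE = "revise"
    REJECT = "reject"
    ESCALATE = "escalate"
    TEST = "test"


@dataclass
class DecisionMemo:
    # Hypothetical structure: one list of short statements per memo section.
    decision: Decision
    reasons: list[str] = field(default_factory=list)
    missing_context: list[str] = field(default_factory=list)
    verification_plan: list[str] = field(default_factory=list)


def memo_gaps(memo: DecisionMemo) -> list[str]:
    """Return the names of memo sections the candidate left empty."""
    sections = {
        "reasons": memo.reasons,
        "missing_context": memo.missing_context,
        "verification_plan": memo.verification_plan,
    }
    return [name for name, items in sections.items() if not items]


# Illustrative memo: the candidate gave reasons and named missing context,
# but did not say how they would verify the revised artifact.
memo = DecisionMemo(
    decision=Decision.REVISE,
    reasons=["Patch changes retry behavior without a test"],
    missing_context=["Expected load on the retry path"],
)
print(memo_gaps(memo))  # ['verification_plan']
```

An empty section is not an automatic failure in this sketch. It simply gives the debrief a concrete place to start, for example by asking the candidate how they would verify the revision.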

Avoiding seniority bias

Judgment is easier to observe in experienced candidates, but AI-native teams cannot become senior-only organizations. The right junior signal is not perfect judgment; it is coachable judgment. Does the candidate notice uncertainty? Can they compare alternatives? Can they explain why an answer might be wrong? Can they accept correction and update the rubric? Can they distinguish personal preference from an external standard?

Apprenticeship becomes a hiring criterion. If the organization has no learning loop, it should not pretend juniors will automatically develop judgment by using tools. The interview should reveal both the candidate's current capacity and the team's responsibility to teach.

Operating table

Interview exercise        | What it reveals             | Strong signal                                     | Weak signal
--------------------------|-----------------------------|---------------------------------------------------|-------------------------------
Critique a generated spec | Product and system judgment | Names missing user, eval, risk, and owner         | Rewrites wording only
Review generated code     | Engineering supervision     | Checks behavior, tests, security, maintainability | Accepts because tests pass
Rank customer replies     | Revenue/support judgment    | Balances resolution, accuracy, tone, policy       | Optimizes for politeness alone
Design an eval            | Operational maturity        | Creates sample set and failure taxonomy           | Says human review is enough

Artifact example: a judgment interview rubric

judgment_interview_scorecard:
  dimensions:
    evidence_use: 1-5
    assumption_naming: 1-5
    risk_classification: 1-5
    evaluation_design: 1-5
    domain_reasoning: 1-5
    communication_clarity: 1-5
  required_candidate_output:
    - decision
    - reasons
    - missing_context
    - verification_plan
    - rollback_or_escalation_path
  automatic_reject_flags:
    - "treats model output as authority"
    - "cannot name uncertainty"
    - "ignores stated risk boundary"
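
One way to apply the scorecard above is sketched here in Python. The dimension names and reject flags are copied from the rubric. The rubric does not say how to combine scores, so the equal-weight mean, the rule that any raised flag forces a reject, the `summarize` function, and the sample scores are all assumptions made for illustration.

```python
DIMENSIONS = (
    "evidence_use", "assumption_naming", "risk_classification",
    "evaluation_design", "domain_reasoning", "communication_clarity",
)
REJECT_FLAGS = {
    "treats model output as authority",
    "cannot name uncertainty",
    "ignores stated risk boundary",
}


def summarize(scores: dict[str, int], flags: set[str]) -> dict:
    """Validate 1-5 scores and report the mean plus any reject flags raised."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing scores: {missing}")
    for name in DIMENSIONS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    # Only flags listed in the rubric count; anything else is ignored.
    raised = sorted(flags & REJECT_FLAGS)
    mean = sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    return {
        "mean_score": round(mean, 2),  # assumption: equal weights
        "reject_flags": raised,
        "auto_reject": bool(raised),   # assumption: any flag rejects
    }


# Illustrative scores for one candidate, with one flag raised in the debrief.
result = summarize(
    {"evidence_use": 4, "assumption_naming": 5, "risk_classification": 3,
     "evaluation_design": 4, "domain_reasoning": 3, "communication_clarity": 5},
    flags={"cannot name uncertainty"},
)
print(result)
```

Keeping the flags separate from the numeric scores means a candidate with a high average can still be rejected when a flag is raised, which matches the intent of `automatic_reject_flags` in the rubric.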

Figure: Hiring for observable judgment means watching candidates find missing assumptions, weak evidence, unsafe promises, absent tests, and unclear ownership in polished AI output.

Checklist

  • Test candidates with AI tools allowed, but score supervision rather than raw output.
  • Use flawed artifacts as interview material.
  • Separate current judgment from coachability.
  • Require a verification plan in every work-sample exercise.
  • Train interviewers to reward calibrated doubt, not theatrical confidence.

Takeaway

The best AI-native hiring exercises ask candidates to judge a plausible artifact, not merely create one.

Internal map

For the larger argument, keep this chapter connected to the AI-Native thesis, Building an AI-Native Team, The Judgment Economy, and Human in the Loop Is Not a Plan.
