Locked Doors: Residency, Permissions, and Abuse
> **Working claim:** A router is a thing that *sends data somewhere and triggers actions*, which makes it a security and compliance surface, not just a cost optimization.
Key Takeaways
- Locked Doors: Residency, Permissions, and Abuse is a chapter about model routing and inference control planes, not a generic AI adoption note.
- The operating rule is to send each request to the cheapest path that still meets quality, latency, residency, and risk requirements.
- The failure mode to watch is polished output without evidence, owner, cost line, or rollback path.
- The useful next step is an artifact a future teammate can replay without folklore.
Model routing works when each request goes to the cheapest path that still meets quality, latency, residency, and risk requirements.
A router can send a regulated document to a forbidden provider, route a high-risk question to a weak model, bypass a guardrail through a failover path, or be manipulated by an adversary into forcing every request down the most expensive lane to drain a budget. Governance is the set of locked doors that keep routing from becoming the system's softest attack surface.
The router as an attack surface
It is easy to think of the router as a performance component: it picks models to save money and improve quality. But look at what it does. It takes a request (which may contain sensitive data), decides which provider receives that data, decides which model's capabilities (including tool access) are applied, and decides how much to spend. Each of those decisions is a security boundary. A router that gets them wrong does not just overspend; it can leak regulated data to a non-compliant provider, apply a model that should never have seen a high-risk request, route around a safety control, or burn a budget an attacker chose to burn.
The NIST AI Risk Management Framework frames AI governance around mapping, measuring, and managing risks across the system, and the router is squarely inside that scope: it is an automated decision-maker handling data and triggering actions, which is exactly what governance frameworks exist to control. The locked doors in this chapter are the routing-specific controls that implement that governance.
Door 1: Data residency and provider allowlists
The first locked door is where the data is allowed to go. Many requests carry data subject to residency or handling constraints: an EU customer's data that must stay in the EU, regulated health or financial data that may only go to providers under an appropriate agreement, a tenant whose contract forbids certain providers. The router's provider choice is the data-residency decision, and it must be a hard guarantee (Chapter 8's deterministic rules), never a probabilistic preference.
The mechanism is a provider allowlist per request, derived from the data's constraints, applied before any model is considered eligible. A request tagged EU-residency has an allowlist of EU-region providers; a request with regulated PII has an allowlist of providers under a data-processing agreement. Crucially, this allowlist survives failover (Chapter 10): a failover substitute must come from the same allowlist, because availability never overrides residency. The most dangerous residency bug is a failover path that, under provider outage, reaches outside the allowlist to "stay up", turning an availability event into a compliance breach.
```python
# Provider allowlist as a HARD gate, applied before eligibility and surviving failover.
def allowed_providers(request, fleet):
    allow = set(fleet.all_providers())
    if request.data_residency == "EU":
        allow &= fleet.providers_in_region("EU")        # residency narrows the set
    if request.contains_regulated_pii:
        allow &= fleet.providers_with_dpa()             # only providers under agreement
    if request.tenant.provider_denylist:
        allow -= set(request.tenant.provider_denylist)  # contractual exclusions
    return allow


def eligible_models(request, fleet):
    providers = allowed_providers(request, fleet)       # DOOR 1: where data may go
    risk = assess_risk(request)                         # DOOR 2: capability floor
    return [m for m in fleet.models
            if m.provider in providers                  # residency-gated
            and m.tier >= RISK_FLOOR_TIER[risk]         # risk-gated
            and m.capabilities >= request.required_caps]  # capability-gated
```
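The failover rule described above (a substitute must come from the same allowlist) can be sketched as follows. This is a minimal illustration, not the chapter's implementation: `Model`, `NoCompliantRoute`, `pick_with_failover`, and the provider names are all hypothetical, and health is assumed to arrive as a plain set of provider names.

```python
from dataclasses import dataclass


class NoCompliantRoute(Exception):
    """Raised when no healthy model exists inside the allowlist."""


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    cost: float  # hypothetical relative cost per request


def pick_with_failover(candidates, allowed, healthy):
    """Return the cheapest healthy model whose provider is on the allowlist.

    candidates: iterable of Model
    allowed:    set of provider names produced by the residency gate
    healthy:    set of provider names currently passing health checks
    """
    usable = [m for m in candidates
              if m.provider in allowed and m.provider in healthy]
    if not usable:
        # Fail closed: an outage must never widen the allowlist.
        raise NoCompliantRoute("no healthy provider inside the allowlist")
    return min(usable, key=lambda m: m.cost)


fleet = [
    Model("small-eu", "eu-host-a", 1.0),
    Model("mid-eu", "eu-host-b", 3.0),
    Model("small-us", "us-host-a", 0.8),
]
eu_allow = {"eu-host-a", "eu-host-b"}

# The primary EU provider is down: failover stays inside the EU allowlist,
# even though a cheaper healthy US model exists.
choice = pick_with_failover(fleet, eu_allow, healthy={"eu-host-b", "us-host-a"})
print(choice.name)  # mid-eu
```

If every allowlisted provider is down, the sketch raises rather than routing elsewhere. The caller then sheds or queues the request, which is an availability cost but not a compliance breach.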
Door 2: Model capability manifests and permissions
The second locked door is what the chosen model is allowed to do. Models differ not just in quality but in capabilities and permissions: which can use which tools, which may take side-effecting actions, which are approved for high-risk domains, which have been safety-reviewed for a given use. A capability manifest declares, per model, what it is permitted to do, and the router enforces it: a request that needs a side-effecting tool may only route to models permitted to use that tool; a high-risk medical request may only route to models approved for medical use.
This matters because routing can otherwise bypass a control. If your safety review approved Model A for handling medical questions but the router, optimizing cost, sends a medical question to the cheaper unreviewed Model B, the router has silently routed around the safety review. The capability manifest makes the approval part of eligibility, so the cost optimizer can never select an unapproved model no matter how cheap it is. This is the routing analog of least privilege: a model gets the minimum capabilities its request requires, and the router cannot grant more by accident.
```yaml
# capability-manifest.yaml - what each model is PERMITTED to do. Enforced as eligibility.
models:
  small-hosted:
    tools_allowed: [search, calculator]          # no side-effecting tools
    domains_approved: [general, support]         # NOT medical/legal/financial
    side_effects: false
    safety_reviewed: true
  flagship-v4:
    tools_allowed: [search, calculator, code_exec, db_read]  # db_read, not db_write
    domains_approved: [general, support, legal, financial]
    side_effects: false                          # still no write actions without human
    safety_reviewed: true
  agent-executor:
    tools_allowed: [search, calculator, db_read, db_write, send_email]
    domains_approved: [general]
    side_effects: true                           # CAN take actions -> tighter gating
    requires_human_approval_for: [db_write, send_email, payment]
    safety_reviewed: true
```
The requires_human_approval_for field on the action-capable model is the tool-use gate: even an eligible, capable model does not get to take a high-blast-radius action (write a record, send an email, move money) without a human in the loop. Routing to an action-capable model is a risk decision (Chapter 5's critical tier), and the manifest encodes that some actions are never auto-executed regardless of which model is chosen.
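One way the manifest might be enforced is sketched below. To stay dependency-free, two manifest entries are hand-copied into a Python dict instead of being parsed from YAML. The function names `is_eligible` and `authorize_tool_call`, the three-valued return, and the approver id are assumptions made for illustration.

```python
# Two entries hand-copied from the manifest above (illustrative subset).
MANIFEST = {
    "small-hosted": {
        "tools_allowed": {"search", "calculator"},
        "domains_approved": {"general", "support"},
        "requires_human_approval_for": set(),
    },
    "agent-executor": {
        "tools_allowed": {"search", "calculator", "db_read", "db_write", "send_email"},
        "domains_approved": {"general"},
        "requires_human_approval_for": {"db_write", "send_email", "payment"},
    },
}


def is_eligible(model, domain, required_tools):
    """Eligible only if the manifest approves the domain AND every required tool."""
    entry = MANIFEST[model]
    return (domain in entry["domains_approved"]
            and set(required_tools) <= entry["tools_allowed"])


def authorize_tool_call(model, tool, approver_id=None):
    """Return 'deny', 'needs_approval', or 'execute' for one tool call."""
    entry = MANIFEST[model]
    if tool not in entry["tools_allowed"]:
        return "deny"                       # the model was never granted this tool
    if tool in entry["requires_human_approval_for"] and approver_id is None:
        return "needs_approval"             # granted, but gated on a human
    return "execute"


medical_ok = is_eligible("small-hosted", "medical", ["search"])
unapproved = authorize_tool_call("agent-executor", "db_write")
approved = authorize_tool_call("agent-executor", "db_write", approver_id="u-123")
denied = authorize_tool_call("small-hosted", "db_write")
print(medical_ok, unapproved, approved, denied)
# False needs_approval execute deny
```

The two checks run at different times: `is_eligible` runs at routing time, before the cost optimizer ranks candidates, while `authorize_tool_call` runs at action time, after a model has already been chosen.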
Door 3: The audit trail
The third locked door is not a gate but a record: the audit trail that makes every routing decision reconstructable. Governance frameworks require that automated decisions be explainable and accountable, and for a router that means: for any request, you can answer which model handled it, which provider received the data, why that route was chosen, what policy version decided it, and whether any control fired. This is the Chapter 1 decision log extended with the governance fields, and it is non-negotiable for regulated systems.
```sql
-- Routing audit log: governance fields on top of the decision log.
CREATE TABLE routing_audit (
    request_id        TEXT PRIMARY KEY,
    ts                TIMESTAMPTZ NOT NULL,
    tenant            TEXT NOT NULL,
    data_residency    TEXT,              -- the constraint that applied
    contained_pii     BOOLEAN NOT NULL,
    allowed_providers TEXT[] NOT NULL,   -- the allowlist that gated this request
    chosen_provider   TEXT NOT NULL,     -- where the data actually went
    chosen_model      TEXT NOT NULL,
    risk_tier         TEXT NOT NULL,
    policy_version    TEXT NOT NULL,     -- which policy decided
    controls_fired    TEXT[],            -- e.g. {'pii_floor','human_review'}
    human_approval    TEXT,              -- approver id if a gate required one
    budget_state      JSONB              -- budget remaining at decision time
);

-- A regulator's question "did EU data ever leave the EU?" is now a query:
--   SELECT * FROM routing_audit WHERE data_residency='EU'
--     AND chosen_provider NOT IN (SELECT provider FROM eu_providers);
-- This MUST return zero rows. If it can't, the system is not auditable.
```
The query in the comment is the test of whether your audit trail is real: a governance question must be answerable as a query against the log, returning the rows that would be violations. If you cannot write that query, because the log does not record the provider, or the residency constraint, or the policy version, then "we route compliantly" is a claim you cannot defend, and the NIST AI RMF's accountability requirement is unmet.
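The "governance question as a query" test can be run as an automated check. The sketch below uses Python's standard `sqlite3` module with an in-memory database. Because SQLite lacks `TEXT[]`, `TIMESTAMPTZ`, and `JSONB`, the schema is trimmed to the columns the query needs. The provider names and the helper `residency_violations` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE eu_providers (provider TEXT PRIMARY KEY);
    CREATE TABLE routing_audit (
        request_id      TEXT PRIMARY KEY,
        data_residency  TEXT,
        chosen_provider TEXT NOT NULL,
        policy_version  TEXT NOT NULL
    );
    INSERT INTO eu_providers VALUES ('eu-host-a'), ('eu-host-b');
    INSERT INTO routing_audit VALUES
        ('r1', 'EU', 'eu-host-a', 'p-7'),
        ('r2', 'EU', 'eu-host-b', 'p-7'),
        ('r3', NULL, 'us-host-a', 'p-7');
""")

VIOLATIONS_SQL = """
    SELECT request_id, chosen_provider
    FROM routing_audit
    WHERE data_residency = 'EU'
      AND chosen_provider NOT IN (SELECT provider FROM eu_providers)
"""


def residency_violations(db):
    """Rows where EU-tagged data went to a provider outside the EU list."""
    return db.execute(VIOLATIONS_SQL).fetchall()


clean = residency_violations(conn)
print(clean)  # []

# Simulate the failover bug: an EU request lands on a US provider.
conn.execute("INSERT INTO routing_audit VALUES ('r4', 'EU', 'us-host-a', 'p-7')")
dirty = residency_violations(conn)
print(dirty)  # [('r4', 'us-host-a')]
```

Run on a schedule against the real log, a non-empty result is an incident, and the returned rows identify the requests and the policy version to investigate.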
Door 4: Abuse and the denial-of-wallet attack
The fourth locked door defends against an attacker who understands your router better than you wish. OWASP's LLM Top 10 names unbounded consumption as a risk class, and routed systems have a specific version of it: the denial-of-wallet or forced-expensive-route attack. Suppose an adversary learns which inputs force escalation to the flagship: long inputs, inputs that trip the verifier, inputs that look high-risk, or inputs that exhaust the cheap provider's rate limit and trigger failover (Chapter 10). That adversary can craft traffic that drives every request down the most expensive lane, multiplying your bill without any legitimate use. The router's own intelligence (escalate when uncertain, fail over when unavailable) becomes the attack vector.
The defenses are budget guards and abuse detection layered on top of routing:
```python
# Budget guard + abuse detection: routing must respect a cost ceiling, per tenant.
def route_with_budget_guard(request, fleet):
    tenant_budget = budget.remaining(request.tenant)   # rolling window budget
    if tenant_budget <= 0:
        return shed_or_queue(request)                  # over budget -> shed, don't spend

    # Abuse signal: is this tenant forcing expensive routes abnormally?
    recent = stats.recent_route_distribution(request.tenant)
    if recent.escalation_rate > ABUSE_THRESHOLD and recent.volume > MIN_VOLUME:
        flag_for_review(request.tenant, reason="forced_expensive_routing")
        # Defensive posture: cap this tenant to cheaper lanes pending review.
        return route_capped(request, max_tier="mid-hosted")

    decision = route(request, fleet)

    # Hard ceiling: a single decision may not exceed a per-request cost cap.
    if estimate_cost(request, decision) > request.tenant.per_request_cap:
        return route_capped(request, max_tier="mid-hosted")
    return decision
```
The principles: every tenant has a budget and a per-request cost cap, and routing that would breach either sheds or downgrades rather than spending. Availability and quality are goals, not unconditional imperatives, and a routing system with no cost ceiling is a denial-of-wallet vulnerability with extra steps. Abnormal escalation rates from a tenant are an abuse signal, defended by capping that tenant to cheaper lanes pending review. FrugalGPT's whole premise (most requests can be served cheaply) is also a security assumption: it holds for honest traffic and can be inverted by an adversary, so the budget guard is what keeps the cost distribution from being weaponized.
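The rolling-window budget behind a call like `budget.remaining(tenant)` could look like the sketch below. It is one possible design and not the chapter's: `RollingBudget` and `try_spend` are hypothetical names, the state is in-memory and not thread-safe, and the clock is injectable so the window can be exercised without sleeping.

```python
import time
from collections import deque


class RollingBudget:
    """Per-tenant spend limit over a sliding time window (in-memory sketch)."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._spend = {}  # tenant -> deque of (timestamp, cost), oldest first

    def _prune(self, tenant):
        """Drop spend events that have aged out of the window."""
        events = self._spend.setdefault(tenant, deque())
        cutoff = self.clock() - self.window
        while events and events[0][0] <= cutoff:
            events.popleft()
        return events

    def remaining(self, tenant):
        return self.limit - sum(cost for _, cost in self._prune(tenant))

    def try_spend(self, tenant, cost):
        """Record the spend and return True only if it fits the remaining budget."""
        if cost > self.remaining(tenant):
            return False
        self._spend[tenant].append((self.clock(), cost))
        return True


now = [0.0]  # fake clock for the demonstration
budget = RollingBudget(limit=10.0, window_seconds=60, clock=lambda: now[0])

first = budget.try_spend("acme", 6.0)    # fits: 10.0 remaining
second = budget.try_spend("acme", 6.0)   # refused: only 4.0 remaining
now[0] = 61.0                            # the first spend ages out of the window
third = budget.try_spend("acme", 6.0)    # fits again
print(first, second, third)  # True False True
```

In a multi-process router this state would need to live in a shared store with atomic updates. The point of the sketch is only that the budget check is a cheap lookup made before any model is called.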
Door 5: Prompt injection and routing manipulation
The last door connects to the wider adversarial-LLM literature. Prompt injection is usually discussed as manipulating a model's output, but in a routed system it can manipulate the routing. A request can be crafted to look low-risk to the risk classifier (to get routed to a weak, less-guarded model) or to look high-risk (to force escalation, a denial-of-wallet route), and content embedded in a document can try to influence the intent classifier. The defense is the one the OWASP cheat sheet prescribes for injection generally (do not trust untrusted input to drive privileged decisions), applied to routing. The risk and intent classifiers should be robust to adversarial input, hard rules (residency, capability floors) should not be overridable by request content, and the most security-sensitive routing decisions should be made on trusted metadata (authenticated tenant, verified data tags) rather than on the model's interpretation of the request text. A router that lets the content of a request lower its own risk tier has handed the attacker the keys. Risk and residency must be determined by who is asking and what data is involved, which are facts the system authenticates, not by what the request says about itself.
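The rule that request content must never lower the risk tier can be expressed as a floor taken from authenticated facts. The sketch below is illustrative only: the tier names, `TrustedContext`, and `effective_risk_tier` are assumptions, and the content classifier is represented by its output string.

```python
from dataclasses import dataclass

# Hypothetical tier names, ordered from least to most restrictive.
TIER_ORDER = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class TrustedContext:
    """Facts the platform authenticated, not text the requester supplied."""
    tenant_min_tier: str   # from the authenticated tenant's contract
    data_tag_tier: str     # from verified data-classification tags


def effective_risk_tier(ctx, content_classifier_tier):
    """Content may raise the tier, but never lower it below the trusted floor."""
    floor = max(ctx.tenant_min_tier, ctx.data_tag_tier, key=TIER_ORDER.index)
    return max(floor, content_classifier_tier, key=TIER_ORDER.index)


ctx = TrustedContext(tenant_min_tier="medium", data_tag_tier="high")

# An injected "this is a harmless test" cannot talk the tier down.
talked_down = effective_risk_tier(ctx, "low")
# Content can still raise the tier; the budget guard bounds what that costs.
raised = effective_risk_tier(ctx, "critical")
print(talked_down, raised)  # high critical
```

Letting content raise the tier keeps the classifier useful for honest traffic. The forced-escalation abuse this allows is left to the per-tenant budget and per-request cap to contain.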
Chapter summary
A router decides where data goes, which capabilities apply, and how much to spend, which makes it a security and compliance surface squarely inside the NIST AI RMF's scope, not just a cost optimizer. Five locked doors implement its governance.
- *Door 1, residency and provider allowlists:* the router's provider choice is the data-residency decision, so it must be a hard deterministic gate applied before eligibility and surviving failover. The dangerous bug is a failover path reaching outside the allowlist to stay up, turning an outage into a breach.
- *Door 2, capability manifests:* models differ in permitted tools, approved domains, and side-effect rights, and the manifest makes safety approvals part of eligibility so the cost optimizer can never route around a review by choosing a cheaper unreviewed model. High-blast-radius actions require human approval regardless of model (least privilege for routing).
- *Door 3, the audit trail:* every decision must be reconstructable (provider, model, residency constraint, policy version, controls fired), and the test is whether a regulator's question ("did EU data ever leave the EU?") is answerable as a query that returns zero violation rows.
- *Door 4, budget guards:* OWASP's unbounded-consumption risk takes the routing-specific form of denial-of-wallet, where an adversary crafts traffic that forces every request down the flagship lane (via escalation triggers or forced failover). The defenses are per-tenant budgets, per-request cost caps that shed or downgrade rather than spend, and abuse detection on abnormal escalation rates.
- *Door 5, injection and routing manipulation:* prompt injection can target the routing (look low-risk to reach a weak model, or high-risk to force expensive escalation), so the most sensitive routing decisions must be made on authenticated metadata (who is asking, what data is involved), never on what the request text says about itself. A router that lets request content lower its own risk tier has handed the attacker the keys.
Internal map
For the larger argument, keep this chapter connected to Model Routing, The Economics of Inference, the smaller-model margin argument, and A Field Guide to Evals.
