Trust is upstream.
Calibrated Authority and the Organizational Layer of Human–AI Decision-Making
A nurse practitioner is looking at a screen. The model has flagged a patient as low-risk; her instinct says otherwise. She has about ninety seconds before the next room. Does she rely on the AI, or override it?
This is the question the field of human–AI decision-making has spent more than two decades studying, and studying well. John D. Lee and Katrina See named the real target in 2004: not trust, but appropriate reliance, which means designing for the fit between what a system can actually do and what a person believes it can do. Raja Parasuraman, with Victor Riley, had already charted the failure modes that follow when that fit breaks: automation misuse, disuse, and abuse. The modern wave sharpened the picture. Ming Yin has modeled over-trust and under-trust and how interface design nudges a person toward appropriate reliance. Ewart de Visser has studied how trust is calibrated dynamically inside human–autonomy teams. Q. Vera Liao complicated the easy answer, showing that explanations do not automatically improve decisions, because people can be persuaded by an explanation that happens to be wrong. (That strand runs back to David Gunning's DARPA XAI program, which put explainability on the research agenda as a route to earned trust.) And a Microsoft Research line of work, from Gagan Bansal, Ece Kamar, Saleema Amershi, and Daniel Weld, reframed the goal as complementary performance: the human–AI pair outperforming either alone, which happens only when reliance is calibrated.
The throughline of this work is one of the most important ideas in applied AI: the goal is not to maximize trust. It is to calibrate it — to rely on the system when it is right and to override it when it is not. Transparency alone won't get you there. A model that shows its work but doesn't show its competence on this case invites confident mistakes.
This science is right. It also studies only the second half of the problem.
The dyad and what comes before it
Look again at the nurse. The research treats her as the decision-maker — one human, one AI output, one choice, free to rely or override. That framing is precise, and it is how you run a clean study. But step back into the actual hospital and the framing quietly dissolves. By the time that recommendation reaches her screen, a series of decisions has already been made — usually by no one in particular:
- Who is even allowed to act on this model's output, and who must escalate?
- What is the model formally permitted to decide, and where is it advisory only?
- When she overrides it and is wrong, who is accountable — her, the protocol, the vendor?
- What does the workflow let her do in ninety seconds — actually weigh the AI, or just click through it?
None of these are decisions she makes in the moment. They were made upstream, in the design of the system she now lives inside. And they determine whether her individual calibration even matters. You can give a clinician a perfectly calibrated trust signal and a beautiful explanation, and if the protocol has already pre-committed her to the model's call, the explanation is theater.
This is the part the dyad-level science brackets out — and it is the part that decides outcomes at scale. Trust in an organization is not primarily an event that happens at the screen. It is a structure that was built before the model ever went live. Trust is upstream.
Calibrated Authority
So here is the idea that needs a name. The field has a precise term for the individual version, appropriate reliance, and no term at all for the organizational one. That absence is more than a gap in vocabulary. It's where the expensive failures have been hiding.
Appropriate reliance asks: should this person trust this output right now? Calibrated Authority asks the question one level up: should this role be permitted to rely on this model for this class of decision at all — and what happens when it's wrong? The first is a property of a person at a moment. The second is a property of an organization by design.
The two are not in competition. Calibrated Authority is the container that makes appropriate reliance possible. Get the container wrong and no amount of individual calibration saves you: you've either authorized reliance the model hasn't earned (automation bias, now institutionalized) or you've withheld it where the model is genuinely better than the human (under-reliance, now mandated). Liao's finding — that transparency is insufficient — doesn't soften at organizational scale. It compounds. An explanation no one is structurally permitted to act on is worse than no explanation, because it manufactures the appearance of human judgment over a decision that was already made.
Designing trust before the model goes live
If trust is upstream, then trust is a design problem, not a training problem. Organizations rarely fail because the model is weak. They fail because no one designed how authority, accountability, and intelligence would flow once the model entered the decision — so the model's de facto authority gets set by latency, by interface defaults, by who's too busy to override. That's not calibration. That's drift.
Calibrating authority means doing on purpose, before launch, what most organizations discover by accident, after harm:
- Map the decisions, not just the model. Which classes of decision is this system touching, and what is the cost of being wrong in each?
- Set authority to competence. Where the model is reliably better, design for reliance and make override the exception. Where it's brittle or uncertain, design for human primacy and make reliance the exception. Match the grant of authority to the evidence.
- Make accountability legible in advance. Every authority grant carries a named owner for its failures. If no one owns the override, no one will make it.
- Build the moment to allow the judgment. Ninety seconds with a click-through is a design choice. So is ninety seconds with a real prompt to weigh the call. Calibration that the workflow doesn't have room for doesn't happen.
This is the organizational layer the dyad research implies but doesn't reach — and it's where the consequential domains actually live: the hospital, the lender, the hiring pipeline, the agency, the defense system. In every one, the question that determines the outcome is not only "did this person trust the AI correctly," but "did the organization architect trust correctly, before this person ever sat down."
Why this is where the category is going
There's a longer arc here worth naming. The dyad, one human deliberating over one AI output, is in part a transitional artifact of this moment in AI, when a human still sits in every loop. As systems take on more of the decision, and the human moves from in the loop to on the loop to out of the loop entirely for whole classes of low-consequence calls, the individual deliberation shrinks. What does not shrink, and in fact grows, is the organizational question: who authorized this, calibrated to what, accountable to whom.
Ten to twenty years out, the organizational layer isn't a niche adjacent to trust calibration. It's where trust calibration lives, once the moment of decision has been automated away for everything but the hardest cases. The researchers above mapped the moment with rigor. The question for the next decade is the architecture around it, and that architecture is something organizations have to design on purpose, upstream, or have decided for them by drift.
The work is not to maximize trust in AI.
It never was.
It's to calibrate it — at the screen, and in the structure that decides what the screen is even allowed to do.
Trust is a design problem. Let's design it out loud.
Every week, Signals from the Curve tracks the parts of AI work that compound — including the organizational layer most strategies never reach. Calibrated Authority is one note in that arc.
Wisdom that outlasts the algorithm, every Wednesday.
References
Ordered to lead with the foundational appropriate-reliance lineage, then the modern cluster.
- Lee, J.D., & See, K.A. (2004). "Trust in Automation: Designing for Appropriate Reliance." Human Factors 46(1):50–80. — Names the essay's actual thesis: calibration, not trust-maximization.
- Parasuraman, R., & Riley, V. (1997). "Humans and Automation: Use, Misuse, Disuse, Abuse." Human Factors 39(2):230–253. — The failure-mode taxonomy when reliance is miscalibrated.
- de Visser, E.J., et al. (2020). "Towards a Theory of Longitudinal Trust Calibration in Human–Robot Teams." Int. J. of Social Robotics 12:459–478. — Dynamic, mutually-adaptive trust calibration in human–autonomy teams.
- Li, Z., Lu, Z., & Yin, M. (2023). "Modeling Human Trust and Reliance in AI-Assisted Decision Making." AAAI '23; and Ma, S., et al. (2024). "'Are You Really Sure?' … Self-Confidence Calibration." CHI '24. — Over/under-reliance failure modes and appropriate reliance.
- Zhang, Y., Liao, Q.V., & Bellamy, R.K.E. (2020). "Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making." FAT* '20. — Source of "transparency alone is insufficient / confidence calibration matters."
- Chen, V., Liao, Q.V., Vaughan, J.W., & Bansal, G. (2023). "Understanding the Role of Human Intuition on Reliance in Human-AI Decision-Making with Explanations." Proc. ACM HCI (CSCW2), Art. 370. — Explanations can increase overreliance when the AI is wrong.
- Amershi, S., et al. (2019). "Guidelines for Human-AI Interaction." CHI '19; and Bansal, G., et al. (2021). "Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance." CHI '21. — The Microsoft Research co-citation cluster; complementary team performance as the real objective.
Adjacent strand (cited in-text, intentionally not a lead co-citation): Gunning, D., & Aha, D. (2019) / Gunning et al. (2021) — DARPA XAI. Explainability is part of the domain this essay names, kept as a one-line lineage nod.
Reitz, C. H. (2026). Trust Is Upstream: Calibrated Authority and the Organizational Layer of Human–AI Decision-Making. chrishuberreitz.com/frameworks/calibrated-authority