Trust is upstream.

Calibrated Authority and the Organizational Layer of Human–AI Decision-Making

By Chris Huber Reitz

A nurse practitioner is looking at a screen. The model has flagged a patient as low-risk; her instinct says otherwise. She has about ninety seconds before the next room. Does she rely on the AI, or override it?

This is the question the field of human–AI decision-making has spent more than two decades studying, and studying well. John D. Lee and Katrina See named the real target back in 2004: not trust, but appropriate reliance, "designing for" the fit between what a system can actually do and what a person believes it can do. Raja Parasuraman charted the failure modes that follow when that fit breaks: automation misuse, disuse, and abuse. The modern wave sharpened the picture. Ming Yin has modeled over-trust and under-trust and how interface design nudges a person toward appropriate reliance. Ewart de Visser has studied how trust gets calibrated dynamically inside human–autonomy teams. Q. Vera Liao complicated the easy answer, showing that explanations do not automatically improve decisions; people can be persuaded by an explanation even when the recommendation behind it is wrong. (That strand runs back to David Gunning's DARPA XAI program, which helped put explainability on the agenda as a route to earned trust.) And a Microsoft Research line of work (Gagan Bansal, Ece Kamar, Saleema Amershi, Daniel Weld) reframed the goal as complementary performance: the human–AI pair outperforming either alone, which happens only when reliance is calibrated.

The throughline of this work is one of the most important ideas in applied AI: the goal is not to maximize trust. It is to calibrate it — to rely on the system when it is right and to override it when it is not. Transparency alone won't get you there. A model that shows its work but doesn't show its competence on this case invites confident mistakes.
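To make the arithmetic concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the ten cases, the `ai_confidence` field (standing in for whatever per-case competence signal a system might expose), and the 0.6 threshold. It is not data from any of the studies above. It only shows the shape of the claim: blanket trust and blanket distrust can land at the same mediocre accuracy, while relying selectively can beat both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    ai_correct: bool      # would the model's call have been right?
    human_correct: bool   # would the unaided human's call have been right?
    ai_confidence: float  # hypothetical per-case competence signal


# Made-up cases, chosen so the model and the human are each right 6 times in 10.
CASES = [
    Case(True, True, 0.9),
    Case(True, False, 0.9),
    Case(True, False, 0.8),
    Case(True, True, 0.8),
    Case(False, True, 0.4),
    Case(False, True, 0.3),
    Case(True, False, 0.7),
    Case(False, False, 0.5),
    Case(True, True, 0.9),
    Case(False, True, 0.4),
]


def accuracy(cases, rely):
    """Share of cases decided correctly when `rely(case)` picks whose call stands."""
    correct = sum(c.ai_correct if rely(c) else c.human_correct for c in cases)
    return correct / len(cases)


always_rely = accuracy(CASES, lambda c: True)    # maximal trust
never_rely = accuracy(CASES, lambda c: False)    # maximal distrust
calibrated = accuracy(CASES, lambda c: c.ai_confidence >= 0.6)  # assumed threshold

print(always_rely, never_rely, calibrated)
```

On this made-up data, always relying and never relying each get 6 of 10 cases right, while the thresholded policy gets 9 of 10. The gain depends entirely on the confidence signal tracking where the model is actually right, which is why a system that shows its work but not its competence on the case at hand cannot deliver it.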

This science is right. It is also only studying the second half of the problem.

The dyad and what comes before it

Look again at the nurse. The research treats her as the decision-maker — one human, one AI output, one choice, free to rely or override. That framing is precise, and it is how you run a clean study. But step back into the actual hospital and the framing quietly dissolves. By the time that recommendation reaches her screen, a series of decisions has already been made — usually by no one in particular:

None of these are decisions she makes in the moment. They were made upstream, in the design of the system she now lives inside. And they determine whether her individual calibration even matters. You can give a clinician a perfectly calibrated trust signal and a beautiful explanation, and if the protocol has already pre-committed her to the model's call, the explanation is theater.

This is the part the dyad-level science brackets out — and it is the part that decides outcomes at scale. Trust in an organization is not primarily an event that happens at the screen. It is a structure that was built before the model ever went live. Trust is upstream.

Calibrated Authority

So here is the idea that needs a name. The field has a precise term for the individual version — appropriate reliance — and no term at all for the organizational one. That absence isn't an oversight. It's where the expensive failures have been hiding.

The definition: Calibrated Authority is the organizational twin of appropriate reliance. It is the deliberate design of who is authorized to rely on which AI, for which decisions, at what level of consequence, calibrated to the system's actual competence and uncertainty, and decided before deployment rather than discovered after an incident.

Appropriate reliance asks: should this person trust this output right now? Calibrated Authority asks the question one level up: should this role be permitted to rely on this model for this class of decision at all — and what happens when it's wrong? The first is a property of a person at a moment. The second is a property of an organization by design.

The two are not in competition. Calibrated Authority is the container that makes appropriate reliance possible. Get the container wrong and no amount of individual calibration saves you: you've either authorized reliance the model hasn't earned (automation bias, now institutionalized) or you've withheld it where the model is genuinely better than the human (under-reliance, now mandated). Liao's finding — that transparency is insufficient — doesn't soften at organizational scale. It compounds. An explanation no one is structurally permitted to act on is worse than no explanation, because it manufactures the appearance of human judgment over a decision that was already made.
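One way to picture the container is as an explicit, default-deny grant table. The Python sketch below is an illustration under stated assumptions, not a reference design: the role, model, and decision-class names, the `Grant` fields, and the thresholds are all hypothetical. It encodes the elements of the definition (who, which AI, which decisions, what consequence level, what confidence) plus a named owner who is accountable when reliance is out of bounds or the model turns out to be wrong.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Consequence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Reliance(Enum):
    MAY_RELY = "may rely on the model's output"
    MUST_REVIEW = "must make an independent judgment"
    NOT_AUTHORIZED = "may not use the model for this decision"


@dataclass(frozen=True)
class Grant:
    role: str                     # who
    model: str                    # which AI
    decision_class: str           # which decisions
    max_consequence: Consequence  # ceiling on stakes for this grant
    min_confidence: float         # floor on the model's stated confidence
    accountable_owner: str        # who answers for it when it goes wrong


# Hypothetical grant, decided before deployment.
GRANTS = (
    Grant("nurse_practitioner", "triage_risk_model", "low_risk_flag",
          Consequence.MEDIUM, 0.80, "attending_physician"),
)


def authorize(role, model, decision_class, consequence, confidence, grants=GRANTS):
    """Return (reliance mode, accountable owner) for one proposed use of a model."""
    for g in grants:
        if (g.role, g.model, g.decision_class) != (role, model, decision_class):
            continue
        if consequence > g.max_consequence:
            return Reliance.NOT_AUTHORIZED, g.accountable_owner
        if confidence < g.min_confidence:
            return Reliance.MUST_REVIEW, g.accountable_owner
        return Reliance.MAY_RELY, g.accountable_owner
    # Default deny: no explicit grant means no authority to rely.
    return Reliance.NOT_AUTHORIZED, None


print(authorize("nurse_practitioner", "triage_risk_model", "low_risk_flag",
                Consequence.LOW, 0.92))
```

The design choice worth noticing is the last line of `authorize`: when nobody has written a grant, the answer is "not authorized" rather than whatever the interface happens to allow. In this sketch, that is the difference between authority that was designed and authority that drifted into place.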

Designing trust before the model goes live

If trust is upstream, then trust is a design problem, not a training problem. Organizations rarely fail because the model is weak. They fail because no one designed how authority, accountability, and intelligence would flow once the model entered the decision — so the model's de facto authority gets set by latency, by interface defaults, by who's too busy to override. That's not calibration. That's drift.

Calibrating authority means doing on purpose, before launch, what most organizations discover by accident, after harm:

This is the organizational layer the dyad research implies but doesn't reach — and it's where the consequential domains actually live: the hospital, the lender, the hiring pipeline, the agency, the defense system. In every one, the question that determines the outcome is not only "did this person trust the AI correctly," but "did the organization architect trust correctly, before this person ever sat down."

Why this is where the category is going

There's a longer arc here worth naming. The dyad — one human, deliberating over one AI output — is, in part, a transitional artifact of this moment in AI, when a human still sits in every loop. As systems take on more of the decision and the human moves from in-the-loop to on-the-loop to out-of-it for whole classes of low-consequence calls, the individual deliberation shrinks. What does not shrink — what grows — is the organizational question: who authorized this, calibrated to what, accountable to whom.

Ten and twenty years out, the organizational layer isn't an adjacent niche to trust calibration. It's where trust calibration lives, once the moment-of-decision has been automated away for everything but the hardest cases. The researchers above mapped the moment with rigor. The next decade's question is the architecture around it, and that architecture is something organizations have to design on purpose, upstream, or have decided for them by drift.

The work is not to maximize trust in AI.
It never was.

It's to calibrate it — at the screen, and in the structure that decides what the screen is even allowed to do.

Chris Huber Reitz teaches AI Strategy at Columbia University and is Chief of AI & Strategy at Essential Innovations, where he has trained more than 15,000 people on AI and helped organizations design how trust, authority, and accountability flow once these systems enter consequential decisions.

References

Ordered to lead with the foundational appropriate-reliance lineage, then the modern cluster.

  1. Lee, J.D., & See, K.A. (2004). "Trust in Automation: Designing for Appropriate Reliance." Human Factors 46(1):50–80. Names the essay's actual thesis: calibration, not trust-maximization.
  2. Parasuraman, R., & Riley, V. (1997). "Humans and Automation: Use, Misuse, Disuse, Abuse." Human Factors 39(2):230–253. The failure-mode taxonomy when reliance is miscalibrated.
  3. de Visser, E.J., et al. (2020). "Towards a Theory of Longitudinal Trust Calibration in Human–Robot Teams." Int. J. of Social Robotics 12:459–478. Dynamic, mutually adaptive trust calibration in human–autonomy teams.
  4. Ming Yin et al. (2023). "Modeling Human Trust and Reliance in AI-Assisted Decision Making," AAAI '23; and (2024) "'Are You Really Sure?' … Self-Confidence Calibration," CHI '24. Over/under-reliance failure modes and appropriate reliance.
  5. Zhang, Y., Liao, Q.V., & Bellamy, R.K.E. (2020). "Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making." FAT* '20. Source of "transparency alone is insufficient / confidence calibration matters."
  6. Chen, V., Liao, Q.V., Vaughan, J.W., & Bansal, G. (2023). "Understanding the Role of Human Intuition on Reliance in Human-AI Decision-Making with Explanations." Proc. ACM HCI (CSCW2), Art. 370. Explanations can increase overreliance when the AI is wrong.
  7. Amershi, S., et al. (2019). "Guidelines for Human-AI Interaction." CHI '19; and Bansal, G., Wu, T., Zhou, J., Fok, R., Nushi, B., Kamar, E., Ribeiro, M.T., & Weld, D.S. (2021). "Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance." CHI '21. The Microsoft Research co-citation cluster; complementary team performance as the real objective.

Adjacent strand (cited in-text, intentionally not a lead co-citation): Gunning, D., & Aha, D. (2019) / Gunning et al. (2021) — DARPA XAI. Explainability is part of the domain this essay names, kept as a one-line lineage nod.

How to cite this essay:
Reitz, C. H. (2026). Trust Is Upstream: Calibrated Authority and the Organizational Layer of Human–AI Decision-Making. chrishuberreitz.com/frameworks/calibrated-authority