
AI Trust Has a Latency Budget

In regulated life-science AI, explanations must arrive on time, survive drift, and be replayable. Meeting those requirements is an infrastructure problem before it is a modeling one.

Explainable AI (XAI) in life sciences is usually framed as a trust problem.

Clinicians do not trust black boxes. Regulators do not trust undocumented logic. Patients do not trust decisions they cannot understand.

All of that is true, but it obscures the more immediate constraint. What slows XAI down in practice is not philosophy or ethics; it is infrastructure.

The moment a model is required to produce an explanation that can survive clinical use, regulatory review, and later audit, AI stops being a single inference workload. It becomes two. One pipeline produces predictions. The other produces justification, provenance, and reproducibility.

That second pipeline is not abstract. It consumes real compute, memory, storage, bandwidth, and operational effort.

So yes, a single prediction is relatively cheap. A defensible explanation is not.

In drug discovery, explanation typically means perturbation. Feature attribution and counterfactual analysis expand one inference into hundreds or thousands of forward passes.
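
To make the multiplier concrete, here is a minimal occlusion-style attribution sketch in Python. The linear scorer and the feature count are stand-ins, not any real discovery model; the point is the forward-pass count, not the method.

```python
import numpy as np

def occlusion_attribution(model, x, baseline=0.0):
    """Attribute a prediction by re-scoring the input with each feature
    replaced by a baseline value. Cost: one forward pass per feature."""
    base_score = model(x)
    attributions = np.zeros_like(x)
    n_forward_passes = 1
    for i in range(x.shape[0]):
        perturbed = x.copy()
        perturbed[i] = baseline          # knock out one feature
        attributions[i] = base_score - model(perturbed)
        n_forward_passes += 1
    return attributions, n_forward_passes

# Stand-in model: a fixed linear scorer over 2,048 molecular descriptors.
rng = np.random.default_rng(0)
weights = rng.normal(size=2048)
model = lambda x: float(weights @ x)

x = rng.normal(size=2048)
attrs, passes = occlusion_attribution(model, x)
print(passes)  # 2,049 forward passes to explain a single prediction
```

A counterfactual search on top of this adds an outer loop, so the multiplier compounds rather than adds.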

In medical imaging, explanation often requires gradient-based analysis, attention aggregation, or patch-level attribution across whole-slide images that already push I/O and GPU memory to their limits.
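
A hedged sketch of what the gradient path demands, using PyTorch and a toy CNN standing in for a real patch classifier: inference alone could run under torch.no_grad(), but the explanation cannot, so gradients, and the memory that comes with them, ride along.

```python
import torch
import torch.nn as nn

# Toy CNN standing in for a whole-slide-image patch classifier.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
)
model.eval()

patch = torch.rand(1, 3, 256, 256, requires_grad=True)  # one 256x256 tile

# Inference alone could run under torch.no_grad(); saliency cannot.
logits = model(patch)
logits[0, logits.argmax()].backward()

# Pixel-level saliency: gradient magnitude of the winning class w.r.t. input.
saliency = patch.grad.abs().max(dim=1).values  # (1, 256, 256) heatmap
print(saliency.shape)
```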

In genomics, explanations are inseparable from preprocessing choices, which means attributions often have to be regenerated across normalization variants simply to confirm that the explanation is stable. Each of these steps multiplies the cost of what initially looks like a straightforward prediction.
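
As a sketch of that stability check, the snippet below regenerates attributions for a stand-in linear scorer under three hypothetical normalization variants and compares them by rank correlation. The variants, the model, and the gene count are placeholders; the structural point is that every variant means another full attribution run.

```python
import numpy as np

rng = np.random.default_rng(1)
counts = rng.poisson(lam=20.0, size=(1, 5000)).astype(float)  # one sample, 5,000 genes

# Hypothetical normalization variants the pipeline might apply upstream.
variants = {
    "log1p":       np.log1p(counts),
    "total_count": counts / counts.sum(axis=1, keepdims=True) * 1e4,
    "zscore":      (counts - counts.mean()) / (counts.std() + 1e-8),
}

weights = rng.normal(size=5000)            # stand-in linear model
attribute = lambda x: weights * x[0]       # gradient * input for a linear scorer

# Regenerate the attribution under every variant; each is a separate run.
attributions = {name: attribute(x) for name, x in variants.items()}

def spearman(a, b):
    """Rank correlation between two attribution vectors."""
    ra, rb = a.argsort().argsort(), b.argsort().argsort()
    return np.corrcoef(ra, rb)[0, 1]

for name, attr in attributions.items():
    print(name, round(spearman(attributions["log1p"], attr), 3))
```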

Once these systems move out of research settings, the costs begin to accumulate in more structural ways. Models are updated, data distributions drift, pipelines change, and deployments vary across clinical sites. As soon as governance and compliance enter the picture, organizations are required to demonstrate that an explanation shown to a clinician at a given point in time can be reproduced later, under audit.

That requirement expands the scope of the system to include versioning, lineage, retention, and replay. Explanations stop functioning as transient visual aids and instead become formal records, subject to the same expectations around storage, security, auditability, and reconstruction as any other regulated clinical artifact.
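
A minimal sketch of what "explanation as a record" implies structurally, with illustrative field names rather than any standard schema: enough lineage to replay the explanation later, not just the rendered image shown to the clinician.

```python
import hashlib, json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class ExplanationRecord:
    """The minimum lineage needed to replay an explanation under audit.
    Field names are illustrative, not a standard schema."""
    prediction_id: str
    model_version: str          # exact weights used at decision time
    pipeline_version: str       # preprocessing / normalization code version
    input_hash: str             # content hash of the exact input presented
    method: str                 # e.g. "occlusion", "integrated_gradients"
    method_params: dict         # seeds, baselines, step counts needed for replay
    artifact_uri: str           # where the rendered explanation is retained
    created_at: str

def sha256_of(obj) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

record = ExplanationRecord(
    prediction_id="pred-000123",
    model_version="pathology-model:4.2.1",          # placeholder identifiers
    pipeline_version="preproc:2.9.0",
    input_hash=sha256_of({"slide": "example-slide", "tile": [12, 48]}),
    method="occlusion",
    method_params={"baseline": 0.0, "stride": 16, "seed": 7},
    artifact_uri="s3://audit-store/explanations/pred-000123.png",
    created_at=datetime.now(timezone.utc).isoformat(),
)
print(json.dumps(asdict(record), indent=2))
```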

This is where the infrastructure tax appears.

XAI workloads tend to be uneven and operationally awkward. The base model might run efficiently on a shared GPU pool, but explanation generation often cannot. Perturbation-based methods inflate batch sizes. Saliency methods depend on intermediate activations that were never designed to be persisted. Counterfactual methods introduce iterative search loops that break clean inference assumptions.
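
The activation problem is easy to show. The PyTorch sketch below hooks the convolutional layers of a stand-in model and persists their outputs for a single served batch; the layer sizes are arbitrary, but the retained memory is real, and it is exactly what an optimized inference path would never allocate.

```python
import torch
import torch.nn as nn

model = nn.Sequential(  # stand-in for a served imaging model
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
)
model.eval()

# The explanation path hooks intermediate layers and keeps their outputs
# around -- memory a pure inference deployment never allocates.
activations = {}
def keep(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

handles = [m.register_forward_hook(keep(f"layer_{i}"))
           for i, m in enumerate(model) if isinstance(m, nn.Conv2d)]

with torch.no_grad():
    model(torch.rand(8, 3, 256, 256))       # one served batch of tiles

extra_bytes = sum(t.numel() * t.element_size() for t in activations.values())
print(f"{extra_bytes / 1e6:.1f} MB of activations retained just for explanation")
for h in handles:
    h.remove()
```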

To keep explanations inside clinical time windows, organizations either overprovision compute or push explanation generation offline and accept delays. Neither option is inexpensive.

The thing is, many life-science organizations encounter this late. They plan capacity for inference but not for explanation replay, drift validation, or audit reconstruction. Explainability is treated as a software feature rather than as a sustained operational load that grows over time. As usage expands, explanation costs rise rather than amortize.

This explains why so much XAI literature translates poorly into deployment. The methods are often sound and the explanations look plausible; it is the underlying systems that struggle to carry the additional load reliably.

Framed this way, the core design question shifts. It is no longer whether a model can be explained, but where explanation lives in the architecture and how much ongoing capacity the organization is willing to allocate to it.

Decisions about caching versus regeneration, tiered explanation fidelity, separation of real-time and audit paths, and retention policies are architectural choices with direct cost implications.
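
One way to make those choices explicit is a small policy object the architecture commits to. The sketch below is illustrative only; the method names, thresholds, and retention periods are placeholders, not recommendations.

```python
from dataclasses import dataclass

@dataclass
class ExplanationPolicy:
    """Illustrative knobs an architecture has to commit to; the values
    used below are placeholders, not recommendations."""
    realtime_method: str        # cheap method served inside the clinical window
    realtime_budget_ms: int     # latency budget for the point-of-care path
    audit_method: str           # higher-fidelity method generated offline
    cache_ttl_days: int         # how long rendered explanations are reused
    regenerate_on_model_change: bool
    retention_years: int        # how long audit artifacts must stay replayable

policy = ExplanationPolicy(
    realtime_method="saliency_gradients",
    realtime_budget_ms=500,
    audit_method="occlusion_full_resolution",
    cache_ttl_days=30,
    regenerate_on_model_change=True,
    retention_years=10,
)

def route(request_type: str, p: ExplanationPolicy) -> str:
    """Separate the real-time path from the audit path explicitly."""
    return p.realtime_method if request_type == "point_of_care" else p.audit_method

print(route("point_of_care", policy), "|", route("audit_replay", policy))
```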

Explainability adds transparency, but it also adds sustained workload, and that workload has to be budgeted like any other.