Crossing the chasm from model performance to clinical impact: the need to improve implementation and evaluation of AI
J. Marwaha, J. Kvedar
Abstract
Artificial intelligence (AI) has been the subject of considerable interest for many years for its potential to improve clinical care, yet its actual impact on patient outcomes when deployed in clinical settings remains largely unknown. In a recent systematic review, Zhou et al. surprisingly show that its impact so far has been quite limited. They reviewed 65 randomized controlled trials (RCTs) evaluating AI-based clinical interventions and found that in nearly 40% of studies there was no clinical benefit of using AI prediction tools compared to the standard of care. Among a subset of trials that the authors identified as having a low risk of bias, the clinical benefit of using deep learning (DL) predictive models over traditional statistical (TS) risk calculators was only minimal, and there was no benefit in using machine learning (ML) models over TS tools. Somewhat counterintuitively, most of the AI tools in these trials exhibited an excellent area under the receiver operating characteristic curve (AUROC; a common performance metric for predictive models) during development (median AUROC 0.81, IQR 0.75–0.90) and validation (median AUROC 0.83, IQR 0.79–0.97): a humbling reminder that robust predictive performance does not guarantee clinical impact at the bedside. As the science of building accurate predictive models progresses, our ability to translate these advancements into real-world clinical utility remains comparatively limited. How can we bridge this gap between AUROCs and clinical benefit?
