Predicting patient health trajectories with absolute certainty is impossible. Instead, physicians rely on prior history, current clinical status, and medical heuristics to estimate how likely a patient is to experience a future outcome (e.g., hospitalization within the next year). These estimates inform patient-centered treatment decisions and payment plans. In recent decades, algorithms built on increasingly precise and abundant patient data have enabled physicians to make these predictions with greater accuracy and confidence.
The high-stakes environment of medicine demands a rigorous development and validation pipeline for healthcare algorithms. Before deployment, an algorithm should be trained on data that span a range of patient demographics, medical conditions, and healthcare utilization patterns. This step exposes the algorithm to the different patient scenarios it could encounter in the future and prepares it to estimate risk for new patients with high accuracy.
While algorithm performance, measured by accuracy and related metrics such as true positive rate and F1-score, is high at the time of deployment in a hospital system, recent studies have shown that performance can decline over time, making the predictions less reliable. This phenomenon is termed performance drift.
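To make these metrics concrete, the following minimal sketch (in Python, assuming scikit-learn is available; the outcomes and predictions are purely illustrative) shows how accuracy, true positive rate, and F1-score might be computed for a batch of a deployed model's predictions:

# Minimal sketch: computing accuracy, true positive rate (recall), and F1-score
# for one batch of predictions from a hypothetical deployed risk model.
# The labels below are illustrative only, not real patient data.
from sklearn.metrics import accuracy_score, recall_score, f1_score

y_true = [1, 0, 0, 1, 1, 0, 1, 0]   # observed outcomes (e.g., hospitalized within a year)
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]   # the model's predicted labels

print("Accuracy:", accuracy_score(y_true, y_pred))
print("True positive rate:", recall_score(y_true, y_pred))  # recall is the true positive rate
print("F1-score:", f1_score(y_true, y_pred))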
What is performance drift?
Performance drift is defined as the gradual or sudden deterioration of an algorithm's performance after deployment. Changes in practice patterns are a prime driver: when healthcare practices change, the composition of the data they generate changes too. A model that relies on recognizing patterns in the data it was trained on falters when confronted with scenarios it has never seen.
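As a rough illustration of how a change in case mix shows up in the data itself, the sketch below (in Python; the feature, numbers, and cutoff are hypothetical assumptions, not drawn from any real system) compares one model input's distribution between the training period and a later deployment window using a two-sample test:

# Minimal sketch of how a shift in case mix appears as data drift.
# Hypothetical feature: proportion of admissions with respiratory symptoms,
# compared between the training period and a later deployment window.
import numpy as np
from scipy.stats import ks_2samp  # two-sample Kolmogorov-Smirnov test

rng = np.random.default_rng(0)
training_window = rng.normal(loc=0.20, scale=0.05, size=1000)    # case mix seen during training
deployment_window = rng.normal(loc=0.45, scale=0.08, size=1000)  # shifted case mix after a practice change

statistic, p_value = ks_2samp(training_window, deployment_window)
if p_value < 0.01:
    print("Input distribution has shifted; the model's predictions may no longer be reliable.")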
For example, the onset of the COVID-19 pandemic led to rapid changes in healthcare utilization and patient case mix. This shift affected the data collected, or not collected, as inputs for risk prediction algorithms.
An Epic Sepsis Model, built with artificial intelligence technology and deployed in at least 24 US hospitals, fell short early in the pandemic. The algorithm used information from electronic health records to identify early cases of sepsis, a life-threatening complication of infection, among hospitalized patients. Automatic alerts sent for high-risk patients elicited an immediate response from healthcare teams to medically manage the acute events. As COVID-19 cases began to dominate hospital admissions in March 2020, the case mix of patients regularly encountered by hospitals, and thereby by the model, shifted. Given some similarities between patients who typically present with sepsis and patients with COVID-19, the proportion of patients triggering alerts nearly doubled. However, many of these alerts diverted clinical teams and resources toward managing non-sepsis cases, burdening healthcare systems at large.
Physicians who incorporate predictive algorithms into their decision-making can be challenged by drift. As performance declines, there is a potential to misjudge patient risk and the clinical decisions that follow. Given the cost in patient lives and resource allocation, hospital systems are incentivized to respond to drift by retiring the model prematurely, as seen with the Epic Sepsis Model.
Instead of discarding the affected algorithm, it remains crucial to develop computational and institutional safeguards that monitor the performance of algorithms after they are deployed. Once drift is identified, incorporating mitigation strategies could ensure the continued safe use of the algorithm for clinical decisions.
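One possible shape for such monitoring, sketched below under assumed names and thresholds (BASELINE_F1, DRIFT_TOLERANCE, and the monthly batching are illustrative choices, not a prescribed standard), is to re-evaluate the model on each new batch of observed outcomes and flag when performance falls meaningfully below its validation baseline:

# Minimal sketch of post-deployment monitoring: evaluate the model on each
# monthly batch of observed outcomes and flag performance drift against a baseline.
# All names, thresholds, and data here are illustrative assumptions.
from sklearn.metrics import f1_score

BASELINE_F1 = 0.80      # performance measured at validation time
DRIFT_TOLERANCE = 0.10  # acceptable drop before triggering a review

def check_for_drift(monthly_batches):
    """monthly_batches: iterable of (month, y_true, y_pred) tuples."""
    for month, y_true, y_pred in monthly_batches:
        current_f1 = f1_score(y_true, y_pred)
        if BASELINE_F1 - current_f1 > DRIFT_TOLERANCE:
            # In practice, a flag like this would notify a model governance team,
            # which could recalibrate or retrain rather than retire the model.
            print(f"{month}: F1 dropped to {current_f1:.2f}; review and recalibrate.")
        else:
            print(f"{month}: F1 {current_f1:.2f} within tolerance.")

The flag itself decides nothing; it simply feeds an institutional review so that the choice to recalibrate, retrain, or retire the model rests with the clinical team.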
Likhitha Kolla is an MD/PhD Student in the Perelman School of Medicine