What is the problem

Identifying patients who will suffer adverse outcomes is extremely important, but clinicians are notoriously poor at prediction and prognosis. Advances in statistical modelling and machine learning can improve real-time prediction of outcomes. But how do such algorithms stand up in real-time clinical care, and what novel data sources can we use to augment them?

What are we doing

We have performed the first prospective validations of real-time machine learning algorithms that predict mortality in patients with cancer using electronic health record data. We have also performed some of the first experiments comparing machine learning predictions with clinician predictions. Finally, we have pioneered novel "two-stage" methodologies that account for cancer-specific heterogeneity and integrate important data, such as patient-reported outcomes, into predictions.


We are establishing pipelines for rigorously training and prospectively validating machine learning algorithms to test whether their performance holds up in real-world clinical care. Our mortality prediction algorithm is currently operational at several Penn oncology practices. Our patent-pending "two-stage" methods for integrating patient-generated health data will offer new directions in algorithm development.