2025 Colloquia
Accommodating Population Differences in Risk Prediction Model Validation
Ruth Pfeiffer, PhD
National Cancer Institute, NIH
Statistical risk prediction models have broad public health and clinical applications. I first give an overview of various definitions of “risk” and then briefly discuss building models that predict absolute risk (also called “cumulative incidence” or “crude risk”), the probability that an individual who is free of cancer at an initial age, a, will develop that cancer in the subsequent interval (a, t]. Before a model can be recommended for applications, its performance needs to be assessed, ideally in independent data. However, several differences between the populations that gave rise to the training and validation data can lead to seemingly poor performance of a risk model. I formalize the notion of “similarity” between the training and validation data and define “reproducibility” and “transportability”. I address the impact of differing predictor distributions and of differences in outcome verification on model calibration, accuracy, and discrimination. When individual-level data from both the training and validation data sets are available, I propose and study weighted versions of the validation metrics that adjust for differences in the predictor distributions and in outcome verification, providing a more comprehensive assessment of model performance. I give conditions on the model and on the training and validation populations that ensure a model's reproducibility or transportability, and show how to check them. The concepts are illustrated by developing and validating a prostate cancer risk model using data from two large North American prostate cancer prevention trials, the Selenium and Vitamin E Cancer Prevention Trial (SELECT) and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial.
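For reference, the absolute risk described in the abstract is conventionally expressed in terms of cause-specific hazards; the notation below is a standard formulation and is not taken from the talk itself. Writing h(u) for the hazard of developing the cancer of interest and m(u) for the hazard of death from competing causes, the absolute risk over (a, t] is

```latex
% Absolute risk (cumulative incidence) of cancer in (a, t],
% accounting for the competing risk of death:
r(a, t) = \int_{a}^{t} h(u)\,
  \exp\!\left\{ -\int_{a}^{u} \bigl[ h(v) + m(v) \bigr] \, dv \right\} du
```

The exponential term is the probability of remaining free of both the cancer and competing-cause death through age u, which distinguishes absolute (crude) risk from the pure risk obtained by ignoring m(v).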
Thursday, May 1, 2025
Some New Advances in Similarity-Based Predictive Modeling
Joel Dubin, PhD
University of Waterloo
Earlier work has shown that similarity-based predictive models can improve predictive performance compared with models built on the entire training data, particularly with respect to model discrimination for binary responses. My collaborators and I have updated results to share on similarity-based modeling for the joint consideration of model calibration and discrimination, as well as for dynamic prediction models. Properties of our methods are investigated in comprehensive simulation studies, and we demonstrate the methods through separate analyses of a publicly available ICU database. This is joint work with Minzee Kim at the University of Waterloo and Tatiana Krikella at York University.
Thursday, April 10, 2025
Faculty Candidate Seminars
Candidate talks are open only to department members. Please check emails for details.