The estimation and use of predictions for the assessment of model performance using large samples with multiply imputed data

Cited by: 54
Authors
Wood, Angela M. [1]
Royston, Patrick [2]
White, Ian R. [3]
Affiliations
[1] Univ Cambridge, Dept Publ Hlth & Primary Care, Strangeways Res Lab, Cambridge CB1 8RN, England
[2] UCL, MRC, Clin Trials Unit, London WC2B 6NH, England
[3] Cambridge Inst Publ Hlth, MRC, Biostat Unit, Cambridge CB2 0SR, England
Keywords
Measures of model performance; Missing data; Model validation; Multiple imputation; Prediction models; Rubin's rules; DISEASE RISK SCORE; EXTERNAL VALIDATION; PROGNOSTIC MODELS; MISSING-DATA; IMPUTATION; CANCER; VALUES; QRISK
DOI
10.1002/bimj.201400004
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Multiple imputation can be used as a tool in the process of constructing prediction models in medical and epidemiological studies with missing covariate values. Such models can be used to make predictions for model performance assessment, but the task is made more complicated by the multiple imputation structure. We summarize various predictions constructed from covariates, including multiply imputed covariates, and either the set of imputation-specific prediction model coefficients or the pooled prediction model coefficients. We further describe approaches for using the predictions to assess model performance. We distinguish between ideal model performance and pragmatic model performance, where the former refers to the model's performance in an ideal clinical setting where all individuals have fully observed predictors and the latter refers to the model's performance in a real-world clinical setting where some individuals have missing predictors. The approaches are compared through an extensive simulation study based on the UK700 trial. We determine that measures of ideal model performance can be estimated within imputed datasets and subsequently pooled to give an overall measure of model performance. Alternative methods to evaluate pragmatic model performance are required and we propose constructing predictions either from a second set of covariate imputations which make no use of observed outcomes, or from a set of partial prediction models constructed for each potential observed pattern of covariate. Pragmatic model performance is generally lower than ideal model performance. We focus on model performance within the derivation data, but describe how to extend all the methods to a validation dataset.
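The pooling step mentioned in the abstract (estimating a performance measure within each imputed dataset and then combining the results) can be sketched with the standard form of Rubin's rules. This is a minimal illustration, not the authors' code: the function name `pool_rubin`, its inputs, and the example numbers are hypothetical, and the abstract does not say whether a measure should be transformed before pooling.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool per-imputation estimates using the standard Rubin's rules.

    estimates        -- one performance estimate per imputed dataset
    within_variances -- the squared standard error of each estimate
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations with matching variances")

    pooled = mean(estimates)          # average of the per-imputation estimates
    within = mean(within_variances)   # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Illustrative (made-up) values: a performance measure from three imputed datasets.
pooled, total = pool_rubin([0.70, 0.72, 0.74], [0.0004, 0.0004, 0.0004])
print(pooled, total)
```

The total variance adds the between-imputation component inflated by a factor of 1 + 1/m, so the pooled standard error reflects the uncertainty due to the missing covariate values.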
Pages: 614-632
Page count: 19
Related papers
50 in total
  • [1] Obtaining Predictions from Models Fit to Multiply Imputed Data
    Miles, Andrew
    SOCIOLOGICAL METHODS & RESEARCH, 2016, 45 (01) : 175 - 185
  • [2] The Fay-Herriot model for multiply imputed data with an application to regional wealth estimation in Germany
    Kreutzmann, Ann-Kristin
    Marek, Philipp
    Runge, Marina
    Salvati, Nicola
    Schmid, Timo
    JOURNAL OF APPLIED STATISTICS, 2022, 49 (13) : 3278 - 3299
  • [3] Model specification and bootstrapping for multiply imputed data: An application to count models for the frequency of alcohol use
    Comulada, W. Scott
    Stata Journal, 2015, 15 (03) : 833 - 844
  • [4] MULTIPLY ROBUST BOOTSTRAP VARIANCE ESTIMATION IN THE PRESENCE OF SINGLY IMPUTED SURVEY DATA
    Chen, Sixia
    Haziza, David
    Mashreghi, Zeinab
    JOURNAL OF SURVEY STATISTICS AND METHODOLOGY, 2021, 9 (04) : 810 - 832
  • [5] A comparison of model selection methods for prediction in the presence of multiply imputed data
    Le Thi Phuong Thao
    Geskus, Ronald
    BIOMETRICAL JOURNAL, 2019, 61 (02) : 343 - 356
  • [6] Model selection of generalized estimating equations with multiply imputed longitudinal data
    Shen, Chung-Wei
    Chen, Yi-Hau
    BIOMETRICAL JOURNAL, 2013, 55 (06) : 899 - 911
  • [7] Addressing health disparities using multiply imputed injury surveillance data
    Liu, Yang
    Wolkin, Amy F.
    Kresnow, Marcie-jo
    Schroeder, Thomas
    INTERNATIONAL JOURNAL FOR EQUITY IN HEALTH, 2023, 22 (01)
  • [8] Using AIC in multiple linear regression framework with multiply imputed data
    Chaurasia, Ashok
    Harel, Ofer
    HEALTH SERVICES AND OUTCOMES RESEARCH METHODOLOGY, 2012, 12 (2-3) : 219 - 233