The estimation and use of predictions for the assessment of model performance using large samples with multiply imputed data

Cited by: 54
Authors
Wood, Angela M. [1]
Royston, Patrick [2]
White, Ian R. [3]
Affiliations
[1] Univ Cambridge, Dept Publ Hlth & Primary Care, Strangeways Res Lab, Cambridge CB1 8RN, England
[2] UCL, MRC, Clin Trials Unit, London WC2B 6NH, England
[3] Cambridge Inst Publ Hlth, MRC, Biostat Unit, Cambridge CB2 0SR, England
Keywords
Measures of model performance; Missing data; Model validation; Multiple imputation; Prediction models; Rubin's rules; DISEASE RISK SCORE; EXTERNAL VALIDATION; PROGNOSTIC MODELS; MISSING-DATA; IMPUTATION; CANCER; VALUES; QRISK;
DOI
10.1002/bimj.201400004
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Multiple imputation can be used as a tool in the process of constructing prediction models in medical and epidemiological studies with missing covariate values. Such models can be used to make predictions for model performance assessment, but the task is made more complicated by the multiple imputation structure. We summarize various predictions constructed from covariates, including multiply imputed covariates, and either the set of imputation-specific prediction model coefficients or the pooled prediction model coefficients. We further describe approaches for using the predictions to assess model performance. We distinguish between ideal model performance and pragmatic model performance, where the former refers to the model's performance in an ideal clinical setting where all individuals have fully observed predictors and the latter refers to the model's performance in a real-world clinical setting where some individuals have missing predictors. The approaches are compared through an extensive simulation study based on the UK700 trial. We determine that measures of ideal model performance can be estimated within imputed datasets and subsequently pooled to give an overall measure of model performance. Alternative methods to evaluate pragmatic model performance are required and we propose constructing predictions either from a second set of covariate imputations which make no use of observed outcomes, or from a set of partial prediction models constructed for each potential observed pattern of covariate. Pragmatic model performance is generally lower than ideal model performance. We focus on model performance within the derivation data, but describe how to extend all the methods to a validation dataset.
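The abstract's finding that ideal model performance can be estimated within each imputed dataset and then pooled can be illustrated with a minimal sketch of Rubin's rules. The abstract does not give the formulas or say whether a measure is transformed before pooling, so this uses the standard textbook form on the untransformed scale. The function name `pool_rubin` and all numeric values are hypothetical, chosen only for illustration.

```python
from statistics import mean


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates of a performance measure with Rubin's rules.

    estimates: one estimate of the measure per imputed dataset
    variances: the matching within-imputation variance of each estimate
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean(variances)                     # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between       # Rubin's total variance
    return q_bar, total


# Hypothetical values: a discrimination measure estimated in 3 imputed datasets.
measure = [0.70, 0.72, 0.74]
measure_var = [0.0004, 0.0004, 0.0004]
pooled, total_var = pool_rubin(measure, measure_var)
print(f"pooled estimate = {pooled:.3f}, standard error = {total_var ** 0.5:.4f}")
```

This sketch covers only the pooling step for ideal model performance. The pragmatic-performance approaches the abstract proposes (a second set of imputations that ignores observed outcomes, or partial prediction models per missingness pattern) change how the predictions are constructed and are not shown here.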
Pages: 614 - 632
Number of pages: 19
Related papers
50 in total
  • [21] Tobit analysis to investigate determinants of the level of assets in couples' pension accounts using multiply-imputed data and techniques
    Yuh, Y
    DeVaney, SA
    Hanna, S
    CONSUMER INTERESTS ANNUAL, VOL 43: 43RD ANNUAL CONFERENCE OF THE AMERICAN COUNCIL ON CONSUMER INTERESTS, 1997, : 169 - 169
  • [22] SPSS Syntax for Combining Results of Principal Component Analysis of Multiply Imputed Data Sets using Generalized Procrustes Analysis
    van Wingerde, Bart
    van Ginkel, Joost
    APPLIED PSYCHOLOGICAL MEASUREMENT, 2021, 45 (03) : 231 - 232
  • [23] Testing the use of a large language model (LLM) for performing data quality assessment
    Macmaster, Steven
    Sinistore, Julie
    INTERNATIONAL JOURNAL OF LIFE CYCLE ASSESSMENT, 2024,
  • [24] The handling of missing data in trial-based economic evaluations: should data be multiply imputed prior to longitudinal linear mixed-model analyses?
    Ben, Angela Jornada
    van Dongen, Johanna M.
    El Alili, Mohamed
    Heymans, Martijn W.
    Twisk, Jos W. R.
    MacNeil-Vroomen, Janet L.
    de Wit, Maartje
    van Dijk, Susan E. M.
    Oosterhuis, Teddy
    Bosmans, Judith E.
    EUROPEAN JOURNAL OF HEALTH ECONOMICS, 2023, 24 (06) : 951 - 965
  • [25] The handling of missing data in trial-based economic evaluations: should data be multiply imputed prior to longitudinal linear mixed-model analyses?
    Ângela Jornada Ben
    Johanna M. van Dongen
    Mohamed El Alili
    Martijn W. Heymans
    Jos W. R. Twisk
    Janet L. MacNeil-Vroomen
    Maartje de Wit
    Susan E. M. van Dijk
    Teddy Oosterhuis
    Judith E. Bosmans
    The European Journal of Health Economics, 2023, 24 : 951 - 965
  • [26] Consequence assessment and improvement of model predictions by environmental measurement data
    Muller, H
    Bleher, M
    Jacob, P
    Luczak-Urlik, D
    NUCLEAR EMERGENCY DATA MANAGEMENT, 1998, : 91 - 101
  • [27] Assessment of probability distribution of large samples of geotechnical parameters by using normal information spread estimation method
    Zhu Huan-zhen
    Li Xi-bing
    Gong Feng-qiang
    ROCK AND SOIL MECHANICS, 2015, 36 (11) : 3275 - 3282
  • [28] Comparison of mathematical model predictions to experimental data of fatigue and performance
    Van Dongen, HPA
    AVIATION SPACE AND ENVIRONMENTAL MEDICINE, 2004, 75 (03) : A15 - A36
  • [29] Performance assessment of a bifacial PV system using a new energy estimation model
    Sahu, Preeti Kumari
    Roy, J. N.
    Chakraborty, C.
    SOLAR ENERGY, 2023, 262
  • [30] COMPARISON OF LARGE-SCALE BOILER DATA WITH COMBUSTION MODEL PREDICTIONS
    BOYD, RK
    KENT, JH
    ENERGY & FUELS, 1994, 8 (01) : 124 - 130