Internal-external cross-validation helped to evaluate the generalizability of prediction models in large clustered datasets

Cited by: 28
Authors
Takada, Toshihiko [1]
Nijman, Steven [1]
Denaxas, Spiros [2, 3, 4, 5]
Snell, Kym I. E. [6]
Uijl, Alicia [1, 7, 8]
Nguyen, Tri-Long [1, 9]
Asselbergs, Folkert W. [2, 10]
Debray, Thomas P. A. [1, 2]
Affiliations
[1] Univ Utrecht, Univ Med Ctr Utrecht, Julius Ctr Hlth Sci & Primary Care, Univ Weg 100, NL-3584 CG Utrecht, Netherlands
[2] UCL, Hlth Data Res UK & Inst Hlth Informat, Gibbs Bldg,215 Euston Rd, London NW1 2BE, England
[3] Alan Turing Inst, British Lib, 96 Euston Rd, London NW1 2DB, England
[4] UCL, Univ Coll London Hosp, Biomed Res Ctr, Natl Inst Hlth Res, Suite A,1st Floor,Maple House, London W1T 7DN, England
[5] UCL, British Heart Fdn Res Accelerator, Gower St, London WC1E 6BT, England
[6] Keele Univ, Sch Med, Ctr Prognosis Res, Keele ST5 5BG, Staffs, England
[7] Karolinska Inst, Dept Med, Div Cardiol, S-17177 Stockholm, Sweden
[8] Univ Utrecht, Univ Med Ctr Utrecht, Dept Cardiol, Div Heart & Lungs, Heidelberglaan 100,POB 85500, NL-3508 GA Utrecht, Netherlands
[9] Univ Copenhagen, CSS, Dept Publ Hlth, Sect Epidemiol, Oster Farimagsgade 5, DK-1353 Copenhagen K, Denmark
[10] UCL, Inst Cardiovasc Sci, Fac Populat Hlth Sci, Gower St, London WC1E 6BT, England
Funding
European Union Horizon 2020
Keywords
Prediction model; Calibration; Discrimination; Validation; Heterogeneity; Model comparison; INCIDENT HEART-FAILURE; MULTIPLE IMPUTATION; METAANALYSIS; PERFORMANCE; BIOMARKERS; RISK;
DOI
10.1016/j.jclinepi.2021.03.025
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Objective: To illustrate how to evaluate the need for complex strategies when developing generalizable prediction models in large clustered datasets.
Study Design and Setting: We developed eight Cox regression models to estimate the risk of heart failure using a large population-level dataset. These models differed in the number of predictors, the functional form of the predictor effects (non-linear effects and interaction) and the estimation method (maximum likelihood and penalization). Internal-external cross-validation was used to evaluate the models' generalizability across the included general practices.
Results: Among 871,687 individuals from 225 general practices, 43,987 (5.0%) developed heart failure during a median follow-up time of 5.8 years. For discrimination, the simplest prediction model yielded a good concordance statistic, which was not much improved by adopting complex strategies. Between-practice heterogeneity in discrimination was similar in all models. For calibration, the simplest model performed satisfactorily. Although accounting for non-linear effects and interaction slightly improved the calibration slope, it also led to more heterogeneity in the observed/expected ratio. Similar results were found in a second case study involving patients with stroke.
Conclusion: In large clustered datasets, prediction model studies may adopt internal-external cross-validation to evaluate the generalizability of competing models, and to identify promising modelling strategies.
(c) 2021 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
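Internal-external cross-validation holds out one cluster at a time, develops the model on the remaining clusters, and evaluates it in the held-out cluster. The following is a minimal sketch of that loop, not the authors' implementation. It rests on several assumptions: the data are simulated, a one-predictor logistic regression is swapped in for the Cox models used in the study, and all names (`simulate`, `fit_logistic`, `c_statistic`, `iecv`) are hypothetical. It reports a concordance statistic and an observed/expected ratio per held-out cluster, the two kinds of measure the abstract mentions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n_clusters=5, n_per=300, seed=1):
    """Simulated clustered data (assumption): each cluster has its own baseline risk."""
    rng = random.Random(seed)
    data = {}
    for k in range(n_clusters):
        intercept = -1.5 + rng.gauss(0, 0.3)  # between-cluster heterogeneity
        xs = [rng.gauss(0, 1) for _ in range(n_per)]
        ys = [int(rng.random() < sigmoid(intercept + 1.0 * x)) for x in xs]
        data[k] = (xs, ys)
    return data


def fit_logistic(xs, ys, iters=25):
    """Fit intercept a and slope b by Newton-Raphson (stand-in for a Cox model)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def c_statistic(preds, ys):
    """Proportion of event/non-event pairs ranked correctly (ties count 0.5)."""
    events = [p for p, y in zip(preds, ys) if y == 1]
    nonevents = [p for p, y in zip(preds, ys) if y == 0]
    score = sum((e > n) + 0.5 * (e == n) for e in events for n in nonevents)
    return score / (len(events) * len(nonevents))


def iecv(data):
    """Leave one cluster out, develop on the rest, validate on the held-out cluster."""
    results = {}
    for held_out in data:
        train_x = [x for k, (xs, _) in data.items() if k != held_out for x in xs]
        train_y = [y for k, (_, ys) in data.items() if k != held_out for y in ys]
        a, b = fit_logistic(train_x, train_y)
        xs, ys = data[held_out]
        preds = [sigmoid(a + b * x) for x in xs]
        results[held_out] = {
            "c": c_statistic(preds, ys),    # discrimination
            "oe": sum(ys) / sum(preds),     # calibration-in-the-large (O/E ratio)
        }
    return results


results = iecv(simulate())
for cluster, r in results.items():
    print(f"cluster {cluster}: C = {r['c']:.3f}, O/E = {r['oe']:.3f}")
```

The spread of the per-cluster values gives a rough sense of between-cluster heterogeneity; a fuller analysis would typically pool them with a random-effects meta-analysis, which this sketch omits.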
Pages: 83-91
Number of pages: 9