A comparison of model selection methods for prediction in the presence of multiply imputed data

Citations: 31
Authors
Le Thi Phuong Thao [1 ]
Geskus, Ronald [1 ,2 ]
Affiliations
[1] Univ Oxford, Biostat Grp, Clin Res Unit, Ho Chi Minh City, Vietnam
[2] Univ Oxford, Nuffield Dept Med, Oxford, England
Funding
Wellcome Trust (UK)
Keywords
lasso; multiply imputed data; prediction; stacked data; variable selection
DOI
10.1002/bimj.201700232
Chinese Library Classification
Q [Biological Sciences]
Discipline codes
07; 0710; 09
Abstract
Many approaches have been proposed for variable selection with multiply imputed (MI) data in the development of a prognostic model, but no method prevails as uniformly best. We conducted a simulation study with a binary outcome and a logistic regression model to compare two classes of variable selection methods in the presence of MI data: (I) model selection on bootstrap data, using backward elimination based on AIC or the lasso, with the final model fitted on the variables most frequently (e.g. >= 50%) selected over all MI and bootstrap data sets; (II) model selection on the original MI data, using the lasso. In class II, the final model is obtained by (i) averaging estimates of variables that were selected in any MI data set or (ii) in at least 50% of the MI data sets; (iii) performing the lasso on the stacked MI data; or (iv) as in (iii), but using individual weights determined by the fraction of missingness. In all lasso models we used both the optimal penalty and the 1-se rule. We also considered recalibrating models to correct for overshrinkage due to the suboptimal penalty, by refitting either the linear predictor or all individual variables. We applied the methods to a real data set of 951 adult patients with tuberculous meningitis to predict mortality within nine months. Overall, lasso selection with the 1-se penalty showed the best performance, in both approach I and approach II. Stacking the MI data is an attractive approach because it does not require choosing a selection threshold when combining results from separate MI data sets.
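Approaches (iii) and (iv) described in the abstract can be sketched as follows. This is a minimal illustration on simulated data, not the authors' implementation: the imputed copies, the missingness fractions, and the weighting scheme (each subject contributes total weight 1 across the M stacked copies, down-weighted by its fraction of missingness) are all assumptions for demonstration, and scikit-learn's cross-validated optimal penalty is used in place of the 1-se rule, which scikit-learn does not provide.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)

# Simulate a complete data set with a binary outcome and 3 true predictors.
n, p, M = 200, 10, 5
X = rng.normal(size=(n, p))
beta = np.array([1.0, -1.0, 0.5] + [0.0] * (p - 3))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta))))

# Hypothetical fraction of missing values per subject; in practice this is
# computed from the incomplete data before imputation.
f_missing = rng.uniform(0.0, 0.5, size=n)

# Stack M imputed data sets (here: noisy copies, purely for illustration).
X_stacked = np.vstack([X + rng.normal(scale=0.1, size=X.shape) for _ in range(M)])
y_stacked = np.tile(y, M)

# Approach (iv): weight each row by (1 - fraction missing) / M, so subjects
# with more missingness contribute less; setting weights to 1/M gives (iii).
w = np.tile((1.0 - f_missing) / M, M)

# L1-penalised logistic regression with a cross-validated penalty on the
# stacked data; one lasso fit replaces combining M separate selections.
lasso = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=10, cv=5)
lasso.fit(X_stacked, y_stacked, sample_weight=w)

selected = np.flatnonzero(lasso.coef_.ravel() != 0)
print("selected predictor indices:", selected)
```

Because the stacked fit yields a single set of coefficients, no selection threshold across MI data sets is needed, which is the attraction of this approach noted in the abstract.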
Pages: 343-356
Page count: 14