Variable selection for multiply-imputed data with penalized generalized estimating equations

Cited by: 6
Authors
Geronimi, J. [1 ,2 ]
Saporta, G. [2 ]
Affiliations
[1] IRIS, 50 Rue Carnot, F-92284 Suresnes, France
[2] CNAM, Cedric, 292 Rue St Martin, F-75141 Paris, France
Keywords
Generalized estimating equations; LASSO; Longitudinal data; Missing data; Multiple imputation; Variable selection; LONGITUDINAL DATA; MISSING DATA; IMPUTATION; REGRESSION; KNEE;
DOI
10.1016/j.csda.2017.01.001
Chinese Library Classification number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Generalized estimating equations (GEE) are useful tools for marginal regression analysis of longitudinal data. A large number of variables combined with missing data raises complex issues in a longitudinal context. In variable selection, for instance, penalized generalized estimating equations have not been systematically developed to accommodate missing data. Multiple imputation penalized generalized estimating equations (MI-PGEE), an extension of the multiple imputation least absolute shrinkage and selection operator (MI-LASSO), are presented. MI-PGEE allows integration of missing data and within-subject correlation in variable selection procedures. Missing data are dealt with using multiple imputation, and variable selection is performed using a group LASSO penalty. Estimated coefficients for the same variable across multiply imputed datasets are treated as a group when applying penalized generalized estimating equations, leading to a single model across multiply-imputed datasets. To select the tuning parameter, a new BIC-like criterion is proposed. A simulation study shows the advantage of MI-PGEE over simple imputation PGEE. The usefulness of the new method is illustrated by an application to a subgroup of the placebo arm of the strontium ranelate efficacy in knee osteoarthritis trial. © 2017 Elsevier B.V. All rights reserved.
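The grouping idea in the abstract (coefficients of one variable across all imputed datasets form a single group) can be illustrated with a minimal sketch. It assumes the standard group LASSO form, lambda times the sum over variables of the Euclidean norm of that variable's coefficients across imputations. The function names, the toy coefficients, and the use of a group soft-thresholding step are illustrative assumptions; they are not taken from the paper, whose actual estimating procedure and tuning criterion are not reproduced here.

```python
import math

def group_norms(beta):
    """Euclidean norm of each variable's coefficients across imputed datasets.

    beta[d][j] is the coefficient of variable j estimated on imputed dataset d.
    """
    n_vars = len(beta[0])
    return [math.sqrt(sum(row[j] ** 2 for row in beta)) for j in range(n_vars)]

def group_lasso_penalty(beta, lam):
    """Standard group LASSO penalty: lam * sum_j ||beta[:, j]||_2 (assumed form)."""
    return lam * sum(group_norms(beta))

def group_soft_threshold(beta, threshold):
    """Shrink each variable's group jointly (proximal step of the group penalty).

    A group whose norm is below the threshold is set to zero in every imputed
    dataset at once, so all datasets end up selecting the same variables.
    """
    norms = group_norms(beta)
    scales = [max(0.0, 1.0 - threshold / n) if n > 0 else 0.0 for n in norms]
    return [[b * s for b, s in zip(row, scales)] for row in beta]

# Hypothetical coefficients: 2 imputed datasets (rows) x 2 variables (columns).
beta = [[3.0, 0.1],
        [4.0, -0.1]]
penalty = group_lasso_penalty(beta, lam=1.0)
shrunk = group_soft_threshold(beta, threshold=1.0)
print(penalty)
print(shrunk)
```

In this toy case the first variable has a large joint norm and is only shrunk, while the second has a small joint norm and is removed from both imputed datasets together, which is the behaviour that yields one model across imputations.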
Pages: 103-114
Page count: 12
Related papers
50 in total
  • [1] Variable selection for multiply-imputed data with application to dioxin exposure study
    Chen, Qixuan
    Wang, Sijian
    STATISTICS IN MEDICINE, 2013, 32 (21) : 3646 - 3659
  • [2] Model selection of generalized estimating equations with multiply imputed longitudinal data
    Shen, Chung-Wei
    Chen, Yi-Hau
    BIOMETRICAL JOURNAL, 2013, 55 (06) : 899 - 911
  • [3] Variable selection and prediction of clinical outcome with multiply-imputed data via Bayesian model averaging
    Jiang, Guozhi
    Tam, Claudia H. T.
    Luk, Andrea O. Y.
    Kong, Alice P. S.
    So, Wing Yee
    Chan, Juliana C. N.
    Ma, Ronald C. W.
    Fan, Xiaodan
    2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2016, : 727 - 730
  • [4] Variable Selection with Multiply-Imputed Datasets: Choosing Between Stacked and Grouped Methods
    Du, Jiacong
    Boss, Jonathan
    Han, Peisong
    Beesley, Lauren J.
    Kleinsasser, Michael
    Goutman, Stephen A.
    Batterman, Stuart
    Feldman, Eva L.
    Mukherjee, Bhramar
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2022, 31 (04) : 1063 - 1075
  • [5] Multiply-Imputed Synthetic Data: Advice to the Imputer
    Loong, Bronwyn
    Rubin, Donald B.
    JOURNAL OF OFFICIAL STATISTICS, 2017, 33 (04) : 1005 - 1019
  • [6] Penalized weighted least-squares estimate for variable selection on correlated multiply imputed data
    Li, Yang
    Yang, Haoyu
    Yu, Haochen
    Huang, Hanwen
    Shen, Ye
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2023, 72 (03) : 703 - 717
  • [7] Variable selection via penalized generalized estimating equations for a marginal survival model
    Niu, Yi
    Wang, Xiaoguang
    Cao, Hui
    Peng, Yingwei
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2020, 29 (09) : 2493 - 2506
  • [8] Likelihood-ratio tests with multiply-imputed data sets
    1600, Publ by American Statistical Assoc, Alexandria, VA, USA
  • [9] PERFORMING LIKELIHOOD RATIO TESTS WITH MULTIPLY-IMPUTED DATA SETS
    MENG, XL
    RUBIN, DB
    BIOMETRIKA, 1992, 79 (01) : 103 - 111
  • [10] How should variable selection be performed with multiply imputed data?
    Wood, Angela M.
    White, Ian R.
    Royston, Patrick
    STATISTICS IN MEDICINE, 2008, 27 (17) : 3227 - 3246