Evaluation of Four Multiple Imputation Methods for Handling Missing Binary Outcome Data in the Presence of an Interaction between a Dummy and a Continuous Variable

Cited by: 7

Authors:

Javadi, Sara [1]

Bahrampour, Abbas [2]

Saber, Mohammad Mehdi [3]

Garrusi, Behshid [4]

Baneshi, Mohammad Reza [2]

Affiliations:

[1] Kerman Univ Med Sci, Sch Publ Hlth, Dept Biostat & Epidemiol, Kerman, Iran

[2] Kerman Univ Med Sci, Inst Futures Studies Hlth, Modeling Hlth Res Ctr, Kerman, Iran

[3] Higher Educ Ctr Eghlid, Dept Stat, Eghlid, Iran

[4] Kerman Univ Med Sci, Inst Neuropharmacol, Kerman Neurosci Res Ctr, Kerman, Iran

Source:

JOURNAL OF PROBABILITY AND STATISTICS | 2021 / Volume 2021

Keywords:

CLASSIFICATION; INFERENCES; TREES; MICE;

DOI:

10.1155/2021/6668822

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Subject classification code:

020208 ; 070103 ; 0714 ;

Abstract:

Multiple imputation by chained equations (MICE) is the most common method for imputing missing data. Within the MICE algorithm, imputation can be performed using a variety of parametric and nonparametric methods. By default, MICE implementations specify imputation models that include variables as linear terms only, with no interactions, but omitting interaction terms may lead to biased results. Using simulated and real datasets, we investigated whether recursive partitioning creates appropriate variability between imputations and yields unbiased parameter estimates with appropriate confidence intervals. We compared four multiple imputation (MI) methods on a real and a simulated dataset: predictive mean matching with an interaction term in the imputation model (MICE-Interaction), classification and regression trees (CART) for specifying the imputation model (MICE-CART), random forest (RF) as implemented in MICE (MICE-RF), and the MICE-Stratified method. We first selected secondary data and devised an experimental design of 40 scenarios (2 x 5 x 4), which differed by the missing mechanism (MAR and MCAR), the rate of simulated missing data (10%, 20%, 30%, 40%, and 50%), and the imputation method (MICE-Interaction, MICE-CART, MICE-RF, and MICE-Stratified). For the real data, we randomly drew 700 observations with replacement 300 times and then created the missing data. The evaluation was based on raw bias (RB) and five other measures, each averaged over the repetitions. Next, in a simulation study, we generated data 1000 times with a sample size of 700 and created missing data once for each dataset. In all scenarios, the methods were evaluated in the simulation study using the same criteria as for the real data.
We conclude that, when there is an interaction effect between a dummy and a continuous predictor, substantial gains are possible by using recursive partitioning for imputation rather than parametric methods. In addition, the MICE-Interaction method is always more efficient and convenient than the other methods for preserving interaction effects.
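The experimental design described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names (`build_scenarios`, `bootstrap_sample`, `make_mcar`, `raw_bias`) are hypothetical, the MCAR step assumes that an exact fraction of outcomes is deleted, and raw bias is taken in its usual sense as the average estimate over repetitions minus the reference value. The imputation step itself and the MAR mechanism are not shown, since the abstract does not specify them in enough detail.

```python
import itertools
import random

# Design factors as listed in the abstract.
MECHANISMS = ["MAR", "MCAR"]
MISSING_RATES = [0.10, 0.20, 0.30, 0.40, 0.50]
METHODS = ["MICE-Interaction", "MICE-CART", "MICE-RF", "MICE-Stratified"]


def build_scenarios():
    """Enumerate the full 2 x 5 x 4 factorial design."""
    return list(itertools.product(MECHANISMS, MISSING_RATES, METHODS))


def bootstrap_sample(rows, size, rng):
    """Draw `size` rows with replacement (one resampling repetition)."""
    return [rng.choice(rows) for _ in range(size)]


def make_mcar(outcomes, rate, rng):
    """Set a fraction `rate` of outcomes to None, completely at random.

    Assumption: an exact count of values is deleted; the paper may instead
    have deleted each value independently with probability `rate`.
    """
    n_missing = round(rate * len(outcomes))
    missing_idx = set(rng.sample(range(len(outcomes)), n_missing))
    return [None if i in missing_idx else y for i, y in enumerate(outcomes)]


def raw_bias(estimates, true_value):
    """Mean of the estimates across repetitions minus the reference value."""
    return sum(estimates) / len(estimates) - true_value


scenarios = build_scenarios()
print(len(scenarios))  # 40 scenarios in total
```

In a full study, each scenario would be run over all repetitions (300 resamples of the real data, or 1000 simulated datasets), with `raw_bias` applied to the pooled estimates from each method.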

Citations

Pages: 14

31 items in total

[1] Evaluation of multiple imputation approaches for handling missing covariate information in a case-cohort study with a binary outcome
Melissa Middleton
Cattram Nguyen
Margarita Moreno-Betancur
John B. Carlin
Katherine J. Lee
BMC Medical Research Methodology, 22
[2] Evaluation of multiple imputation approaches for handling missing covariate information in a case-cohort study with a binary outcome
Middleton, Melissa
Nguyen, Cattram
Moreno-Betancur, Margarita
Carlin, John B.
Lee, Katherine J.
BMC MEDICAL RESEARCH METHODOLOGY, 2022, 22 (01)
[3] Multiple imputation in the presence of an incomplete binary variable created from an underlying continuous variable
Grobler, Anneke C.
Lee, Katherine
BIOMETRICAL JOURNAL, 2020, 62 (02) : 467 - 478
[4] Multiple imputation for handling missing outcome data when estimating the relative risk
Thomas R. Sullivan
Katherine J. Lee
Philip Ryan
Amy B. Salter
BMC Medical Research Methodology, 17
[5] Multiple imputation for handling missing outcome data when estimating the relative risk
Sullivan, Thomas R.
Lee, Katherine J.
Ryan, Philip
Salter, Amy B.
BMC MEDICAL RESEARCH METHODOLOGY, 2017, 17
[6] METHODS FOR THE ANALYSIS OF BINARY OUTCOME RESULTS IN THE PRESENCE OF MISSING DATA
DELUCCHI, KL
JOURNAL OF CONSULTING AND CLINICAL PSYCHOLOGY, 1994, 62 (03) : 569 - 575
[7] An evaluation of methods to handle missing data in the context of latent variable interaction analysis: multiple imputation, maximum likelihood, and random forest algorithm
Shin, Tacksoo
Long, Jeffrey D.
Davison, Mark L.
JAPANESE JOURNAL OF STATISTICS AND DATA SCIENCE, 2022, 5 (02) : 629 - 659
[8] An evaluation of methods to handle missing data in the context of latent variable interaction analysis: multiple imputation, maximum likelihood, and random forest algorithm
Tacksoo Shin
Jeffrey D. Long
Mark L. Davison
Japanese Journal of Statistics and Data Science, 2022, 5 : 629 - 659
[9] Multiple imputation for handling missing outcome data in randomized trials involving a mixture of independent and paired data
Sullivan, Thomas R.
Yelland, Lisa N.
Moreno-Betancur, Margarita
Lee, Katherine J.
STATISTICS IN MEDICINE, 2021, 40 (27) : 6008 - 6020
[10] A comparison of multiple-imputation methods for handling missing data in repeated measurements observational studies
Kalaycioglu, Oya
Copas, Andrew
King, Michael
Omar, Rumana Z.
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 2016, 179 (03) : 683 - 706
