Evaluation of Four Multiple Imputation Methods for Handling Missing Binary Outcome Data in the Presence of an Interaction between a Dummy and a Continuous Variable

Cited by: 7
Authors
Javadi, Sara [1]
Bahrampour, Abbas [2]
Saber, Mohammad Mehdi [3]
Garrusi, Behshid [4]
Baneshi, Mohammad Reza [2]
Affiliations
[1] Kerman Univ Med Sci, Sch Publ Hlth, Dept Biostat & Epidemiol, Kerman, Iran
[2] Kerman Univ Med Sci, Inst Futures Studies Hlth, Modeling Hlth Res Ctr, Kerman, Iran
[3] Higher Educ Ctr Eghlid, Dept Stat, Eghlid, Iran
[4] Kerman Univ Med Sci, Inst Neuropharmacol, Kerman Neurosci Res Ctr, Kerman, Iran
Keywords
CLASSIFICATION; INFERENCES; TREES; MICE;
DOI
10.1155/2021/6668822
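Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes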
020208 ; 070103 ; 0714 ;
Abstract
Multiple imputation by chained equations (MICE) is the most common method for imputing missing data. Within the MICE algorithm, imputation can be performed with a variety of parametric and nonparametric methods. By default, MICE implementations specify imputation models that include variables as linear terms only, with no interactions, but omitting interaction terms may lead to biased results. Using simulated and real datasets, we investigated whether recursive partitioning creates appropriate variability between imputations and yields unbiased parameter estimates with appropriate confidence intervals. We compared four multiple imputation (MI) methods on a real and a simulated dataset: predictive mean matching with an interaction term in the imputation model in MICE (MICE-Interaction), classification and regression trees (CART) for specifying the imputation model in MICE (MICE-CART), the implementation of random forest (RF) in MICE (MICE-RF), and the MICE-Stratified method. For the real-data study, we selected secondary data and devised an experimental design of 40 scenarios (5 x 2 x 4), which differed by the rate of simulated missing data (10%, 20%, 30%, 40%, and 50%), the missingness mechanism (MAR and MCAR), and the imputation method (MICE-Interaction, MICE-CART, MICE-RF, and MICE-Stratified). We randomly drew 700 observations with replacement 300 times and then created the missing data. The evaluation was based on raw bias (RB) and five other measures, each averaged over the repetitions. Next, in a simulation study, we generated data 1000 times with a sample size of 700 and created missing data once for each dataset. The same criteria used for the real data were applied in all scenarios to evaluate the performance of the methods in the simulation study.
We conclude that, when there is an interaction effect between a dummy and a continuous predictor, substantial gains are possible by using recursive partitioning for imputation rather than parametric methods, and that the MICE-Interaction method is always more efficient and convenient for preserving interaction effects than the other methods.
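The design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It enumerates the 40 scenarios, simulates a binary outcome with a dummy-by-continuous interaction, deletes outcome values under MCAR or MAR, and computes raw bias using its usual definition (mean estimate minus true value). The coefficient values, the logistic form of the MAR mechanism, and the names `simulate_data`, `make_missing`, and `raw_bias` are all assumptions made for illustration; the abstract does not specify them.

```python
import itertools
import math
import random

# Factors named in the abstract; their product gives the 40 scenarios.
MISSING_RATES = [0.10, 0.20, 0.30, 0.40, 0.50]
MECHANISMS = ["MCAR", "MAR"]
METHODS = ["MICE-Interaction", "MICE-CART", "MICE-RF", "MICE-Stratified"]
scenarios = list(itertools.product(MISSING_RATES, MECHANISMS, METHODS))


def simulate_data(n, rng, beta=(-0.5, 0.8, 0.5, 1.0)):
    """Draw (dummy, continuous, binary outcome) rows from a logistic model.

    The model includes a dummy-by-continuous interaction term.
    The coefficients are illustrative assumptions, not the paper's values.
    """
    b0, b_d, b_x, b_dx = beta
    rows = []
    for _ in range(n):
        d = rng.randint(0, 1)        # dummy predictor
        x = rng.gauss(0.0, 1.0)      # continuous predictor
        logit = b0 + b_d * d + b_x * x + b_dx * d * x
        p = 1.0 / (1.0 + math.exp(-logit))
        y = 1 if rng.random() < p else 0
        rows.append((d, x, y))
    return rows


def make_missing(rows, rate, mechanism, rng):
    """Return rows with roughly `rate` of the outcomes replaced by None."""
    if mechanism == "MCAR":
        # Every row has the same probability of a missing outcome.
        probs = [rate] * len(rows)
    elif mechanism == "MAR":
        # Assumed mechanism: missingness depends only on the observed x,
        # rescaled so that the average probability is about `rate`.
        weights = [1.0 / (1.0 + math.exp(-x)) for _, x, _ in rows]
        mean_w = sum(weights) / len(weights)
        probs = [min(1.0, w * rate / mean_w) for w in weights]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [
        (d, x, None if rng.random() < p else y)
        for (d, x, y), p in zip(rows, probs)
    ]


def raw_bias(estimates, true_value):
    """Raw bias: mean of the estimates over repetitions minus the true value."""
    return sum(estimates) / len(estimates) - true_value


if __name__ == "__main__":
    rng = random.Random(2021)
    data = simulate_data(700, rng)
    incomplete = make_missing(data, 0.30, "MAR", rng)
    n_missing = sum(1 for _, _, y in incomplete if y is None)
    print(len(scenarios), "scenarios;", n_missing, "of 700 outcomes missing")
```

In a full study, each incomplete dataset would be passed to the imputation method for its scenario, the analysis model refitted, and `raw_bias` applied to the pooled interaction estimates across repetitions.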
Pages: 14
Related papers
31 in total
  • [21] Binary variable multiple-model multiple imputation to address missing data mechanism uncertainty: application to a smoking cessation trial
    Siddique, Juned
    Harel, Ofer
    Crespi, Catherine M.
    Hedeker, Donald
    STATISTICS IN MEDICINE, 2014, 33 (17) : 3013 - 3028
  • [22] Evaluation of Multiple Imputation Methods for Missing Diary Data for Statistical Analysis in Dry Eye Studies
    Slade, Lot
    Bateman, Kirk
    Usner, Dale W.
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2018, 59 (09)
  • [23] Sensitivity analysis for clinical trials with missing continuous outcome data using controlled multiple imputation: A practical guide
    Cro, Suzie
    Morris, Tim P.
    Kenward, Michael G.
    Carpenter, James R.
    STATISTICS IN MEDICINE, 2020, 39 (21) : 2815 - 2842
  • [24] A Comparison of Three Popular Methods for Handling Missing Data: Complete-Case Analysis, Inverse Probability Weighting, and Multiple Imputation
    Little, Roderick J.
    Carpenter, James R.
    Lee, Katherine J.
    SOCIOLOGICAL METHODS & RESEARCH, 2024, 53 (03) : 1105 - 1135
  • [25] A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study
    Anurika Priyanjali De Silva
    Margarita Moreno-Betancur
    Alysha Madhu De Livera
    Katherine Jane Lee
    Julie Anne Simpson
    BMC Medical Research Methodology, 17
  • [26] A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study
    De Silva, Anurika Priyanjali
    Moreno-Betancur, Margarita
    De Livera, Alysha Madhu
    Lee, Katherine Jane
    Simpson, Julie Anne
    BMC MEDICAL RESEARCH METHODOLOGY, 2017, 17
  • [27] Multiple Imputation Methods for Handling Missing Data in Cost-effectiveness Analyses That Use Data from Hierarchical Studies: An Application to Cluster Randomized Trials
    Gomes, Manuel
    Diaz-Ordaz, Karla
    Grieve, Richard
    Kenward, Michael G.
    MEDICAL DECISION MAKING, 2013, 33 (08) : 1051 - 1063
  • [28] A systematic survey of the methods literature on the reporting quality and optimal methods of handling participants with missing outcome data for continuous outcomes in randomized controlled trials
    Zhang, Yuqing
    Alyass, Akram
    Vanniyasingam, Thuva
    Sadeghirad, Behnam
    Florez, Ivan D.
    Pichika, Sathish Chandra
    Kennedy, Sean Alexander
    Abdulkarimova, Ulviya
    Zhang, Yuan
    Iljon, Tzvia
    Morgano, Gian Paolo
    Colunga Lozano, Luis E.
    Aloweni, Fazila Abu Bakar
    Lopes, Luciane C.
    Jose Yepes-Nunez, Juan
    Fei, Yutong
    Wang, Li
    Kahale, Lara A.
    Meyre, David
    Akl, Elie A.
    Thabane, Lehana
    Guyatt, Gordon H.
    JOURNAL OF CLINICAL EPIDEMIOLOGY, 2017, 88 : 67 - 80
  • [29] Evaluating the Performance of Multiple Imputation Methods for Handling Missing Values in Time Series Data: A Study Focused on East Africa, Soil-Carbonate-Stable Isotope Data
    Hassani, Hossein
    Kalantari, Mahdi
    Ghodsi, Zara
    STATS, 2019, 2 (04) : 457 - 467
  • [30] Methods to Analyze Treatment Effects in the Presence of Missing Data for a Continuous Heavy Drinking Outcome Measure When Participants Drop Out from Treatment in Alcohol Clinical Trials
    Witkiewitz, Katie
    Falk, Daniel E.
    Kranzler, Henry R.
    Litten, Raye Z.
    Hallgren, Kevin A.
    O'Malley, Stephanie S.
    Anton, Raymond F.
    ALCOHOLISM-CLINICAL AND EXPERIMENTAL RESEARCH, 2014, 38 (11) : 2826 - 2834