Accounting for missing data in statistical analyses: multiple imputation is not always the answer

Cited by: 416
Authors
Hughes, Rachael A. [1, 2]
Heron, Jon [1, 2, 3]
Sterne, Jonathan A. C. [1, 3]
Tilling, Kate [1, 2, 3]
Affiliations
[1] Univ Bristol, Populat Hlth Sci, Bristol Med Sch, Bristol, Avon, England
[2] Univ Bristol, MRC Integrat Epidemiol Unit, Bristol, Avon, England
[3] Univ Bristol, NIHR Bristol Biomed Res Ctr, Bristol, Avon, England
Funding
UK Medical Research Council
Keywords
complete case analysis; inverse probability weighting; missing data; missing data mechanisms; missing data patterns; multiple imputation; sensitivity analysis; chained equations; causal diagrams; incomplete data; hot deck; bias; diagnostics; efficiency; survival; growth
DOI
10.1093/ije/dyz032
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Subject classification code
1004; 120402
Abstract
Background: Missing data are unavoidable in epidemiological research, potentially leading to bias and loss of precision. Multiple imputation (MI) is widely advocated as an improvement over complete case analysis (CCA). However, contrary to widespread belief, CCA is preferable to MI in some situations.
Methods: We provide guidance on choice of analysis when data are incomplete. Using causal diagrams to depict missingness mechanisms, we describe when CCA will not be biased by missing data and compare MI and CCA, with respect to bias and efficiency, in a range of missing data situations. We illustrate selection of an appropriate method in practice.
Results: For most regression models, CCA gives unbiased results when the chance of being a complete case does not depend on the outcome after taking the covariates into consideration, which includes situations where data are missing not at random. Consequently, there are situations in which CCA analyses are unbiased while MI analyses, assuming missing at random (MAR), are biased. By contrast MI, unlike CCA, is valid for all MAR situations and has the potential to use information contained in the incomplete cases and auxiliary variables to reduce bias and/or improve precision. For this reason, MI was preferred over CCA in our real data example.
Conclusions: Choice of method for dealing with missing data is crucial for validity of conclusions, and should be based on careful consideration of the reasons for the missing data, missing data patterns and the availability of auxiliary information.
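The central claim in the Results (CCA is unbiased when the chance of being a complete case does not depend on the outcome given the covariates) can be illustrated with a small simulation. The sketch below is not the authors' analysis: the data-generating model, the logistic missingness probabilities and all function names (`ols_slope`, `simulate`) are assumptions chosen for illustration, and only the Python standard library is used. It fits a simple linear regression of Y on X to the complete cases under two hypothetical mechanisms: one where completeness depends only on X (so X would be missing not at random if X is the incomplete variable), and one where completeness depends on the outcome Y.

```python
import math
import random


def ols_slope(xs, ys):
    """Least-squares slope from a simple linear regression of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=100_000, true_slope=2.0, seed=0):
    """Compare complete case estimates under two assumed missingness mechanisms."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0.0, 1.0) for x in xs]

    # Mechanism A (assumed): completeness depends on the covariate X only.
    # Given X, it is independent of the outcome Y.
    keep_a = [rng.random() < 1.0 / (1.0 + math.exp(2.0 * x)) for x in xs]

    # Mechanism B (assumed): completeness depends on the outcome Y.
    keep_b = [rng.random() < 1.0 / (1.0 + math.exp(y)) for y in ys]

    def complete_case_slope(keep):
        # Complete case analysis: drop every record flagged as incomplete.
        cx = [x for x, k in zip(xs, keep) if k]
        cy = [y for y, k in zip(ys, keep) if k]
        return ols_slope(cx, cy)

    return {
        "full": ols_slope(xs, ys),
        "cca_depends_on_x": complete_case_slope(keep_a),
        "cca_depends_on_y": complete_case_slope(keep_b),
    }


if __name__ == "__main__":
    for name, slope in simulate().items():
        print(f"{name}: {slope:.3f}")
```

Under these assumptions, the complete case slope for mechanism A should stay close to the true value of 2.0, while the slope for mechanism B should be noticeably attenuated, because selecting on the outcome distorts the distribution of the regression errors among the complete cases.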
Pages: 1294-1304
Number of pages: 11
Related papers
50 in total
  • [1] Missing Data and Multiple Imputation
    Cummings, Peter
    JAMA PEDIATRICS, 2013, 167 (07) : 656 - 661
  • [2] Multiple imputation for missing data
    Patrician, PA
    RESEARCH IN NURSING & HEALTH, 2002, 25 (01) : 76 - 84
  • [3] Multiple imputation of missing data
    Lydersen, Stian
    TIDSSKRIFT FOR DEN NORSKE LAEGEFORENING, 2022, 142 (02) : 151 - 151
  • [4] A comparison of multiple imputation and doubly robust estimation for analyses with missing data
    Carpenter, James R.
    Kenward, Michael G.
    Vansteelandt, Stijn
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 2006, 169 : 571 - 584
  • [5] Analyses using multiple imputation need to consider missing data in auxiliary variables
    Madley-Dowd, Paul
    Curnow, Elinor
    Hughes, Rachael A.
    Cornish, Rosie P.
    Tilling, Kate
    Heron, Jon
    AMERICAN JOURNAL OF EPIDEMIOLOGY, 2025,
  • [6] Multiple Imputation For Missing Ordinal Data
    Chen, Ling
    Toma-Drane, Mariana
    Valois, Robert F.
    Drane, J. Wanzer
    JOURNAL OF MODERN APPLIED STATISTICAL METHODS, 2005, 4 (01) : 288 - 299
  • [7] MULTIPLE IMPUTATION AS A MISSING DATA MACHINE
    BRAND, J
    VANBUUREN, S
    VANMULLIGEN, EM
    TIMMERS, T
    GELSEMA, E
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 1994, : 303 - 306
  • [8] Multiple imputation with missing data indicators
    Beesley, Lauren J.
    Bondarenko, Irina
    Elliot, Michael R.
    Kurian, Allison W.
    Katz, Steven J.
    Taylor, Jeremy M. G.
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2021, 30 (12) : 2685 - 2700
  • [9] Multiple imputation: dealing with missing data
    de Goeij, Moniek C. M.
    van Diepen, Merel
    Jager, Kitty J.
    Tripepi, Giovanni
    Zoccali, Carmine
    Dekker, Friedo W.
    NEPHROLOGY DIALYSIS TRANSPLANTATION, 2013, 28 (10) : 2415 - 2420
  • [10] Multiple imputation for nonignorable missing data
    Im, Jongho
    Kim, Soeun
    JOURNAL OF THE KOREAN STATISTICAL SOCIETY, 2017, 46 (04) : 583 - 592