Accounting for missing data in statistical analyses: multiple imputation is not always the answer

Cited by: 416

Authors:

Hughes, Rachael A. [1,2]

Heron, Jon [1,2,3]

Sterne, Jonathan A. C. [1,3]

Tilling, Kate [1,2,3]

Affiliations:

[1] Univ Bristol, Populat Hlth Sci, Bristol Med Sch, Bristol, Avon, England

[2] Univ Bristol, MRC Integrat Epidemiol Unit, Bristol, Avon, England

[3] Univ Bristol, NIHR Bristol Biomed Res Ctr, Bristol, Avon, England

Source:

INTERNATIONAL JOURNAL OF EPIDEMIOLOGY | 2019 / Volume 48 / Issue 04

Funding:

UK Medical Research Council;

Keywords:

Complete case analysis; inverse probability weighting; missing data; missing data mechanisms; missing data patterns; multiple imputation; sensitivity analysis; chained equations; causal diagrams; incomplete data; hot deck; bias; diagnostics; efficiency; survival; growth

DOI:

10.1093/ije/dyz032

Chinese Library Classification number:

R1 [Preventive Medicine, Hygiene];

Discipline classification codes:

1004 ; 120402 ;

Abstract:

Background: Missing data are unavoidable in epidemiological research, potentially leading to bias and loss of precision. Multiple imputation (MI) is widely advocated as an improvement over complete case analysis (CCA). However, contrary to widespread belief, CCA is preferable to MI in some situations.

Methods: We provide guidance on choice of analysis when data are incomplete. Using causal diagrams to depict missingness mechanisms, we describe when CCA will not be biased by missing data and compare MI and CCA, with respect to bias and efficiency, in a range of missing data situations. We illustrate selection of an appropriate method in practice.

Results: For most regression models, CCA gives unbiased results when the chance of being a complete case does not depend on the outcome after taking the covariates into consideration, which includes situations where data are missing not at random. Consequently, there are situations in which CCA analyses are unbiased while MI analyses, assuming missing at random (MAR), are biased. By contrast MI, unlike CCA, is valid for all MAR situations and has the potential to use information contained in the incomplete cases and auxiliary variables to reduce bias and/or improve precision. For this reason, MI was preferred over CCA in our real data example.

Conclusions: Choice of method for dealing with missing data is crucial for validity of conclusions, and should be based on careful consideration of the reasons for the missing data, missing data patterns and the availability of auxiliary information.
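The central claim in the Results, that CCA stays unbiased when the chance of being a complete case does not depend on the outcome given the covariates, can be checked with a small simulation. The sketch below is illustrative only: the linear outcome model, the true slope of 2.0, the logistic missingness models and the function name `simulate_cca_slopes` are all assumptions made for this example and are not taken from the paper. It covers CCA only and does not run MI.

```python
import numpy as np


def simulate_cca_slopes(n=200_000, seed=0):
    """Compare complete-case slope estimates under two missingness mechanisms.

    All settings (true slope 2.0, logistic missingness models) are
    illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    # Assumed outcome model: y = 1 + 2*x + noise, so the true slope is 2.0.
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)

    def expit(z):
        return 1.0 / (1.0 + np.exp(-z))

    def ols_slope(mask):
        # Least-squares slope of y on x, using only the rows selected by mask.
        return np.polyfit(x[mask], y[mask], 1)[0]

    # Scenario A: the chance of being a complete case depends on the
    # covariate x only. If x is the variable with missing values, this is
    # missing not at random, yet it does not depend on y given x.
    complete_a = rng.random(n) < expit(2.0 * x)

    # Scenario B: the chance of being a complete case depends on the outcome y.
    complete_b = rng.random(n) < expit(2.0 * (y - 1.0))

    return {
        "full_data": ols_slope(np.ones(n, dtype=bool)),
        "cca_depends_on_x": ols_slope(complete_a),
        "cca_depends_on_y": ols_slope(complete_b),
    }


if __name__ == "__main__":
    for name, slope in simulate_cca_slopes().items():
        print(f"{name}: slope = {slope:.3f}")
```

Under these assumptions the complete-case slope in scenario A should sit close to the true value of 2.0 even though roughly half the records are dropped, while in scenario B it should be visibly attenuated, because selecting on the outcome distorts the conditional distribution of y given x.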


Pages: 1294 - 1304

Page count: 11

