A Comparative Study of Imputation Methods for Multivariate Ordinal Data

Cited by: 6
Authors
Wongkamthong, Chayut [1]
Akande, Olanrewaju [2, 3]
Affiliations
[1] Duke Univ, Social Sci Res Inst, Gross Hall Bldg,2nd Floor,140 Sci Dr, Durham, NC 27708 USA
[2] Duke Univ, Social Sci Res Inst, Practice, 415 Chapel Dr, Durham, NC 27705 USA
[3] Duke Univ, Dept Stat Sci, 415 Chapel Dr, Durham, NC 27705 USA
Keywords
Missing data; Mixtures; Multiple imputation; Nonresponse; Tree methods; Bayesian multiple imputation; Models; MICE
DOI
10.1093/jssam/smab028
Chinese Library Classification
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification codes
03; 0303; 0701; 070101
Abstract
Missing data remains a very common problem in large datasets, including survey and census data containing many ordinal responses, such as political polls and opinion surveys. Multiple imputation (MI) is usually the go-to approach for analyzing such incomplete datasets, and there are indeed several implementations of MI, including methods using generalized linear models, tree-based models, and Bayesian non-parametric models. However, there is limited research on the statistical performance of these methods for multivariate ordinal data. In this article, we perform an empirical evaluation of several MI methods, including MI by chained equations (MICE) using multinomial logistic regression models, MICE using proportional odds logistic regression models, MICE using classification and regression trees, MICE using random forest, MI using Dirichlet process (DP) mixtures of products of multinomial distributions, and MI using DP mixtures of multivariate normal distributions. We evaluate the methods using simulation studies based on ordinal variables selected from the 2018 American Community Survey. Under our simulation settings, the results suggest that MI using proportional odds logistic regression models, classification and regression trees, and DP mixtures of multinomial distributions generally outperform the other methods. In certain settings, MI using multinomial logistic regression models is able to achieve comparable performance, depending on the missing data mechanism and amount of missing data.
Pages: 189-212
Page count: 24
Related papers
50 in total
  • [41] A Comparative Study of Imputation Methods to Predict Missing Attribute Values in Coronary Heart Disease Data Set
    Setiawan, N. A.
    Venkatachalam, P. A.
    Hani, A. F. M.
    4TH KUALA LUMPUR INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING 2008, VOLS 1 AND 2, 2008, 21 (1-2): 266-269
  • [42] Missing Data and Imputation Methods
    Schober, Patrick
    Vetter, Thomas R.
    ANESTHESIA AND ANALGESIA, 2020, 131 (5): 1419-1420
  • [43] A Benchmark for Data Imputation Methods
    Jaeger, Sebastian
    Allhorn, Arndt
    Biessmann, Felix
    FRONTIERS IN BIG DATA, 2021, 4
  • [44] Imputation Methods for Incomplete Data
    Umathe, Vaishali H.
    Chaudhary, Gauri
    2015 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION, EMBEDDED AND COMMUNICATION SYSTEMS (ICIIECS), 2015
  • [45] Comparison of five iterative imputation methods for multivariate classification
    Liu, Yushan
    Brown, Steven D.
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2013, 120: 106-115
  • [46] Data Imputation for Symbolic Regression with Missing Values: A Comparative Study
    Al-Helali, Baligh
    Chen, Qi
    Xue, Bing
    Zhang, Mengjie
    2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2020: 2093-2100
  • [47] Review and evaluation of imputation methods for multivariate longitudinal data with mixed-type incomplete variables
    Cao, Yi
    Allore, Heather
    Vander Wyk, Brent
    Gutman, Roee
    STATISTICS IN MEDICINE, 2022, 41 (30): 5844-5876
  • [48] Comparative analysis of traditional methods and a deep learning approach for multivariate imputation of missing values in the meteorological field
    Arias-Munoz, Ana Cristina
    Cob-Garcia, Susana
    Calvo-Valverde, Luis-Alexander
    TECNOLOGIA EN MARCHA, 2024, 37 (3): 33-47
  • [49] DATA DIMENSIONALITY REDUCTION METHODS FOR ORDINAL DATA
    Prokop, Martin
    Rezankova, Hana
    INTERNATIONAL DAYS OF STATISTICS AND ECONOMICS, 2011: 523-533
  • [50] A Comparison of Imputation Strategies for Ordinal Missing Data on Likert Scale Variables
    Wu, Wei
    Jia, Fan
    Enders, Craig
    MULTIVARIATE BEHAVIORAL RESEARCH, 2015, 50 (5): 484-503