Incomplete clustering analysis via multiple imputation

Cited by: 2
Authors
Lee, Jung Wun [1]
Harel, Ofer [1]
Affiliation
[1] Univ Connecticut, Dept Stat, 215 Glenbrook Rd Unit 4120, Storrs, CT 06269 USA
Funding
U.S. National Science Foundation
Keywords
Incomplete data; model-based clustering; cluster analysis; multiple imputation; missing data; number
DOI
10.1080/02664763.2022.2060952
Chinese Library Classification (CLC) numbers
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Clustering analysis is a prevalent statistical method that divides a population into several subgroups of similar units. However, most existing clustering methods require complete data. One general approach to incomplete data is multiple imputation (MI), which avoids many limitations of single imputation-based methods and complete case analysis. Nevertheless, adopting the MI framework for clustering analysis can be challenging, since each imputed dataset may yield a different number of clusters and there is no unique parameter for clustering analysis that could be combined across imputations. In response to this problem, we have developed MICA: Multiply Imputed Cluster Analysis. MICA is a framework for clustering incomplete data that consists of two clustering stages. We assess the properties of MICA and its superiority over other existing incomplete clustering strategies in a simulation study under various data structures. In addition, we demonstrate the usage of MICA by applying it to the Youth Risk Behavior Surveillance System (YRBSS) 2019 data.
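The abstract does not say what MICA's two clustering stages are. As a hedged illustration of why pooling clusterings across imputations is awkward, and of one generic way around it, here is a minimal Python sketch under our own assumptions; it is not the authors' MICA procedure, and every function name in it (`knn_hot_deck`, `kmeans`, `co_association`, `consensus_labels`, `two_stage_cluster`) is hypothetical. A nearest-neighbour hot deck stands in for a proper imputation model, plain k-means is the first-stage clusterer applied to each completed dataset, and the second stage builds a co-association matrix (the share of imputations in which two units fall in the same cluster) and takes connected components above a threshold as the pooled clusters. Because this matrix compares units rather than cluster labels, it tolerates both label switching and a different number of clusters in each imputation.

```python
import random
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing value


def _partial_sq_dist(a: Row, b: Row) -> float:
    """Mean squared difference over coordinates observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum((x - y) ** 2 for x, y in shared) / len(shared)


def knn_hot_deck(data: List[Row], rng: random.Random, n_donors: int = 2) -> List[List[float]]:
    """One stochastic completion (stand-in imputation model, an assumption):
    each missing cell is copied from a donor drawn at random among the
    n_donors nearest rows that observe that column."""
    completed = []
    for i, row in enumerate(data):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is None:
                donors = [d for t, d in enumerate(data) if t != i and d[j] is not None]
                donors.sort(key=lambda d: _partial_sq_dist(row, d))
                new_row[j] = rng.choice(donors[:n_donors])[j]
        completed.append(new_row)
    return completed


def kmeans(points: List[List[float]], k: int, iters: int = 25) -> List[int]:
    """Stage 1 (per imputation): Lloyd's k-means with a deterministic start."""
    k = min(k, len(points))
    step = max(1, len(points) // k)
    centers = [list(p) for p in sorted(points)[::step][:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: _partial_sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(label_sets: Sequence[List[int]]) -> List[List[float]]:
    """Share of imputations in which units i and j share a cluster."""
    n, m = len(label_sets[0]), len(label_sets)
    return [[sum(ls[i] == ls[j] for ls in label_sets) / m for j in range(n)] for i in range(n)]


def consensus_labels(co: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Stage 2: connected components of the graph linking pairs with co > threshold."""
    n = len(co)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def two_stage_cluster(data: List[Row], ks: Sequence[int], seed: int = 0,
                      threshold: float = 0.5) -> List[int]:
    """Impute len(ks) times, cluster each completion with its own k, then pool."""
    rng = random.Random(seed)
    label_sets = [kmeans(knn_hot_deck(data, rng), k) for k in ks]
    return consensus_labels(co_association(label_sets), threshold)


if __name__ == "__main__":
    data = [[0.0, 0.1], [0.3, 0.2], [0.1, 0.4], [0.2, None], [None, 0.3],
            [10.0, 10.2], [10.3, 9.8], [9.9, 10.1], [10.1, None], [None, 9.9]]
    # The third imputation is deliberately clustered with k = 3.
    print(two_stage_cluster(data, ks=[2, 2, 3, 2, 2], seed=1))
    # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this toy example the third imputation splits one group in two, yet the pooled labels still recover the two groups, because those units remain co-clustered in four of five imputations. The 0.5 threshold and the hot-deck imputation are arbitrary illustrative choices; a real analysis would use a principled imputation model and a principled rule for the second stage.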
Pages: 1962-1979
Number of pages: 18