Incomplete clustering analysis via multiple imputation

Cited by: 2
Authors
Lee, Jung Wun [1]
Harel, Ofer [1]
Affiliations
[1] Univ Connecticut, Dept Stat, 215 Glenbrook Rd Unit 4120, Storrs, CT 06269 USA
Funding
National Science Foundation (USA);
Keywords
Incomplete data; model-based clustering; cluster analysis; multiple imputation; missing data; NUMBER;
DOI
10.1080/02664763.2022.2060952
Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208; 070103; 0714;
Abstract
Clustering analysis is a prevalent statistical method that divides populations into several subgroups of similar units. However, most existing clustering methods require complete data. One general method that addresses incomplete data is multiple imputation (MI), which avoids many limitations found in single imputation-based methods and complete case analyses. Nevertheless, applying the MI framework to clustering analysis can be challenging, since each imputed data set might yield a different number of clusters and there is no unique parameter for clustering analysis. In response to this problem, we have developed MICA: Multiply Imputed Cluster Analysis. MICA is a framework for clustering incomplete data that consists of two clustering stages. We assess the properties of MICA and its superiority over other existing incomplete clustering strategies based on a simulation study under various data structures. In addition, we demonstrate the usage of MICA by applying it to the Youth Risk Behavior Surveillance System (YRBSS) 2019 data.
Pages: 1962 - 1979
Number of pages: 18
Related papers
50 in total
  • [41] Analysis of an incomplete longitudinal composite variable using a marginalized random effects model and multiple imputation
    Gosho, Masahiko
    Maruo, Kazushi
    Ishii, Ryota
    Hirakawa, Akihiro
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2018, 27 (07) : 2200 - 2215
  • [42] Practical considerations for sensitivity analysis after multiple imputation applied to epidemiological studies with incomplete data
    Heraud-Bousquet, Vanina
    Larsen, Christine
    Carpenter, James
    Desenclos, Jean-Claude
    Le Strat, Yann
    BMC MEDICAL RESEARCH METHODOLOGY, 2012, 12
  • [43] Analysis of incomplete quality of life data in advanced stage cancer: A practical application of multiple imputation
    Satoshi Morita
    Kunihiko Kobayashi
    Kenji Eguchi
    Taketoshi Matsumoto
    Masahiko Shibuya
    Yasufumi Yamaji
    Yasuo Ohashi
    Quality of Life Research, 2005, 14 : 1533 - 1544
  • [44] Cox regression analysis with missing covariates via nonparametric multiple imputation
    Hsu, Chiu-Hsieh
    Yu, Mandi
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2019, 28 (06) : 1676 - 1688
  • [45] Multiple- vs Non- or Single-Imputation based Fuzzy Clustering for Incomplete Longitudinal Behavioral Intervention Data
    Zhang, Zhaoyang
    Fang, Hua
    2016 IEEE FIRST INTERNATIONAL CONFERENCE ON CONNECTED HEALTH: APPLICATIONS, SYSTEMS AND ENGINEERING TECHNOLOGIES (CHASE), 2016, : 219 - 228
  • [46] JointAI: Joint Analysis and Imputation of Incomplete Data in R
    Erler, Nicole S.
    Rizopoulos, Dimitris
    Lesaffre, Emmanuel M. E. H.
    JOURNAL OF STATISTICAL SOFTWARE, 2021, 100 (20):
  • [47] Imputation techniques for incomplete data in quadratic discriminant analysis
    Ounpraseuth, Songthip
    Moore, Page C.
    Young, Dean M.
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2012, 82 (06) : 863 - 877
  • [48] Multivariable data imputation for the analysis of incomplete credit data
    Lan, Qiujun
    Xu, Xuqing
    Ma, Haojie
    Li, Gang
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 141 (141)
  • [49] Incomplete Multiple Kernel Alignment Maximization for Clustering
    Liu, Xinwang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (03) : 1412 - 1424
  • [50] Incomplete Multiview Clustering via Late Fusion
    Ye, Yongkai
    Liu, Xinwang
    Liu, Qiang
    Guo, Xifeng
    Yin, Jianping
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2018, 2018