An Improved Mean Imputation Clustering Algorithm for Incomplete Data

Authors
Hong Shi
Pingxin Wang
Xin Yang
Hualong Yu
Affiliations
[1] Jiangsu University of Science and Technology,School of Computer Science
[2] Jiangsu University of Science and Technology,School of Science
[3] Hebei Normal University,College of Mathematics and Information Science
Source
Neural Processing Letters | 2022 / Vol. 54
Keywords
Incomplete data; Mean imputation; K-means; Validity index
Abstract
Incomplete data sets arise in every field of scientific study because of random noise, data loss, limitations of data acquisition, misinterpretation of data, and similar causes. Most clustering algorithms cannot be applied directly to incomplete data sets, because objects with missing values must be preprocessed first. This paper therefore presents an improved mean imputation clustering algorithm for incomplete data, built on a partition clustering algorithm. The proposed method divides the universe into two sets: objects without missing values and objects with missing values. First, the objects without missing values are clustered by a traditional clustering algorithm. Then, for each object with missing values, each missing attribute value is filled with the mean value of that attribute in each cluster in turn, based on the clustering result for the objects without missing values. Perturbation analysis of the cluster centroids is applied to search for the optimal imputation. Clustering results on several UCI data sets are evaluated with several validity indexes and demonstrate the effectiveness of the proposed algorithm.
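The abstract gives only an outline of the procedure, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It clusters the complete objects with a plain k-means, builds one candidate imputation per cluster from that cluster's attribute means, and keeps the candidate that moves its cluster centroid the least. That selection rule is an assumed stand-in for the paper's perturbation analysis, whose exact form is not stated here. The function names (`farthest_point_init`, `kmeans`, `impute_and_assign`) and the deterministic farthest-point initialization are hypothetical choices made for this example.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic seeding (assumption): start from the first object,
    then repeatedly add the object farthest from the chosen seeds."""
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)


def assign(X, centroids):
    """Label each object with the index of its nearest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means on complete data; returns (centroids, labels)."""
    centroids = farthest_point_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(X, centroids)


def impute_and_assign(X, k):
    """Fill NaN entries with cluster means and label every object.

    Assumed selection rule: among the k candidate imputations, keep the one
    whose insertion shifts its cluster centroid the least.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete = ~missing.any(axis=1)

    # Step 1: cluster only the objects without missing values.
    centroids, labels_complete = kmeans(X[complete], k)
    sizes = np.bincount(labels_complete, minlength=k)

    X_filled = X.copy()
    labels = np.full(len(X), -1)
    labels[complete] = labels_complete

    # Step 2: for each incomplete object, try the mean of every cluster.
    for i in np.flatnonzero(~complete):
        best_shift = np.inf
        for j in range(k):
            candidate = np.where(missing[i], centroids[j], X[i])
            # Adding one object to a cluster of size n moves its centroid
            # by ||candidate - centroid|| / (n + 1).
            shift = np.linalg.norm(candidate - centroids[j]) / (sizes[j] + 1)
            if shift < best_shift:
                best_shift = shift
                labels[i] = j
                X_filled[i] = candidate
    return X_filled, labels
```

Missing values are expected to be encoded as `np.nan`. Calling `impute_and_assign(X, k)` returns the filled data matrix together with one cluster label per object. Because the imputed attributes coincide with the chosen centroid, the shift depends only on the observed attributes and the cluster size; a different perturbation measure could be substituted inside the inner loop.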
Pages: 3537 - 3550
Page count: 13