An Improved Mean Imputation Clustering Algorithm for Incomplete Data

Authors
Hong Shi
Pingxin Wang
Xin Yang
Hualong Yu
Affiliations
[1] Jiangsu University of Science and Technology,School of Computer Science
[2] Jiangsu University of Science and Technology,School of Science
[3] Hebei Normal University,College of Mathematics and Information Science
Source
Neural Processing Letters | 2022 / Vol. 54
Keywords
Incomplete data; Mean imputation; K-means; Validity index
DOI
Not available
Abstract
Incomplete data sets arise in many fields of scientific study because of random noise, data loss, limitations of data acquisition, data misunderstanding, and so on. Most clustering algorithms cannot be applied directly to incomplete data sets, because objects with missing values must first be preprocessed. This paper therefore presents an improved mean imputation clustering algorithm for incomplete data, based on a partition clustering algorithm. The proposed method divides the universe into two sets: the objects with no missing values and the objects with missing values. First, the objects with no missing values are clustered by a traditional clustering algorithm. Then, for each object with missing values, each missing attribute value is filled in turn with the corresponding attribute mean of each cluster obtained from the objects with no missing values. Perturbation analysis of the cluster centroids is applied to search for the optimal imputation. Clustering results on several UCI data sets are evaluated with several validity indexes, and the results demonstrate the effectiveness of the proposed algorithm.
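The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact perturbation criterion, so the sketch assumes that the "optimal" imputation is the one whose filled object shifts its cluster centroid the least. The function names (`kmeans`, `impute_by_cluster_means`), the NaN encoding of missing values, and the deterministic farthest-first initialization are all assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Plain Lloyd-style K-means with farthest-first initialization.

    The initialization is an assumption chosen for determinism; the
    abstract only says "traditional clustering algorithm".
    """
    centroids = [X[0]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        # Distance from every point to its nearest already-chosen centroid.
        d = np.linalg.norm(X[:, None, :] - chosen[None, :, :], axis=2).min(axis=1)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated

    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


def impute_by_cluster_means(X, k):
    """Fill NaN entries using cluster means learned from complete rows.

    Assumed criterion: among the k candidate fillings, keep the one that
    moves its cluster centroid the least when the filled row is added.
    """
    X = np.asarray(X, dtype=float)
    incomplete = np.isnan(X).any(axis=1)

    # Step 1: cluster only the objects with no missing values.
    labels, centroids = kmeans(X[~incomplete], k)
    sizes = np.bincount(labels, minlength=k)

    filled = X.copy()
    assigned = np.full(len(X), -1)
    assigned[~incomplete] = labels

    # Step 2: try each cluster's attribute means for every incomplete object.
    for i in np.flatnonzero(incomplete):
        missing = np.isnan(X[i])
        best_shift, best_j, best_row = np.inf, -1, None
        for j in range(k):
            if sizes[j] == 0:
                continue
            candidate = np.where(missing, centroids[j], X[i])
            new_centroid = (centroids[j] * sizes[j] + candidate) / (sizes[j] + 1)
            shift = np.linalg.norm(new_centroid - centroids[j])
            if shift < best_shift:
                best_shift, best_j, best_row = shift, j, candidate
        filled[i] = best_row
        assigned[i] = best_j

    return filled, assigned
```

Calling `impute_by_cluster_means(X, k)` on an array whose missing entries are `np.nan` returns the filled array together with a cluster index for every row. Because each candidate copies the cluster means into the missing attributes, the centroid shift in this sketch depends only on the observed attributes and the cluster size; the paper's actual perturbation analysis may differ.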
Pages: 3537 - 3550
Page count: 13