An Improved Affinity Propagation Clustering Algorithm for Large-scale Data Sets

Authors
Liu, Xiaonan [1]
Yin, Meijuan [1]
Luo, Junyong [1]
Chen, Wuping [2]
Affiliations
[1] State Key Lab Math Engn & Adv Comp, Zhengzhou, Peoples R China
[2] Sci & Technol Informat Assurance Lab, Beijing, Peoples R China
Keywords
Data clustering; affinity propagation; hierarchical; selection; clustering center
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Affinity propagation (AP) clustering does not require the number of clusters to be set in advance and performs well in efficiency and accuracy, but it is not suitable for large-scale data. To achieve both low time complexity and good accuracy on large-scale data, an improved AP clustering algorithm named hierarchical affinity propagation (HAP) is proposed, which applies the AP algorithm several times to data at different levels. The data set to be clustered is first divided into several subsets, each small enough to be clustered efficiently by AP. AP is then run on each subset to select that subset's local cluster centers. Next, AP is applied again to all the local cluster centers to select well-suited global exemplars for the whole data set. Finally, every data point is assigned to a cluster according to its similarity to the global exemplars. Experimental results on real and simulated data sets show that, compared with the traditional AP and adaptive AP algorithms, HAP greatly reduces clustering time while achieving relatively better clustering results.
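The four steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the abstract does not say how the data set is partitioned, which similarity measure is used, or how AP is parameterized. The sketch assumes a random partition into fixed-size subsets, negative squared Euclidean distance as the similarity, the median similarity as the AP preference, and a fixed damping factor and iteration count. The names `affinity_propagation`, `hap`, and `subset_size` are hypothetical.

```python
import random
from statistics import median


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def affinity_propagation(points, damping=0.7, iters=200):
    """Plain AP; returns the indices of the exemplars chosen among `points`.

    Assumptions: similarity = negative squared distance, preference = median
    similarity, fixed number of iterations (no convergence check).
    """
    n = len(points)
    if n <= 1:
        return list(range(n))
    S = [[-sq_dist(p, q) for q in points] for p in points]
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = pref
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            first = vals[best]
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    # Fall back to the single strongest candidate if none passes the test.
    return exemplars or [max(range(n), key=lambda k: R[k][k] + A[k][k])]


def hap(points, subset_size=40, seed=0):
    """Hierarchical AP sketch: returns (global_exemplars, labels)."""
    # Step 1: divide the data into subsets (random partition is an assumption).
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    # Step 2: run AP on each subset and collect its local cluster centers.
    local = []
    for start in range(0, len(order), subset_size):
        subset = [points[i] for i in order[start:start + subset_size]]
        local += [subset[k] for k in affinity_propagation(subset)]
    # Step 3: run AP again on the local centers to get global exemplars.
    exemplars = [local[k] for k in affinity_propagation(local)]
    # Step 4: assign every point to its most similar (nearest) global exemplar.
    labels = [
        min(range(len(exemplars)), key=lambda j: sq_dist(p, exemplars[j]))
        for p in points
    ]
    return exemplars, labels
```

Calling `hap(points)` on a list of coordinate tuples returns the global exemplars and, for each point, the index of the exemplar it is assigned to. Because each AP run only sees one subset (or the much smaller set of local centers), the quadratic cost of AP applies to `subset_size` rather than to the full data set, which is the source of the time savings the abstract reports.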
Pages: 894-899
Page count: 6
    BIOINFORMATION, 2011, 7 (05) : 251 - 255