A new linear approximate clustering algorithm based upon sampling with probability distributing

Times cited: 0
Authors
Yuan, CA [1]
Tang, CJ [1]
Li, C [1]
Hu, JJ [1]
Peng, J [1]
Affiliation
[1] Sichuan Univ, Coll Comp, Chengdu 610064, Sichuan, Peoples R China
Keywords
k-median algorithm; clustering; probability distributing; hash function; sampling
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is an important research direction in knowledge discovery. The k-median algorithm, a classical clustering method, suffers from serious deficiencies such as low efficiency and poor adaptability to large data sets. To solve this problem, this paper proposes a new method named LCPD (Linear Clustering Based on Probability Distributing). The main contributions are: (1) the buckets are partitioned using equal-probability regions of the m-dimensional hypercube, so that every layer (i.e., every hash bucket) contains approximately the same number of data items, which yields stratified sampling at low cost; (2) the samples drawn by the new algorithm are sufficiently representative of the whole data set; (3) the complexity of the new algorithm is proved to be O(n); (4) comparative experiments show that the performance of LCPD is two orders of magnitude higher than that of the traditional algorithm when the data set contains about 10,000 items, and that clustering quality improves by 55% when the data set contains about 8,000 items.
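The abstract does not specify LCPD's hash function or the clustering step run on the samples, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that cutting each axis at empirical quantiles estimated from a small pilot sample gives cells of roughly equal probability mass (true only when the dimensions are close to independent), uses the tuple of per-axis interval indices as the bucket hash, draws a fixed fraction from every non-empty bucket, and then runs a simple Lloyd-style k-median heuristic (L1 assignment, coordinate-wise median update) on the sample. All function names (`axis_cuts`, `bucket_key`, `stratified_sample`, `k_median`) and parameters (`bins`, `rate`, `pilot_size`) are hypothetical.

```python
import bisect
import math
import random
from collections import defaultdict


def axis_cuts(points, bins, pilot_size, rng):
    """Estimate bins-1 equal-probability cut points per axis from a pilot sample.

    Assumption (not from the paper): cutting every axis at its empirical
    quantiles gives cells of roughly equal mass if the axes are near independent.
    """
    pilot = rng.sample(points, min(pilot_size, len(points)))
    cuts = []
    for d in range(len(points[0])):
        col = sorted(p[d] for p in pilot)
        cuts.append([col[i * len(col) // bins] for i in range(1, bins)])
    return cuts


def bucket_key(p, cuts):
    """Hypothetical hash: the tuple of interval indices of p's cell on each axis."""
    return tuple(bisect.bisect_right(c, x) for c, x in zip(cuts, p))


def stratified_sample(points, bins=4, rate=0.1, pilot_size=200, seed=0):
    """Draw ceil(rate * |bucket|) items (at least one) from every non-empty bucket."""
    rng = random.Random(seed)
    cuts = axis_cuts(points, bins, pilot_size, rng)
    buckets = defaultdict(list)
    for p in points:  # single pass over the full data set
        buckets[bucket_key(p, cuts)].append(p)
    sample = []
    for items in buckets.values():
        take = max(1, math.ceil(rate * len(items)))
        sample.extend(rng.sample(items, take))
    return sample


def l1(a, b):
    """Manhattan distance, the usual k-median dissimilarity."""
    return sum(abs(x - y) for x, y in zip(a, b))


def median(values):
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


def k_median(sample, k, iters=20, seed=0):
    """Lloyd-style k-median heuristic: L1 assignment, coordinate-wise median update."""
    rng = random.Random(seed)
    centers = [tuple(c) for c in rng.sample(sample, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in sample:
            nearest = min(range(k), key=lambda j: l1(p, centers[j]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        new = [tuple(median(col) for col in zip(*g)) if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:  # converged
            break
        centers = new
    return centers
```

With `m`, `bins`, `pilot_size`, `k` and the iteration count held fixed, bucketing is one pass over the data, the k-median step runs on a sample whose size is a fixed fraction of n, and labelling all points is another pass, e.g. `labels = [min(range(k), key=lambda j: l1(p, centers[j])) for p in data]`. Apart from the sorting inside `median` (a linear-time selection routine would remove it), this is in the spirit of the paper's O(n) claim, but the sketch does not establish that bound.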
Pages: 1518-1523
Number of pages: 6