A new linear approximate clustering algorithm based upon sampling with probability distributing

Authors
Yuan, CA [1]
Tang, CJ [1]
Li, C [1]
Hu, JJ [1]
Peng, J [1]
Affiliation
[1] Sichuan Univ, Coll Comp, Chengdu 610064, Sichuan, Peoples R China
Keywords
k-median algorithm; clustering; probability distributing; hash function; sampling
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is an important research direction in knowledge discovery. The k-median algorithm, a classical clustering method, has serious deficiencies such as low efficiency and poor adaptability to large data sets. To address these problems, this paper proposes a new method named LCPD (Linear Clustering Based on Probability Distributing). The main contributions are: (1) it partitions the buckets using regions of equal probability in the m-dimensional hypercube, so that the number of data items in each layer (namely, each hash bucket) is approximately equal, which yields stratified sampling at small cost; (2) the samples obtained by the new algorithm are sufficiently representative of the whole data set; (3) the complexity of the new algorithm is proved to be O(n); (4) comparison experiments show that the performance of LCPD is two orders of magnitude higher than that of the traditional algorithm when the data set contains about 10000 items, and that clustering quality improves by 55% when the data set contains about 8000 items.
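The abstract does not give LCPD's hash function or sampling rule, so the following is only a minimal sketch of the general idea in contribution (1): hash points into buckets of roughly equal probability, then draw a stratified sample from each bucket. All names here (quantile_cuts, bucket_key, stratified_sample, bins, fraction) are hypothetical. Using per-dimension empirical quantiles as the "equal probability" partition is an assumption: it equalizes bucket counts only when the dimensions are roughly independent, and the sort makes this sketch O(n log n) rather than the O(n) the paper proves for its own construction.

```python
import bisect
import random
from collections import defaultdict


def quantile_cuts(values, bins):
    """Return bins-1 cut points splitting `values` into roughly equal-count ranges."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // bins] for i in range(1, bins)]


def bucket_key(point, cuts_per_dim):
    """Hash a point to a bucket: one bin index per dimension (assumed scheme)."""
    return tuple(bisect.bisect_right(cuts, x) for x, cuts in zip(point, cuts_per_dim))


def stratified_sample(points, bins=4, fraction=0.1, seed=0):
    """Draw about `fraction` of the points from every non-empty bucket."""
    rng = random.Random(seed)
    dims = len(points[0])
    cuts_per_dim = [quantile_cuts([p[d] for p in points], bins) for d in range(dims)]

    # Group points by bucket key; each bucket acts as one sampling layer.
    buckets = defaultdict(list)
    for p in points:
        buckets[bucket_key(p, cuts_per_dim)].append(p)

    # Take at least one point per bucket so sparse regions stay represented.
    sample = []
    for members in buckets.values():
        take = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, take))
    return sample, buckets
```

A clustering routine such as k-median would then be run on the returned sample instead of the full data set; that step is omitted here because the abstract does not describe it.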
Pages: 1518-1523
Page count: 6