A new linear approximate clustering algorithm based upon sampling with probability distributing

Cited by: 0

Authors:

Yuan, CA [1]

Tang, CJ [1]

Li, C [1]

Hu, JJ [1]

Peng, J [1]

Affiliations:

[1] Sichuan Univ, Coll Comp, Chengdu 610064, Sichuan, Peoples R China

Source:

Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9 | 2005

Keywords:

k-median algorithm; clustering; probability distributing; Hash function; sampling;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Clustering is an important research direction in knowledge discovery. The k-median algorithm, a classical clustering method, has serious deficiencies such as low efficiency and poor adaptability to large data sets. To address these problems, this paper proposes a new method named LCPD (Linear Clustering Based on Probability Distributing). The main contributions are: (1) the buckets are partitioned using regions of equal probability in the m-dimensional hypercube, so that each layer (namely, each hash bucket) holds an approximately equal number of data items, which yields stratified sampling at small cost; (2) the samples obtained by the new algorithm are sufficiently representative of the whole data set; (3) the complexity of the new algorithm is proved to be O(n); (4) comparative experiments show that the performance of LCPD is two orders of magnitude higher than that of the traditional algorithm when the data set contains nearly 10,000 items, and that the clustering quality improves by 55% when the data set contains nearly 8,000 items.
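The sampling idea in contribution (1) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the equal-probability regions are computed, so the sketch assumes each coordinate is an independent normal variable and uses its CDF as the hash. The names `equal_probability_hash`, `stratified_sample`, `bins_per_dim`, and `fraction` are hypothetical.

```python
import random
from collections import defaultdict
from statistics import NormalDist, fmean, pstdev


def equal_probability_hash(points, bins_per_dim):
    """Group points into buckets whose cells carry equal probability mass.

    Assumption (not from the paper): each coordinate is modelled as an
    independent normal variable, so its CDF value is roughly uniform on [0, 1].
    """
    dims = len(points[0])
    columns = [[p[j] for p in points] for j in range(dims)]
    # One fitted normal model per dimension; guard against zero variance.
    models = [NormalDist(fmean(c), max(pstdev(c), 1e-12)) for c in columns]

    buckets = defaultdict(list)
    for p in points:
        # Cut the CDF value into bins_per_dim equal-width (equal-mass) slices.
        key = tuple(
            min(int(models[j].cdf(p[j]) * bins_per_dim), bins_per_dim - 1)
            for j in range(dims)
        )
        buckets[key].append(p)
    return buckets


def stratified_sample(buckets, fraction, rng):
    """Draw the same fraction from every bucket, at least one point each."""
    sample = []
    for members in buckets.values():
        take = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, take))
    return sample


# Usage on synthetic two-dimensional data (illustrative values only).
rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(5, 2)) for _ in range(10_000)]
buckets = equal_probability_hash(data, bins_per_dim=4)
sample = stratified_sample(buckets, fraction=0.05, rng=rng)
```

In this sketch every step is a single pass over the data, so the cost grows linearly with the number of points for a fixed dimension. A k-median routine would then run on `sample` instead of on the full data set.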


Pages: 1518-1523

Page count: 6

