On k-means iterations and Gaussian clusters

Cited by: 10
Authors
de Amorim, Renato Cordeiro [1 ]
Makarenkov, Vladimir [2 ]
Affiliations
[1] Univ Essex, Sch Comp Sci & Elect Engn, Wivenhoe, England
[2] Univ Quebec, Dept Informat, CP 8888 Succ Ctr Ville, Montreal, PQ H3C 3P8, Canada
Funding
Innovate UK project;
Keywords
Clustering; Feature selection; NUMBER;
DOI
10.1016/j.neucom.2023.126547
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Nowadays, k-means remains arguably the most popular clustering algorithm (Jain, 2010; Vouros et al., 2021). Two of its main properties are simplicity and speed in practice. Here, our main claim is that the average number of iterations k-means takes to converge (τ) is in fact very informative. We find this particularly interesting because τ is always known when applying k-means but has never been, to our knowledge, used in the data analysis process. By experimenting with Gaussian clusters, we show that τ is related to the structure of the data set under study. Data sets containing Gaussian clusters have a much lower τ than those containing uniformly random data. In fact, we go considerably further and demonstrate a pattern of inverse correlation between τ and clustering quality. We illustrate the importance of our findings through two practical applications. First, we describe cases in which τ can be effectively used to identify irrelevant features present in a given data set, or to improve the results of existing feature selection algorithms. Second, we show that there is a strong relationship between τ and the number of clusters in a data set, and that this relationship can be used to find the true number of clusters it contains.
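The abstract's central observation can be reproduced in a few lines: average the iteration count of Lloyd's algorithm over random restarts and compare structured against structureless data. This is a minimal sketch, not the authors' code; it assumes scikit-learn, approximates τ via the `n_iter_` attribute of `KMeans`, and uses `make_blobs` for the Gaussian clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def mean_iterations(X, k, restarts=20, seed=0):
    """Approximate tau: average Lloyd iterations over random restarts."""
    rng = np.random.RandomState(seed)
    iters = []
    for _ in range(restarts):
        km = KMeans(n_clusters=k, n_init=1, init="random",
                    random_state=rng.randint(2**31 - 1))
        km.fit(X)
        iters.append(km.n_iter_)  # iterations until convergence for this run
    return float(np.mean(iters))

rng = np.random.RandomState(0)
# Well-separated Gaussian clusters vs. uniformly random points
X_gauss, _ = make_blobs(n_samples=1000, centers=5, cluster_std=0.5,
                        random_state=0)
X_unif = rng.uniform(size=(1000, 2))

tau_gauss = mean_iterations(X_gauss, k=5)
tau_unif = mean_iterations(X_unif, k=5)
```

On data like this, τ for the Gaussian mixture is typically much smaller than for the uniform sample, mirroring the paper's claim that τ reflects cluster structure.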
Pages: 10