On k-means iterations and Gaussian clusters

Cited by: 10

Authors:

de Amorim, Renato Cordeiro ^{[1]}

Makarenkov, Vladimir ^{[2]}

Affiliations:

[1] Univ Essex, Sch Comp Sci & Elect Engn, Wivenhoe, England

[2] Univ Quebec, Dept Informat, CP 8888 Succ Ctr Ville, Montreal, PQ H3C 3P8, Canada

Source:

NEUROCOMPUTING | 2023 / Volume 553

Funding:

Innovate UK project;

Keywords:

Clustering; Feature selection; NUMBER;

DOI:

10.1016/j.neucom.2023.126547

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Nowadays, k-means remains arguably the most popular clustering algorithm (Jain, 2010; Vouros et al., 2021). Two of its main properties are simplicity and speed in practice. Here, our main claim is that the average number of iterations k-means takes to converge (τ) is in fact very informative. We find this to be particularly interesting because τ is always known when applying k-means but has never been, to our knowledge, used in the data analysis process. By experimenting with Gaussian clusters, we show that τ is related to the structure of a data set under study. Data sets containing Gaussian clusters have a much lower τ than those containing uniformly random data. In fact, we go considerably further and demonstrate a pattern of inverse correlation between τ and the clustering quality. We illustrate the importance of our findings through two practical applications. First, we describe the cases in which τ can be effectively used to identify irrelevant features present in a given data set or be used to improve the results of existing feature selection algorithms. Second, we show that there is a strong relationship between τ and the number of clusters in a data set, and that this relationship can be used to find the true number of clusters it contains.
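
The quantity τ described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the plain Lloyd-style k-means, the random initialisation from data points, the convergence rule (stop when labels no longer change), the number of runs, and the helper names `kmeans_iterations` and `average_iterations` are all assumptions made for illustration. The toy data sets at the end are likewise hypothetical.

```python
import numpy as np


def kmeans_iterations(X, k, rng, max_iter=300):
    """Run a basic Lloyd-style k-means once and return the iteration count.

    Assumption: an iteration is one assignment pass, and the run has
    converged when an assignment pass leaves every label unchanged.
    """
    # Initial centers are k distinct data points (fancy indexing copies them).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for it in range(1, max_iter + 1):
        # Squared Euclidean distance from every point to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            return it
        labels = new_labels
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return max_iter


def average_iterations(X, k, n_runs=50, seed=0):
    """Estimate tau as the mean iteration count over n_runs random starts."""
    rng = np.random.default_rng(seed)
    return float(np.mean([kmeans_iterations(X, k, rng) for _ in range(n_runs)]))


# Hypothetical toy data: three Gaussian clusters versus uniform noise.
rng = np.random.default_rng(42)
means = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
gaussian = np.vstack([rng.normal(m, 1.0, size=(100, 2)) for m in means])
uniform = rng.uniform(-3.0, 9.0, size=(300, 2))

print("tau, Gaussian clusters:", average_iterations(gaussian, k=3))
print("tau, uniform data:", average_iterations(uniform, k=3))
```

Comparing the two printed values on data of one's own gives a rough feel for the relationship the abstract reports; the exact numbers depend on the initialisation scheme and the convergence rule chosen here.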


Pages: 10

50 items in total

[1] An expansion of X-means for automatically determining the optimal number of clusters : Progressive iterations of K-means and merging of the clusters
Ishioka, T
PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2005, : 91 - 96
[2] Sorted K-Means Towards the Enhancement of K-Means to Form Stable Clusters
Arora, Preeti
Virmani, Deepali
Jindal, Himanshu
Sharma, Mritunjaya
PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMMUNICATION AND NETWORKS, 2017, 508 : 479 - 486
[3] Experiments for the number of clusters in K-Means
Chiang, Mark Ming-Tso
Mirkin, Boris
PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4874 : 395 - 405
[4] k-means Requires Exponentially Many Iterations Even in the Plane
Vattani, Andrea
PROCEEDINGS OF THE TWENTY-FIFTH ANNUAL SYMPOSIUM ON COMPUTATIONAL GEOMETRY (SCG'09), 2009, : 324 - 332
[5] k-means Requires Exponentially Many Iterations Even in the Plane
Andrea Vattani
Discrete & Computational Geometry, 2011, 45 : 596 - 616
[6] k-means Requires Exponentially Many Iterations Even in the Plane
Vattani, Andrea
DISCRETE & COMPUTATIONAL GEOMETRY, 2011, 45 (04) : 596 - 616
[7] Gaussian Representations of K-Means Clusters: Case Study of Educational Process Mining of UCI
Ko, Yu-Chien
Fujita, Hamido
KNOWLEDGE INNOVATION THROUGH INTELLIGENT SOFTWARE METHODOLOGIES, TOOLS AND TECHNIQUES (SOMET_20), 2020, 327 : 399 - 409
[8] Choosing the Number of Clusters in K-Means Clustering
Steinley, Douglas
Brusco, Michael J.
PSYCHOLOGICAL METHODS, 2011, 16 (03) : 285 - 297
[9] Setting the number of clusters in K-means clustering
Huh, MH
RECENT ADVANCES IN STATISTICAL RESEARCH AND DATA ANALYSIS, 2002, : 115 - 124
[10] ASYMPTOTIC PROPERTIES OF BIVARIATE K-MEANS CLUSTERS
WONG, MA
COMMUNICATIONS IN STATISTICS PART A-THEORY AND METHODS, 1982, 11 (10) : 1155 - 1171
