On k-means iterations and Gaussian clusters

Cited by: 10

Authors:

de Amorim, Renato Cordeiro [1]

Makarenkov, Vladimir [2]

Affiliations:

[1] Univ Essex, Sch Comp Sci & Elect Engn, Wivenhoe, England

[2] Univ Quebec, Dept Informat, CP 8888 Succ Ctr Ville, Montreal, PQ H3C 3P8, Canada

Source:

NEUROCOMPUTING | 2023 / Vol. 553

Funding:

Innovate UK project;

Keywords:

Clustering; Feature selection; NUMBER;

DOI:

10.1016/j.neucom.2023.126547

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Nowadays, k-means remains arguably the most popular clustering algorithm (Jain, 2010; Vouros et al., 2021). Two of its main properties are simplicity and speed in practice. Here, our main claim is that the average number of iterations k-means takes to converge (τ) is in fact very informative. We find this to be particularly interesting because τ is always known when applying k-means but has never been, to our knowledge, used in the data analysis process. By experimenting with Gaussian clusters, we show that τ is related to the structure of a data set under study. Data sets containing Gaussian clusters have a much lower τ than those containing uniformly random data. In fact, we go considerably further and demonstrate a pattern of inverse correlation between τ and the clustering quality. We illustrate the importance of our findings through two practical applications. First, we describe the cases in which τ can be effectively used to identify irrelevant features present in a given data set or be used to improve the results of existing feature selection algorithms. Second, we show that there is a strong relationship between τ and the number of clusters in a data set, and that this relationship can be used to find the true number of clusters it contains.
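
The quantity τ described in the abstract can be illustrated with a minimal sketch. This is not the authors' experimental protocol, which the abstract does not specify. It assumes plain Lloyd-style k-means with random initial centroids drawn from the data, counts an iteration each time points are reassigned, stops when assignments no longer change, and averages over a fixed number of runs. All function names, the number of runs, and the synthetic data parameters are hypothetical choices made for illustration.

```python
import random

def kmeans_iterations(points, k, rng, max_iter=300):
    """Run k-means once and return the iteration at which assignments stopped changing.

    Assumption: initial centroids are k distinct data points chosen at random.
    """
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            return iteration  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return max_iter

def average_iterations(points, k, runs=20, seed=0):
    """Estimate tau as the mean iteration count over several random restarts."""
    rng = random.Random(seed)
    return sum(kmeans_iterations(points, k, rng) for _ in range(runs)) / runs

def gaussian_clusters(n_per_cluster, centers, sigma, rng):
    """Sample spherical Gaussian clusters around the given centers."""
    return [[rng.gauss(c, sigma) for c in center]
            for center in centers for _ in range(n_per_cluster)]

def uniform_data(n, dim, low, high, rng):
    """Sample n points uniformly at random from a hypercube."""
    return [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(42)
    gauss = gaussian_clusters(100, [(0, 0), (10, 0), (0, 10)], 1.0, rng)
    unif = uniform_data(300, 2, -3, 13, rng)
    print("tau, Gaussian clusters:", average_iterations(gauss, 3))
    print("tau, uniform data:", average_iterations(unif, 3))
```

Running the script prints one τ estimate for each data set, which can be compared against the abstract's claim that Gaussian-cluster data converge in fewer iterations than uniformly random data. The values depend on the seed, the stopping rule, and the cluster separation chosen here.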


Pages: 10
