On k-means iterations and Gaussian clusters

Cited by: 10
Authors
de Amorim, Renato Cordeiro [1]
Makarenkov, Vladimir [2]
Affiliations
[1] Univ Essex, Sch Comp Sci & Elect Engn, Wivenhoe, England
[2] Univ Quebec, Dept Informat, CP 8888 Succ Ctr Ville, Montreal, PQ H3C 3P8, Canada
Funding
Innovate UK;
Keywords
Clustering; Feature selection; NUMBER;
DOI
10.1016/j.neucom.2023.126547
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Nowadays, k-means remains arguably the most popular clustering algorithm (Jain, 2010; Vouros et al., 2021). Two of its main properties are simplicity and speed in practice. Here, our main claim is that the average number of iterations k-means takes to converge (τ) is in fact very informative. We find this to be particularly interesting because τ is always known when applying k-means but has never been, to our knowledge, used in the data analysis process. By experimenting with Gaussian clusters, we show that τ is related to the structure of a data set under study. Data sets containing Gaussian clusters have a much lower τ than those containing uniformly random data. In fact, we go considerably further and demonstrate a pattern of inverse correlation between τ and the clustering quality. We illustrate the importance of our findings through two practical applications. First, we describe the cases in which τ can be effectively used to identify irrelevant features present in a given data set or be used to improve the results of existing feature selection algorithms. Second, we show that there is a strong relationship between τ and the number of clusters in a data set, and that this relationship can be used to find the true number of clusters it contains.
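As an illustration of the quantity τ described in the abstract, the following is a minimal sketch, not the authors' code or experimental setup. It runs a plain Lloyd-style k-means from several random initialisations and averages the iteration counts. The function names (`kmeans_iterations`, `average_iterations`, `gaussian_clusters`, `uniform_data`), the stopping rule (assignments unchanged), and all parameter values are assumptions made for this example; iteration counts depend on how an implementation defines convergence.

```python
import random


def kmeans_iterations(points, k, rng, max_iter=300):
    """Run Lloyd's k-means once and return the number of iterations
    until the cluster assignments stop changing (assumed stopping rule)."""
    # Initialise centroids with k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            return iteration
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return max_iter


def average_iterations(points, k, runs=20, seed=0):
    """Estimate tau as the mean iteration count over several random starts."""
    rng = random.Random(seed)
    return sum(kmeans_iterations(points, k, rng) for _ in range(runs)) / runs


def gaussian_clusters(n_per_cluster, centers, sd, rng):
    """Draw spherical Gaussian clusters around the given centers."""
    return [tuple(rng.gauss(m, sd) for m in center)
            for center in centers for _ in range(n_per_cluster)]


def uniform_data(n, dim, low, high, rng):
    """Draw n points uniformly at random from a hypercube."""
    return [tuple(rng.uniform(low, high) for _ in range(dim)) for _ in range(n)]


if __name__ == "__main__":
    data_rng = random.Random(42)
    clustered = gaussian_clusters(100, [(0, 0), (10, 10), (-10, 10)], 1.0, data_rng)
    unstructured = uniform_data(300, 2, -12.0, 12.0, data_rng)
    print("tau (Gaussian clusters):", average_iterations(clustered, k=3))
    print("tau (uniform data):     ", average_iterations(unstructured, k=3))
```

Comparing the two printed averages on data of one's own is one way to explore the relationship the abstract reports; the sketch makes no claim about which values will result.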
Pages: 10