On k-means iterations and Gaussian clusters

Cited by: 10
Authors
de Amorim, Renato Cordeiro [1]
Makarenkov, Vladimir [2]
Affiliations
[1] Univ Essex, Sch Comp Sci & Elect Engn, Wivenhoe, England
[2] Univ Quebec, Dept Informat, CP 8888 Succ Ctr Ville, Montreal, PQ H3C 3P8, Canada
Funding
Innovate UK project;
Keywords
Clustering; Feature selection; NUMBER;
DOI
10.1016/j.neucom.2023.126547
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Nowadays, k-means remains arguably the most popular clustering algorithm (Jain, 2010; Vouros et al., 2021). Two of its main properties are simplicity and speed in practice. Here, our main claim is that the average number of iterations k-means takes to converge (τ) is in fact very informative. We find this to be particularly interesting because τ is always known when applying k-means but has never been, to our knowledge, used in the data analysis process. By experimenting with Gaussian clusters, we show that τ is related to the structure of a data set under study. Data sets containing Gaussian clusters have a much lower τ than those containing uniformly random data. In fact, we go considerably further and demonstrate a pattern of inverse correlation between τ and the clustering quality. We illustrate the importance of our findings through two practical applications. First, we describe the cases in which τ can be effectively used to identify irrelevant features present in a given data set or be used to improve the results of existing feature selection algorithms. Second, we show that there is a strong relationship between τ and the number of clusters in a data set, and that this relationship can be used to find the true number of clusters it contains.
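The quantity τ described in the abstract can be estimated with a few lines of code. The following is a minimal sketch, not the authors' experimental protocol: it assumes plain Lloyd-style k-means with centroids initialised from randomly sampled data points, counts the final pass that confirms no label changed as an iteration, and averages over a fixed number of restarts. The function names (`kmeans_iterations`, `average_iterations`, `gaussian_clusters`, `uniform_data`) and all parameter values are hypothetical choices made for illustration.

```python
import random


def kmeans_iterations(points, k, rng, max_iter=1000):
    """Run Lloyd-style k-means once and return the number of iterations
    until the labels stop changing (the confirming pass is counted)."""
    # Assumption: initial centroids are k data points sampled at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            return iteration
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return max_iter


def average_iterations(points, k, runs=20, seed=0):
    """Estimate tau: the mean iteration count over several random restarts."""
    rng = random.Random(seed)
    return sum(kmeans_iterations(points, k, rng) for _ in range(runs)) / runs


def gaussian_clusters(n_per_cluster, centers, sd, rng):
    """Sample spherical Gaussian clusters around the given centers."""
    return [tuple(rng.gauss(c, sd) for c in center)
            for center in centers for _ in range(n_per_cluster)]


def uniform_data(n, dim, low, high, rng):
    """Sample n points uniformly at random from a hypercube."""
    return [tuple(rng.uniform(low, high) for _ in range(dim)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # illustrative values
    clustered = gaussian_clusters(100, centers, 1.0, rng)
    unstructured = uniform_data(300, 2, -3.0, 13.0, rng)
    print("tau, Gaussian clusters:", average_iterations(clustered, k=3))
    print("tau, uniform data:     ", average_iterations(unstructured, k=3))
```

Running the script prints one τ estimate for each data set, which allows a rough comparison in the spirit of the abstract's claim; the numbers depend on the seed, the cluster separation, and the initialisation scheme assumed here.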
Pages: 10
Related papers
50 in total
  • [1] An expansion of X-means for automatically determining the optimal number of clusters : Progressive iterations of K-means and merging of the clusters
    Ishioka, T
    PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2005, : 91 - 96
  • [2] Sorted K-Means Towards the Enhancement of K-Means to Form Stable Clusters
    Arora, Preeti
    Virmani, Deepali
    Jindal, Himanshu
    Sharma, Mritunjaya
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMMUNICATION AND NETWORKS, 2017, 508 : 479 - 486
  • [3] Experiments for the number of clusters in K-Means
    Chiang, Mark Ming-Tso
    Mirkin, Boris
    PROGRESS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4874 : 395 - 405
  • [4] k-means Requires Exponentially Many Iterations Even in the Plane
    Vattani, Andrea
    PROCEEDINGS OF THE TWENTY-FIFTH ANNUAL SYMPOSIUM ON COMPUTATIONAL GEOMETRY (SCG'09), 2009, : 324 - 332
  • [5] k-means Requires Exponentially Many Iterations Even in the Plane
    Andrea Vattani
    Discrete & Computational Geometry, 2011, 45 : 596 - 616
  • [6] k-means Requires Exponentially Many Iterations Even in the Plane
    Vattani, Andrea
    DISCRETE & COMPUTATIONAL GEOMETRY, 2011, 45 (04) : 596 - 616
  • [7] Gaussian Representations of K-Means Clusters: Case Study of Educational Process Mining of UCI
    Ko, Yu-Chien
    Fujita, Hamido
    KNOWLEDGE INNOVATION THROUGH INTELLIGENT SOFTWARE METHODOLOGIES, TOOLS AND TECHNIQUES (SOMET_20), 2020, 327 : 399 - 409
  • [8] Choosing the Number of Clusters in K-Means Clustering
    Steinley, Douglas
    Brusco, Michael J.
    PSYCHOLOGICAL METHODS, 2011, 16 (03) : 285 - 297
  • [9] Setting the number of clusters in K-means clustering
    Huh, MH
    RECENT ADVANCES IN STATISTICAL RESEARCH AND DATA ANALYSIS, 2002, : 115 - 124
  • [10] ASYMPTOTIC PROPERTIES OF BIVARIATE K-MEANS CLUSTERS
    WONG, MA
    COMMUNICATIONS IN STATISTICS PART A-THEORY AND METHODS, 1982, 11 (10): : 1155 - 1171