On k-means iterations and Gaussian clusters

Cited by: 10
Authors
de Amorim, Renato Cordeiro [1 ]
Makarenkov, Vladimir [2 ]
Affiliations
[1] Univ Essex, Sch Comp Sci & Elect Engn, Wivenhoe, England
[2] Univ Quebec, Dept Informat, CP 8888 Succ Ctr Ville, Montreal, PQ H3C 3P8, Canada
Funding
Innovate UK project;
Keywords
Clustering; Feature selection; NUMBER;
DOI
10.1016/j.neucom.2023.126547
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Nowadays, k-means remains arguably the most popular clustering algorithm (Jain, 2010; Vouros et al., 2021). Two of its main properties are simplicity and speed in practice. Here, our main claim is that the average number of iterations k-means takes to converge (τ) is in fact very informative. We find this particularly interesting because τ is always known when applying k-means but has never been, to our knowledge, used in the data analysis process. By experimenting with Gaussian clusters, we show that τ is related to the structure of the data set under study. Data sets containing Gaussian clusters have a much lower τ than those containing uniformly random data. In fact, we go considerably further and demonstrate a pattern of inverse correlation between τ and clustering quality. We illustrate the importance of our findings through two practical applications. First, we describe cases in which τ can be effectively used to identify irrelevant features present in a given data set, or to improve the results of existing feature selection algorithms. Second, we show that there is a strong relationship between τ and the number of clusters in a data set, and that this relationship can be used to find the true number of clusters it contains.
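The abstract's central observation can be reproduced in a few lines: average the iteration count of Lloyd's algorithm over random restarts and compare structured against structureless data. This is a minimal sketch, not the authors' code; it assumes scikit-learn, approximates τ via the `n_iter_` attribute of `KMeans`, and uses `make_blobs` for the Gaussian clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def mean_iterations(X, k, restarts=20, seed=0):
    """Approximate tau: average Lloyd iterations over random restarts."""
    rng = np.random.RandomState(seed)
    iters = []
    for _ in range(restarts):
        km = KMeans(n_clusters=k, n_init=1, init="random",
                    random_state=rng.randint(2**31 - 1))
        km.fit(X)
        iters.append(km.n_iter_)  # iterations until convergence for this run
    return float(np.mean(iters))

rng = np.random.RandomState(0)
# Well-separated Gaussian clusters vs. uniformly random points
X_gauss, _ = make_blobs(n_samples=1000, centers=5, cluster_std=0.5,
                        random_state=0)
X_unif = rng.uniform(size=(1000, 2))

tau_gauss = mean_iterations(X_gauss, k=5)
tau_unif = mean_iterations(X_unif, k=5)
```

On data like this, τ for the Gaussian mixture is typically much smaller than for the uniform sample, mirroring the paper's claim that τ reflects cluster structure.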
Pages: 10