An empirical comparison of four initialization methods for the K-Means algorithm

Cited by: 542
Authors
Peña, JM [1]
Lozano, JA [1]
Larrañaga, P [1]
Affiliation
[1] Univ Basque Country, Dept Comp Sci & Artificial Intelligence, Intelligent Syst Grp, E-20080 San Sebastian, Spain
Keywords
K-Means algorithm; K-Means initialization; partitional clustering; genetic algorithms
DOI
10.1016/S0167-8655(99)00069-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we aim to compare empirically four initialization methods for the K-Means algorithm: random, Forgy, MacQueen and Kaufman. Although this algorithm is known for its robustness, it is widely reported in the literature that its performance depends upon two key points: initial clustering and instance order. We conduct a series of experiments to draw up (in terms of mean, maximum, minimum and standard deviation) the probability distribution of the square-error values of the final clusters returned by the K-Means algorithm independently of any initial clustering and of any instance order when each of the four initialization methods is used. The results of our experiments illustrate that the random and the Kaufman initialization methods outperform the rest of the compared methods as they make the K-Means more effective and more independent of initial clustering and of instance order. In addition, we compare the convergence speed of the K-Means algorithm when using each of the four initialization methods. Our results suggest that the Kaufman initialization method induces in the K-Means algorithm a more desirable behaviour with respect to the convergence speed than the random initialization method. © 1999 Elsevier Science B.V. All rights reserved.
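The abstract names the initialization methods and the square-error criterion without defining them, so the following is only a minimal sketch under stated assumptions, not the authors' experimental code. It assumes that "random" means assigning every instance to a randomly chosen cluster, that "Forgy" means picking K instances at random as seeds and assigning the rest to the nearest seed, and that the square-error is the sum of squared Euclidean distances from each instance to its cluster centroid. The K-Means loop is a plain batch variant chosen for brevity (so it does not reproduce any dependence on instance order), and all function names are hypothetical. The MacQueen and Kaufman methods are left out.

```python
import random
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids_of(data, labels, k, fallback):
    """Mean of each cluster; an empty cluster keeps its fallback centroid."""
    result = []
    for c in range(k):
        members = [p for p, lab in zip(data, labels) if lab == c]
        if members:
            result.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            result.append(fallback[c])
    return result


def nearest(data, centroids):
    """Label every instance with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in data]


def square_error(data, labels, centroids):
    """Assumed criterion: sum of squared distances to the assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))


def random_partition_init(data, k, rng):
    """Assumed 'random' method: a random label for every instance."""
    labels = [rng.randrange(k) for _ in data]
    # Force k distinct instances into distinct clusters so none starts empty.
    for c, i in enumerate(rng.sample(range(len(data)), k)):
        labels[i] = c
    return labels


def forgy_init(data, k, rng):
    """Assumed 'Forgy' method: k random instances as seeds, rest to nearest seed."""
    seeds = rng.sample(data, k)
    return nearest(data, seeds)


def k_means(data, labels, k, max_iter=100):
    """Batch K-Means from an initial partition; returns labels, centroids, iterations."""
    centroids = centroids_of(data, labels, k, fallback=data[:k])
    for iteration in range(1, max_iter + 1):
        new_labels = nearest(data, centroids)
        if new_labels == labels:  # partition is stable: converged
            break
        labels = new_labels
        centroids = centroids_of(data, labels, k, fallback=centroids)
    return labels, centroids, iteration


# Synthetic data (not from the paper): three Gaussian blobs in the plane.
rng = random.Random(0)
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]

for name, init in [("random", random_partition_init), ("Forgy", forgy_init)]:
    errors = []
    for _ in range(20):
        labels, cents, n_iter = k_means(data, init(data, 3, rng), 3)
        errors.append(square_error(data, labels, cents))
    # Summarize the final square-error in the four statistics the abstract lists.
    print(name, statistics.mean(errors), max(errors), min(errors),
          statistics.pstdev(errors))
```

Repeating the run over many initial partitions and summarizing the final square-error by mean, maximum, minimum and standard deviation mirrors the kind of comparison the abstract describes; the synthetic data and the number of repetitions here are arbitrary choices for illustration.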
Pages: 1027-1040
Page count: 14