An Efficient Global K-means Clustering Algorithm

Cited by: 74
Authors
Xie, Juanying [1, 2]
Jiang, Shuai [2]
Xie, Weixin [1, 3, 4]
Gao, Xinbo [5, 6]
Affiliations
[1] Xidian Univ, Sch Elect Engn, Xian 710071, Shaanxi, Peoples R China
[2] Shaanxi Normal Univ, Sch Comp Sci, Xian 710062, Shaanxi, Peoples R China
[3] Shenzhen Univ, Natl Lab Automat Target Recognit ATR, Shenzhen 518001, Peoples R China
[4] Shenzhen Univ, Coll Informat Engn, Shenzhen 518001, Peoples R China
[5] Xidian Univ, Sch Elect Engn, VIPS Lab, Xian 710071, Peoples R China
[6] Xidian Univ, Minist Educ China, Key Lab Intelligent Percept & Image Understanding, Xian 710071, Peoples R China
Keywords
clustering; K-means clustering; global K-means clustering; machine learning; pattern recognition; data mining; non-smooth optimization
DOI
10.4304/jcp.6.2.271-279
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
K-means clustering is a popular partition-based clustering algorithm. However, it suffers from several shortcomings: it requires the user to specify the number of clusters in advance, it is sensitive to initial conditions, and it is easily trapped in a local solution. The global K-means algorithm proposed by Likas et al. is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N runs of the K-means algorithm from suitable initial positions, where N is the size of the data set. It does not depend on any initial conditions or parameters and considerably outperforms the K-means algorithm, but it carries a heavy computational load. In this paper, we propose a new version of the global K-means algorithm: an efficient global K-means clustering algorithm. Its outstanding feature is its execution time: it takes less run time than the available global K-means algorithms. In this algorithm, we modify the way the optimal initial center of the next new cluster is found by defining a new function as the criterion for selecting the optimal candidate center. Our idea was inspired by Park and Jun's K-medoids clustering algorithm. We choose the best candidate initial center for the next cluster by calculating the value of our new function, which uses information about the natural distribution of the data, so that the chosen initial center is a point that not only has the highest density but also lies far from the available cluster centers. Experiments on fourteen well-known data sets from the UCI machine learning repository show that our new algorithm can significantly reduce the computational time without affecting the performance of the global K-means algorithm.
Further experiments demonstrate that our improved global K-means algorithm greatly outperforms the global K-means algorithm and is suitable for clustering large data sets. Experiments on a colon cancer tissue data set reveal that our new global K-means algorithm can efficiently deal with high-dimensional gene expression data. Experimental results on synthetic data sets with different proportions of noisy data points show that our global K-means algorithm can efficiently avoid the influence of noisy data on clustering results.
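The incremental search described in the abstract can be illustrated with a minimal pure-Python sketch. `global_kmeans` follows the abstract's description of the Likas et al. procedure: starting from the data centroid, it tries every data point as the initial position of the next center, which means N runs of K-means per added cluster. `fast_global_kmeans` runs K-means only once per added cluster. Its `pick_candidate` score (an inverse-total-distance density multiplied by the distance to the nearest existing center) is an assumed stand-in chosen for illustration; the abstract does not give the paper's actual selection function, and all function names here are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        updated = [centroid(m) if m else c for c, m in zip(centers, clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, sse(points, centers)


def global_kmeans(points, k):
    """Incremental search: N K-means runs for every added center."""
    centers = [centroid(points)]
    error = sse(points, centers)
    for _ in range(1, k):
        runs = (kmeans(points, centers + [tuple(p)]) for p in points)
        centers, error = min(runs, key=lambda run: run[1])
    return centers, error


def pick_candidate(points, centers):
    """Assumed criterion (not the paper's function): dense and far away."""
    def score(p):
        total = sum(math.sqrt(sq_dist(p, q)) for q in points)
        density = 1.0 / (1.0 + total)  # small total distance = dense region
        separation = min(math.sqrt(sq_dist(p, c)) for c in centers)
        return density * separation
    return max(points, key=score)


def fast_global_kmeans(points, k):
    """One K-means run per added center, seeded by pick_candidate."""
    centers = [centroid(points)]
    error = sse(points, centers)
    for _ in range(1, k):
        seed = pick_candidate(points, centers)
        centers, error = kmeans(points, centers + [tuple(seed)])
    return centers, error


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    print(global_kmeans(data, 2))
    print(fast_global_kmeans(data, 2))
```

On this toy data set both functions recover the two square blobs, but the second one performs a single K-means run for the new center instead of eight, which is the kind of saving the abstract attributes to replacing the exhaustive search with a selection criterion.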
Pages: 271-279
Number of pages: 9
Related papers
50 in total
  • [41] K-Means Clustering Efficient Algorithm with Initial Class Center Selection
    Huang Suyu
    Hu Pingfang
    PROCEEDINGS OF THE 2018 3RD INTERNATIONAL WORKSHOP ON MATERIALS ENGINEERING AND COMPUTER SCIENCES (IWMECS 2018), 2018, 78 : 301 - 305
  • [42] A comparative study of efficient initialization methods for the k-means clustering algorithm
    Celebi, M. Emre
    Kingravi, Hassan A.
    Vela, Patricio A.
    EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (01) : 200 - 210
  • [43] A time efficient pattern reduction algorithm for k-means based clustering
    Tsai, Chun-Wei
    Yang, Chu-Sing
    Chiang, Ming-Chao
    2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 209 - +
  • [44] An Efficient Data Structure for Document Clustering Using K-Means Algorithm
    Killani, Ramanji
    Satapathy, Suresh Chandra
    Sowjanya, A. M.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS 2012 (INDIA 2012), 2012, 132 : 337 - +
  • [45] A time-efficient pattern reduction algorithm for k-means clustering
    Chiang, Ming-Chao
    Tsai, Chun-Wei
    Yang, Chu-Sing
    INFORMATION SCIENCES, 2011, 181 (04) : 716 - 731
  • [46] An Efficient Dimension Reduction Technique for Basic K-Means Clustering Algorithm
    Usman, Dauda
    Mohamad, Ismail
    MATEMATIKA, 2013, 29 (01) : 253 - 267
  • [47] eXploratory K-Means: A new simple and efficient algorithm for gene clustering
    Lam, Yau King
    Tsang, Peter W. M.
    APPLIED SOFT COMPUTING, 2012, 12 (03) : 1149 - 1157
  • [48] Soil data clustering by using K-means and fuzzy K-means algorithm
    Hot, Elma
    Popovic-Bugarin, Vesna
    2015 23RD TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2015, : 890 - 893
  • [49] IMPROVEMENT IN K-MEANS CLUSTERING ALGORITHM FOR DATA CLUSTERING
    Rajeswari, K.
    Acharya, Omkar
    Sharma, Mayur
    Kopnar, Mahesh
    Karandikar, Kiran
    1ST INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION ICCUBEA 2015, 2015, : 367 - 369
  • [50] The global Minmax k-means algorithm
    Wang, Xiaoyan
    Bai, Yanping
    SPRINGERPLUS, 2016, 5