Improved k-means clustering algorithm and its applications

Cited by: 1
Authors
Qi H. [1, 2]
Li J. [2]
Di X. [1, 2]
Ren W. [1, 2]
Zhang F. [3]
Affiliations
[1] National and Local Joint Engineering Research Center of Space and Optoelectronics Technology, Changchun University of Science and Technology, Changchun
[2] School of Computer Science and Technology, Changchun University of Science and Technology, Changchun
[3] Northeast Normal University, Changchun
Funding
National Social Science Fund of China
Keywords
Algorithm; Applications; Clustering; GPS; K-means; Network attack detection
DOI
10.2174/1872212113666181203110611
Abstract
Background: The K-means algorithm runs in two stages: initialization and subsequent iterations. Initialization selects the initial cluster centers, and the subsequent iterations repeatedly update the cluster centers until they no longer change or the number of iterations reaches its maximum. K-means is so sensitive to the cluster centers selected during initialization that choosing different initial centers changes the algorithm's performance. Improving the initialization process has therefore become an important means of improving K-means performance. Methods: This paper uses a new strategy to select the initial cluster centers. It first computes the minimum and maximum values of the data along a chosen index (for low-dimensional data, such as two-dimensional data, a feature with larger variance or the distance to the origin can be used; for high-dimensional data, PCA can be used to select the principal component with the largest variance) and then divides this range into equal-width sub-ranges. Next, the sub-ranges are adjusted according to the data distribution so that each sub-range contains as much data as possible. Finally, the mean of the data in each sub-range is computed and used as an initial cluster center. Results: Theoretical analysis shows that although the initialization process has linear time complexity, the algorithm has the characteristics of superlinear initialization methods. The algorithm is applied to two-dimensional GPS data analysis and to high-dimensional network attack detection. Experimental results show that it achieves high clustering performance and clustering speed. Conclusion: This paper reduces the number of subsequent K-means iterations without compromising clustering performance, which makes the algorithm suitable for large-scale data clustering.
The algorithm can be applied not only to low-dimensional data but also to high-dimensional data. © 2019 Bentham Science Publishers.
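The abstract describes the initialization only at a high level. The following Python/NumPy sketch is one possible reading of it, followed by the standard K-means iterations described in the Background. All names (`index_values`, `range_split_init`, `nearest`, `kmeans`) and the `mode` options are hypothetical, and the boundary-adjustment rule in particular is an assumption: the abstract says only that sub-ranges are adjusted "based on the data distribution", so this is not the authors' code.

```python
import numpy as np


def index_values(X: np.ndarray, mode: str = "pca") -> np.ndarray:
    """Map each row of X to one scalar index value (hypothetical helper).

    "variance": the single feature with the largest variance (low-dimensional data)
    "norm":     Euclidean distance to the origin (low-dimensional data)
    "pca":      score on the first principal component (high-dimensional data)
    """
    if mode == "variance":
        return X[:, np.argmax(X.var(axis=0))]
    if mode == "norm":
        return np.linalg.norm(X, axis=1)
    if mode == "pca":
        centered = X - X.mean(axis=0)
        # First right-singular vector = direction of largest variance (sign is arbitrary).
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return centered @ vt[0]
    raise ValueError(f"unknown mode: {mode!r}")


def range_split_init(X: np.ndarray, k: int, mode: str = "pca", adjust: bool = True) -> np.ndarray:
    """Initial centers = means of the points falling in k sub-ranges of the index."""
    v = index_values(X, mode)
    lo, hi = v.min(), v.max()
    width = (hi - lo) / k
    edges = np.linspace(lo, hi, k + 1)  # equal-width boundaries; edges[-1] == hi
    if adjust:
        # HYPOTHETICAL adjustment (not specified in the abstract): move each inner
        # boundary to the middle of the widest gap between consecutive sorted index
        # values whose midpoint lies within width/2 of the equal-width boundary.
        s = np.sort(v)
        gaps = np.diff(s)
        mids = (s[:-1] + s[1:]) / 2
        for i in range(1, k):
            in_window = np.abs(mids - edges[i]) <= width / 2
            if in_window.any():
                candidates = np.flatnonzero(in_window)
                edges[i] = mids[candidates[np.argmax(gaps[candidates])]]
    centers = np.empty((k, X.shape[1]))
    for i in range(k):
        # The last sub-range is closed on the right so the maximum value is included.
        upper = (v <= edges[i + 1]) if i == k - 1 else (v < edges[i + 1])
        mask = (v >= edges[i]) & upper
        if mask.any():
            centers[i] = X[mask].mean(axis=0)
        else:
            # Empty sub-range: fall back to the point whose index is nearest its midpoint.
            mid = (edges[i] + edges[i + 1]) / 2
            centers[i] = X[np.argmin(np.abs(v - mid))]
    return centers


def nearest(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest center (squared Euclidean distance) for every row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X: np.ndarray, centers: np.ndarray, max_iter: int = 100):
    """Standard Lloyd iterations; stop when centers stop changing or at max_iter."""
    centers = np.asarray(centers, dtype=float).copy()
    for it in range(1, max_iter + 1):
        labels = nearest(X, centers)
        # Mean of each cluster; an empty cluster keeps its previous center.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            return new, labels, it
        centers = new
    return centers, nearest(X, centers), max_iter


if __name__ == "__main__":
    # Toy data: three 2x2 grids of points centred at x = 0.5, 10.5 and 20.5.
    X = np.array([[a + dx, dy] for a in (0, 10, 20) for dx in (0, 1) for dy in (0, 1)],
                 dtype=float)
    init = range_split_init(X, k=3, mode="variance")
    centers, labels, n_iter = kmeans(X, init)
    print(init)    # initial centers from the sub-range means
    print(n_iter)  # iterations until the centers stopped changing
```

On well-separated data like the toy example, the sub-range means already coincide with the final cluster means, so the iteration loop stops after a single pass; this illustrates the kind of iteration saving the abstract claims, not its reported results. With `adjust=False` the initialization is a linear pass over the index values (plus the SVD in `"pca"` mode), whereas the sort in the hypothetical adjustment makes that variant O(n log n).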
Pages: 403-409
Number of pages: 6