Improved k-means clustering algorithm and its applications

Cited by: 1

Authors:

Qi H. [1,2]

Li J. [2]

Di X. [1,2]

Ren W. [1,2]

Zhang F. [3]

Affiliations:

[1] National and Local Joint Engineering Research Center of Space and Optoelectronics Technology, Changchun University of Science and Technology, Changchun

[2] School of Computer Science and Technology, Changchun University of Science and Technology, Changchun

[3] Northeast Normal University, Changchun

Source:

Recent Patents on Engineering | 2019, Vol. 13, No. 4

Funding:

National Social Science Fund of China

Keywords:

Algorithm; Applications; Clustering; GPS; K-means; Network attack detection;

DOI:

10.2174/1872212113666181203110611

Abstract:

Background: The K-means algorithm runs in two steps: initialization and subsequent iterations. Initialization selects the initial cluster centers; the subsequent iterations repeatedly update the cluster centers until they no longer change or the number of iterations reaches its maximum. K-means is so sensitive to the cluster centers selected during initialization that a different choice of initial centers changes the algorithm's performance. Improving the initialization process has therefore become an important means of improving K-means performance. Methods: This paper uses a new strategy to select the initial cluster centers. It first calculates the minimum and maximum values of the data along a chosen index (for lower-dimensional data, such as two-dimensional data, the feature with the larger variance or the distance to the origin can be selected; for higher-dimensional data, PCA can be used to select the principal component with the largest variance), and then divides that range into equally sized sub-ranges. Next, the sub-ranges are adjusted based on the data distribution so that each sub-range contains as much data as possible. Finally, the mean of the data in each sub-range is calculated and used as an initial cluster center. Results: The theoretical analysis shows that although the time complexity of the initialization process is linear, the algorithm has the characteristics of superlinear initialization methods. The algorithm is applied to two-dimensional GPS data analysis and high-dimensional network attack detection. Experimental results show that it achieves high clustering performance and clustering speed. Conclusion: This paper reduces the number of subsequent iterations of the K-means algorithm without compromising clustering performance, which makes it suitable for large-scale data clustering. The algorithm is suitable for both low-dimensional and high-dimensional data clustering. © 2019 Bentham Science Publishers.
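
The initialization strategy outlined in the Methods part of the abstract can be sketched roughly as follows. This is a minimal illustration based only on the abstract, not the authors' implementation. The function name `range_based_init` is hypothetical, and the index used is the feature with the largest variance (one of the options the abstract mentions). The abstract does not specify how sub-ranges are adjusted to the data distribution, so the sketch substitutes a simple stand-in: if any equal-width sub-range is empty, it falls back to equal-count splitting.

```python
from statistics import pvariance


def range_based_init(points, k):
    """Return k initial centers from sub-ranges of a 1-D index (sketch)."""
    n, dim = len(points), len(points[0])
    if n < k:
        raise ValueError("need at least k points")

    # Index: the feature with the largest variance.
    f = max(range(dim), key=lambda d: pvariance([p[d] for p in points]))
    index = [p[f] for p in points]

    # Equal-width sub-ranges over [lo, hi]; the maximum goes in the last one.
    lo, hi = min(index), max(index)
    width = (hi - lo) / k
    if width > 0:
        labels = [min(int((v - lo) / width), k - 1) for v in index]
    else:
        labels = [0] * n

    # ASSUMPTION: the adjustment rule is not given in the abstract.
    # Stand-in: if a sub-range is empty, split by equal counts instead.
    if len(set(labels)) < k:
        order = sorted(range(n), key=lambda i: index[i])
        for rank, i in enumerate(order):
            labels[i] = rank * k // n

    # The mean of the points in each sub-range becomes an initial center.
    centers = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers
```

The returned centers would then be passed to an ordinary K-means iteration loop in place of randomly chosen starting points. Every step is a single pass over the data except the fallback sort, which belongs to the stand-in rule rather than to the method described in the abstract.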

Pages: 403-409

Number of pages: 6

