Distributed K-Means algorithm based on a Spark optimization sample

Cited by: 0
Authors
Feng, Yongan [1]
Zou, Jiapeng [1]
Liu, Wanjun [1]
Lv, Fu [1]
Affiliations
[1] Liaoning Tech Univ, Huludao, Peoples R China
Source
PLOS ONE | 2024 / Vol. 19 / Issue 12
DOI
10.1371/journal.pone.0308993
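Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract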
To address the instability and performance issues of the classical K-Means algorithm on massive datasets, we propose SOSK-Means, an improved K-Means algorithm based on Spark optimization. SOSK-Means incorporates several key modifications to the clustering process. First, a weighted jump-bank approach enables efficient random sampling and preclustering. By incorporating weights and jump pointers, this approach improves the quality of the initial centers and reduces sensitivity to their selection. Second, distances are calculated with a weighted max-min distance with variance, which takes both weight and variance information into account. This enables SOSK-Means to identify clusters that are farther apart and denser, enhancing clustering accuracy. The best initial centers are then selected using the mean square error criterion, so that they better represent the distribution and structure of the dataset and lead to improved clustering performance. During iteration, a novel distance comparison method reduces computation time. Additionally, SOSK-Means uses a Directed Acyclic Graph (DAG) to optimize performance through distributed strategies, leveraging the capabilities of the Spark framework. Experimental results show that SOSK-Means significantly improves computational speed while maintaining high computational accuracy.
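The abstract names max-min distance seeding and a mean square error criterion but gives no formulas, so the following is only a minimal single-machine sketch of those two ideas under stated assumptions. It uses the plain (unweighted, variance-free) max-min rule, also known as farthest-first seeding, and does not reproduce the paper's weights, variance term, jump-bank sampling, or Spark execution. All function names are hypothetical and chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def max_min_centers(points, k, first_index=0):
    """Pick k initial centers with the plain max-min (farthest-first) rule.

    Assumption: the paper's weighting and variance terms are omitted here.
    """
    centers = [points[first_index]]
    # nearest[i] holds the squared distance from points[i] to its closest center
    nearest = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from every center chosen so far
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, sq_dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers


def mean_squared_error(points, centers):
    """Average squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def best_initial_centers(points, k, trials=5, seed=0):
    """Run max-min seeding from several random starts and keep the lowest-MSE set."""
    rng = random.Random(seed)
    candidates = [
        max_min_centers(points, k, rng.randrange(len(points)))
        for _ in range(trials)
    ]
    return min(candidates, key=lambda c: mean_squared_error(points, c))


points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]
centers = max_min_centers(points, k=3)
```

On this toy dataset of three well-separated pairs, the seeding picks one point from each pair. In a distributed setting the per-point distance updates would presumably be the part mapped across partitions, but that is an assumption rather than something the abstract states.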
Pages: 21
Related papers
50 in total
  • [11] Improved K-Means Algorithm Based on Hybrid Rice Optimization Algorithm
    Liu, Chuan
    Wang, Chunzhi
    Hu, Jixiong
    Ye, Zhiwei
    PROCEEDINGS OF THE 2017 9TH IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT DATA ACQUISITION AND ADVANCED COMPUTING SYSTEMS: TECHNOLOGY AND APPLICATIONS (IDAACS), VOL 2, 2017 : 788 - 791
  • [12] Optimization of K-means Clustering Algorithm Based on Hadoop Platform
    Duan, A. L.
    Xu, Z. X.
    Zhang, H. J.
    INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENVIRONMENTAL ENGINEERING (CSEE 2015), 2015 : 1195 - 1203
  • [13] Order Batch Optimization Based on Improved K-Means Algorithm
    Zu, Qiaohong
    Feng, Rui
    HUMAN CENTERED COMPUTING, 2019, 11956 : 700 - 705
  • [14] Optimization study on k value of K-means algorithm
    Institute of Computer Network System, Hefei University of Technology, Hefei 230009, China
    Xitong Gongcheng Lilun yu Shijian, 2006, 2 (97-101)
  • [15] Optimization of K-Means Algorithm: Ant Colony Optimization
    Reddy, T. Namratha
    Supreethi, K. P.
    2017 INTERNATIONAL CONFERENCE ON COMPUTING METHODOLOGIES AND COMMUNICATION (ICCMC), 2017 : 530 - 535
  • [16] Distributed Algorithm for Text Documents Clustering Based on k-Means Approach
    Sarnovsky, Martin
    Carnoka, Noema
    INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, ISAT 2015, PT II, 2016, 430 : 165 - 174
  • [17] The Clustering Algorithm Based on Improved Antlion Optimization Algorithm with K-Means Concepts
    Feng, Qing
    Pan, Jeng-Shyang
    Huang, Kuan-Chun
    Chu, Shu-Chuan
    ADVANCES IN INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2021 & FITAT 2021), VOL 2, 2022, 278 : 125 - 135
  • [18] A k-means based clustering algorithm
    Bloisi, Domenico Daniele
    Iocchi, Luca
    COMPUTER VISION SYSTEMS, PROCEEDINGS, 2008, 5008 : 109 - 118
  • [19] A New Algorithm for Clustering Based on Particle Swarm Optimization and K-means
    Dong, Jinxin
    Qi, Minyong
    2009 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, VOL IV, PROCEEDINGS, 2009 : 264 - 268
  • [20] An improved teaching-learning-based optimization algorithm based on K-means
    Huang, Xiangdong
    Xia, Shixiong
    Niu, Qiang
    Zhao, Zhijun
    Journal of Computational Information Systems, 2015, 11 (17) : 6327 - 6334