A Spark-based Artificial Bee Colony Algorithm for Large-scale Data Clustering

Cited by: 4
Authors
Wang, Yanjie [1]
Qian, Quan [1, 2, 3]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai 200444, Peoples R China
[3] Shanghai Univ, Mat Genome Inst, Shanghai 200444, Peoples R China
Funding
Natural Science Foundation of Shanghai;
Keywords
Clustering; Artificial bee colony; Spark; OPTIMIZATION;
DOI
10.1109/HPCC/SmartCity/DSS.2018.00204
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is one of the most common data analysis methods. It aims to partition data into a given number of clusters so that items within a cluster are similar to one another and dissimilar from items in other clusters. Our research goal is to find more efficient clustering algorithms for large-scale data. Spark is a widely used distributed computing platform that provides a set of high-level APIs for building high-performance parallel applications. The Spark-based artificial bee colony algorithm proposed in this paper combines the robust artificial bee colony algorithm with the Spark framework, making it well suited to clustering large-scale data. To verify the effectiveness of this method, we use the KDD CUP 99 data, an open competition dataset, as the experimental data. The experimental results show that our algorithm achieves good clustering quality and almost ideal speedup compared with the serial algorithms.
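The abstract does not spell out the algorithm, so the following is only a generic, serial sketch of how an artificial bee colony (ABC) search can be applied to clustering. It is not the authors' Spark implementation. Each "food source" is a candidate set of k centroids, scored by the sum of squared distances from every point to its nearest centroid. All names and defaults here (`sse`, `abc_cluster`, `n_sources`, `limit`, `iters`) are hypothetical choices for illustration. In a distributed setting, the loop over points inside `sse` would be the natural part to parallelize, though that mapping is an assumption rather than something the record states.

```python
import random


def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def abc_cluster(points, k, n_sources=10, limit=5, iters=50, seed=0):
    """Generic ABC search over sets of k centroids (illustrative, serial)."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]

    def random_source():
        # A food source is a list of k centroids drawn inside the data bounds.
        return [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(k)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [sse(s, points) for s in sources]
    trials = [0] * n_sources
    best_cost = min(costs)
    best = [c[:] for c in sources[costs.index(best_cost)]]

    def try_neighbor(i):
        # Perturb one coordinate of one centroid relative to another source,
        # and keep the candidate only if it lowers the cost (greedy selection).
        j = rng.choice([x for x in range(n_sources) if x != i])
        c, d = rng.randrange(k), rng.randrange(dim)
        cand = [row[:] for row in sources[i]]
        phi = rng.uniform(-1.0, 1.0)
        cand[c][d] += phi * (sources[i][c][d] - sources[j][c][d])
        cost = sse(cand, points)
        if cost < costs[i]:
            sources[i], costs[i], trials[i] = cand, cost, 0
        else:
            trials[i] += 1

    for _ in range(iters):
        # Employed bees: one local search step per food source.
        for i in range(n_sources):
            try_neighbor(i)
        # Onlooker bees: revisit sources with probability proportional to fitness.
        fitness = [1.0 / (1.0 + c) for c in costs]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=fitness)[0]
            try_neighbor(i)
        # Remember the best solution before scouts may discard it.
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best_cost = costs[i_best]
            best = [c[:] for c in sources[i_best]]
        # Scout bees: abandon sources that failed to improve too many times.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = sse(sources[i], points)
                trials[i] = 0
    return best, best_cost
```

For example, `abc_cluster(points, k=2)` on a list of 2-D tuples returns the best centroid set found and its cost; the fixed `seed` makes runs repeatable.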
Pages: 1213 - 1218
Page count: 6
Related papers
50 in total
  • [21] Clustering Algorithm Based on Artificial Bee Colony Optimization
    Zhang, Dandan
    Luo, Ke
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON APPLIED SCIENCE AND ENGINEERING INNOVATION, 2015, 12 : 126 - 131
  • [22] A Clustering-Based Artificial Bee Colony Algorithm
    Zhang, Ming
    Tian, Na
    Ji, Zhicheng
    Wang, Yan
    THEORY, METHODOLOGY, TOOLS AND APPLICATIONS FOR MODELING AND SIMULATION OF COMPLEX SYSTEMS, PT I, 2016, 643 : 101 - 109
  • [23] Automatic Data Clustering Based Mean Best Artificial Bee Colony Algorithm
    Alrosan, Ayat
    Alomoush, Waleed
    Alswaitti, Mohammed
    Alissa, Khalid
    Sahran, Shahnorbanun
    Makhadmeh, Sharif Naser
    Alieyan, Kamal
    CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 68 (02) : 1575 - 1593
  • [24] Global Artificial Bee Colony Search Algorithm for Data Clustering
    Danish, Zeeshan
    Shah, Habib
    Tairan, Nasser
    Ghazali, Rozaida
    Badshah, Akhtar
    INTERNATIONAL JOURNAL OF SWARM INTELLIGENCE RESEARCH, 2019, 10 (02) : 48 - 59
  • [25] Distributed Interference Optimization Method of Large-scale UAV Based on Tabu Search Artificial Bee Colony Algorithm
    Li, Haobo
    Zhang, Yi
    Dan, Zesheng
    Ma, Lejing
    Zhang, Cunle
    Wang, Quanquan
    2022 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2022, 2022,
  • [26] Spark-SIFT: A Spark-Based Large-Scale Image Feature Extract System
    Zhang, Xinming
    Yang, YaoHua
    Shen, Li
    2017 13TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG 2017), 2017, : 69 - 76
  • [27] A spark-based method for identifying large-scale network burst traffic
    Sun, Yu-Lu
    Yun, Ben-Sheng
    Qian, Ya-Guan
    Feng, Jun
    Journal of Computers (Taiwan), 2021, 32 (04) : 123 - 136
  • [28] Fuzzy clustering with artificial bee colony algorithm
    Karaboga, Dervis
    Ozturk, Celal
    SCIENTIFIC RESEARCH AND ESSAYS, 2010, 5 (14) : 1899 - 1902
  • [29] Energy efficient data collection with multiple mobile sink using artificial bee colony algorithm in large-scale WSN
    Vijayashree, R.
    Dhas, C. Suresh Ghana
    AUTOMATIKA, 2019, 60 (05) : 555 - 563
  • [30] A Novel Hybrid Data Clustering Algorithm Based on Artificial Bee Colony Algorithm and K-Means
    Tran Dang Cong
    Wu Zhijian
    Wang Zelin
    Deng Changshou
    CHINESE JOURNAL OF ELECTRONICS, 2015, 24 (04) : 694 - 701