A Spark-based Artificial Bee Colony Algorithm for Large-scale Data Clustering

Cited by: 4
Authors
Wang, Yanjie [1 ]
Qian, Quan [1 ,2 ,3 ]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
[2] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai 200444, Peoples R China
[3] Shanghai Univ, Mat Genome Inst, Shanghai 200444, Peoples R China
Funding
Natural Science Foundation of Shanghai
Keywords
Clustering; Artificial bee colony; Spark; Optimization
DOI
10.1109/HPCC/SmartCity/DSS.2018.00204
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is one of the most common data analysis methods. It aims to partition data into a given number of clusters so that data within the same cluster are similar to one another and dissimilar from data in other clusters. Our research goal is to find more efficient clustering algorithms for large-scale data. Spark is one of the most popular distributed computing platforms and provides a set of high-level APIs for building high-performance parallel applications. The Spark-based artificial bee colony algorithm proposed in this paper combines the robust artificial bee colony algorithm with the Spark framework, which makes it well suited to clustering large-scale data. To verify the effectiveness of this method, we use the KDD CUP 99 data, an open competition dataset, as the experimental data. The experimental results show that our algorithm achieves good clustering quality and an almost ideal speedup compared with the serial algorithms.
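The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a sum-of-squared-distances objective and the textbook artificial bee colony neighbour move, and it imitates a distributed map/reduce over data partitions with plain Python lists instead of calling Spark. All function names (`partial_sse`, `fitness`, `neighbour`, `employed_bee_step`) and the toy data are hypothetical.

```python
import random

def sq_dist(p, c):
    # Squared Euclidean distance between a point and a centroid.
    return sum((a - b) ** 2 for a, b in zip(p, c))

def partial_sse(partition, centroids):
    # "Map" side: cost contributed by one partition of the data.
    return sum(min(sq_dist(p, c) for c in centroids) for p in partition)

def fitness(partitions, centroids):
    # "Reduce" side: add up the per-partition partial sums.
    # Lower is better (assumed objective, not taken from the paper).
    return sum(partial_sse(part, centroids) for part in partitions)

def neighbour(solution, other, rng):
    # Textbook ABC move: perturb one coordinate of one centroid
    # relative to the same coordinate in another food source.
    k = rng.randrange(len(solution))
    d = rng.randrange(len(solution[k]))
    phi = rng.uniform(-1.0, 1.0)
    new = [list(c) for c in solution]
    new[k][d] = solution[k][d] + phi * (solution[k][d] - other[k][d])
    return new

def employed_bee_step(partitions, sources, rng):
    # One employed-bee pass with greedy selection: keep the candidate
    # only if it strictly improves the objective.
    result = []
    for i, src in enumerate(sources):
        j = rng.choice([x for x in range(len(sources)) if x != i])
        cand = neighbour(src, sources[j], rng)
        better = fitness(partitions, cand) < fitness(partitions, src)
        result.append(cand if better else src)
    return result

# Toy data: two partitions of 2-D points, four food sources of two centroids each.
rng = random.Random(0)
partitions = [[(0.0, 0.0), (0.1, 0.2)], [(5.0, 5.0), (5.2, 4.9)]]
sources = [[[rng.uniform(0, 5), rng.uniform(0, 5)] for _ in range(2)]
           for _ in range(4)]
before = [fitness(partitions, s) for s in sources]
sources = employed_bee_step(partitions, sources, rng)
after = [fitness(partitions, s) for s in sources]
```

Because selection is greedy, no food source gets worse after a pass. In a real Spark job the per-partition sums would presumably be computed on the workers and combined on the driver, which is where a speedup over a serial version would come from.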
Pages: 1213-1218
Number of pages: 6
Related papers
50 in total
  • [1] A Spark-Based Artificial Bee Colony Algorithm for Unbalanced Large Data Classification
    Al-Sawwa, Jamil
    Almseidin, Mohammad
    INFORMATION, 2022, 13 (11)
  • [2] A MapReduce-based artificial bee colony for large-scale data clustering
    Banharnsakun, Anan
    PATTERN RECOGNITION LETTERS, 2017, 93 : 78 - 84
  • [3] Spark-based Large-scale Matrix Inversion for Big Data Processing
    Liang, Yang
    Liu, Jun
    Fang, Cheng
    Ansari, Nirwan
    2016 IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS), 2016
  • [4] Improved Artificial Bee Colony Algorithm for Large-Scale Optimization Problems
    Gocho, Ryuta
    Utani, Akihide
    Yamamoto, Hisao
    PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL LIFE AND ROBOTICS (AROB 16TH '11), 2011, : 605 - 608
  • [5] Spark-Based Large-Scale Matrix Inversion for Big Data Processing
    Liu, Jun
    Liang, Yang
    Ansari, Nirwan
    IEEE ACCESS, 2016, 4 : 2166 - 2176
  • [6] Memetic Artificial Bee Colony Algorithm for Large-Scale Global Optimization
    Fister, Iztok
    Fister, Iztok, Jr.
    Brest, Janez
    Zumer, Viljem
    2012 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2012
  • [7] Large-scale weapon-target allocation based on an artificial bee colony algorithm
    Zhou Y.
    Wang T.
    Chen L.
    Fu L.
    Wei Z.
    Harbin Gongcheng Daxue Xuebao/Journal of Harbin Engineering University, 2024, 45 (06) : 1187 - 1195
  • [8] Many-objective artificial bee colony algorithm for large-scale software module clustering problem
    Amarjeet
    Chhabra, Jitender Kumar
    SOFT COMPUTING, 2018, 22 (19) : 6341 - 6361
  • [9] Many-objective artificial bee colony algorithm for large-scale software module clustering problem
    Jitender Kumar Amarjeet
    Soft Computing, 2018, 22 : 6341 - 6361
  • [10] A ranking paired based artificial bee colony algorithm for data clustering
    Xu, Haiping
    Dong, Zhengshan
    Xu, Meiqin
    Lin, Geng
    INTERNATIONAL JOURNAL OF COMPUTING SCIENCE AND MATHEMATICS, 2022, 16 (04) : 389 - 398