An optimized SVM-RFE based feature selection and weighted entropy K-means approach for big data clustering in mapreduce

Cited by: 2
Authors
Madan, Suman [1 ]
Komalavalli, C. [2 ]
Bhatia, Manjot Kaur [1 ]
Laroiya, Chetna [1 ]
Arora, Monika [3 ]
Affiliations
[1] Jagan Inst Management Studies, Dept Informat Technol, Sect 5, New Delhi, India
[2] Presidency Univ, Sch CSE & IS, Bangaluru, India
[3] Bhagwan Parshuram Inst Technol, Dept CSE, Delhi, India
Keywords
Leader Harris Hawks Optimization; Recursive Feature Elimination; Support Vector Machine; Entropy Weighted Power K-Means Clustering; Competitive Swarm Optimizer
DOI
10.1007/s11042-023-18044-4
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In a digitalized world where data is generated at massive scale, efficient big data clustering is necessary. Clustering algorithms play an important role in reducing computational complexity and in mining significant information from a dataset. Big data arriving from various sources is processed with clustering algorithms in the MapReduce framework (MRF). However, applying a clustering approach to big data is difficult, chiefly because of the computation cost and the need for a reasonable running time. Hence, this research introduces Competitive Jaya Leader Harris Hawks Optimization assisted Entropy Weighted Power K-Means Clustering (CJayaLHHO_EWPKMC) for big data clustering. The overall processing of the devised method is carried out in the MapReduce (MR) framework. In the mapper, feature selection is done using Support Vector Machine-Recursive Feature Elimination (SVM-RFE) assisted by Jaya Leader Harris Hawks Optimization (JayaLHHO). In the reducer, big data clustering is performed using the EWPKMC method, wherein the weight of EWPKMC is adjusted by the CJayaLHHO algorithm to obtain the clustering outcome. The proposed method is scalable, simple, cost-effective, and able to integrate with other technologies. The experimental results show that the developed method outperforms conventional methods, with a clustering accuracy of 0.937, a Jaccard coefficient of 0.913, and a Rand coefficient of 0.912.
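The general idea of an entropy-weighted K-means step split into map and reduce phases can be sketched as follows. This is a simplified illustration only, not the authors' CJayaLHHO_EWPKMC: it omits SVM-RFE feature selection, the power-mean formulation, and the metaheuristic weight tuning, and it uses the mapper for point assignment (the usual MapReduce K-means layout) rather than for feature selection. The function names (`mapper`, `reducer`, `entropy_weights`, `cluster`) and the parameter `gamma` are assumptions made for this sketch.

```python
import math
from collections import defaultdict

def mapper(points, centroids, weights):
    """Map step: emit (cluster_id, point) using a feature-weighted squared distance."""
    for p in points:
        dists = [sum(w * (x - c) ** 2 for w, x, c in zip(weights, p, cen))
                 for cen in centroids]
        yield dists.index(min(dists)), p

def reducer(pairs):
    """Reduce step: group points by cluster id and average them into new centroids."""
    groups = defaultdict(list)
    for cid, p in pairs:
        groups[cid].append(p)
    centroids = {cid: [sum(col) / len(pts) for col in zip(*pts)]
                 for cid, pts in groups.items()}
    return centroids, groups

def entropy_weights(groups, centroids, n_features, gamma=1.0):
    """Entropy-regularised feature weights: w_j proportional to exp(-D_j / gamma),
    where D_j is the total within-cluster dispersion of feature j."""
    disp = [0.0] * n_features
    for cid, pts in groups.items():
        for p in pts:
            for j in range(n_features):
                disp[j] += (p[j] - centroids[cid][j]) ** 2
    shift = min(disp)  # subtract the minimum for numerical stability
    raw = [math.exp(-(d - shift) / gamma) for d in disp]
    total = sum(raw)
    return [r / total for r in raw]

def cluster(points, init_centroids, gamma=1.0, iters=10):
    """Alternate map (assign), reduce (update centroids), and weight updates."""
    n_features = len(points[0])
    weights = [1.0 / n_features] * n_features  # start from uniform weights
    centroids = [list(c) for c in init_centroids]
    for _ in range(iters):
        pairs = list(mapper(points, centroids, weights))
        new_centroids, groups = reducer(pairs)
        # Keep the previous centroid if a cluster received no points.
        centroids = [new_centroids.get(i, centroids[i])
                     for i in range(len(centroids))]
        weights = entropy_weights(groups, dict(enumerate(centroids)),
                                  n_features, gamma)
    labels = [cid for cid, _ in mapper(points, centroids, weights)]
    return labels, centroids, weights
```

In this sketch, a feature with large within-cluster dispersion receives an exponentially smaller weight, so noisy features contribute less to the distance used in the next assignment pass. A smaller `gamma` concentrates the weight on fewer features; a larger one keeps the weights closer to uniform.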
Pages: 74233-74254
Page count: 22
Related papers
50 in total
  • [31] K-means Clustering Optimization Algorithm Based on MapReduce
    Li, Zhihua
    Song, Xudong
    Zhu, Wenhui
    Chen, Yanxia
    PROCEEDINGS OF THE 2015 INTERNATIONAL SYMPOSIUM ON COMPUTERS & INFORMATICS, 2015, 13 : 198 - 203
  • [32] A Novel K-Means based Clustering Algorithm for Big Data
    Sinha, Ankita
    Jana, Prasanta K.
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 1875 - 1879
  • [33] A MapReduce-based K-means clustering algorithm
    YiMin Mao
    DeJin Gan
    D. S. Mwakapesa
    Y. A. Nanehkaran
    Tao Tao
    XueYu Huang
    The Journal of Supercomputing, 2022, 78 : 5181 - 5202
  • [34] A MapReduce-based K-means clustering algorithm
    Mao, YiMin
    Gan, DeJin
    Mwakapesa, D. S.
    Nanehkaran, Y. A.
    Tao, Tao
    Huang, XueYu
    JOURNAL OF SUPERCOMPUTING, 2022, 78 (04): : 5181 - 5202
  • [35] Multi-scoring Feature selection method based on SVM-RFE for prostate cancer diagnosis
    Albashish, Dheeb
    Sahran, Shahnorbanun
    Abdullah, Azizi
    Adam, Afzan
    Abd Shukor, Nordashima
    Pauzi, Suria Hayati Md
    5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS 2015, 2015, : 682 - 686
  • [36] A Novel Stability Based Feature Selection Framework for k-means Clustering
    Mavroeidis, Dimitrios
    Marchiori, Elena
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2011, 6912 : 421 - 436
  • [37] Rough Entropy Based k-Means Clustering
    Malyszko, Dariusz
    Stepaniuk, Jaroslaw
    ROUGH SETS, FUZZY SETS, DATA MINING AND GRANULAR COMPUTING, PROCEEDINGS, 2009, 5908 : 406 - 413
  • [38] On feature selection and blast furnace temperature tendency prediction in hot metal based on SVM-RFE
    Wang, Yi-Kang
    Liu, Xue-Yi
    Zhang, Bao-Lin
    2018 AUSTRALIAN & NEW ZEALAND CONTROL CONFERENCE (ANZCC), 2018, : 371 - 376
  • [39] Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading
    Sahran, Shahnorbanun
    Albashish, Dheeb
    Abdullah, Azizi
    Abd Shukor, Nordashima
    Pauzi, Suria Hayati Md
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2018, 87 : 78 - 90
  • [40] Entropy Based Soft K-means Clustering
    Bai, Xue
    Luo, Siwei
    Zhao, Yibiao
    2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2, 2008, : 107 - 110