An optimized SVM-RFE based feature selection and weighted entropy K-means approach for big data clustering in mapreduce

Cited by: 2
Authors
Madan, Suman [1]
Komalavalli, C. [2]
Bhatia, Manjot Kaur [1]
Laroiya, Chetna [1]
Arora, Monika [3]
Affiliations
[1] Jagan Inst Management Studies, Dept Informat Technol, Sect 5, New Delhi, India
[2] Presidency Univ, Sch CSE & IS, Bangaluru, India
[3] Bhagwan Parshuram Inst Technol, Dept CSE, Delhi, India
Keywords
Leader Harris Hawks Optimization; Recursive Feature Elimination; Support Vector Machine; Entropy Weighted Power K-Means Clustering; Competitive Swarm Optimizer
DOI
10.1007/s11042-023-18044-4
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In the digitalized world, massive data generation makes efficient big data clustering necessary. Clustering algorithms play an important role in reducing computational complexity and in mining significant information from a dataset. Big data arriving from various sources is processed in the MapReduce framework (MRF) with the help of clustering algorithms. However, applying clustering to big data is difficult, chiefly because of computation cost and the need for a reasonable running time. Hence, this research introduces Competitive Jaya Leader Harris Hawks Optimization assisted Entropy Weighted Power K-Means Clustering (CJayaLHHO_EWPKMC) for big data clustering. The entire method runs in the MapReduce (MR) framework. In the mapper, feature selection is performed using Support Vector Machine-Recursive Feature Elimination (SVM-RFE) assisted by Jaya Leader Harris Hawks Optimization (JayaLHHO). In the reducer, big data clustering is performed using the EWPKMC method, wherein the weights of EWPKMC are adjusted with the CJayaLHHO algorithm to obtain the clustering outcome. The proposed method is scalable, simple, cost-effective, and able to integrate with other technologies. The experimental results show that the developed method outperforms conventional methods, attaining a clustering accuracy of 0.937, a Jaccard coefficient of 0.913, and a Rand coefficient of 0.912.
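The abstract reports a Jaccard coefficient and a Rand coefficient but does not define them. As a rough illustration only, the sketch below uses the standard pair-counting definitions of both measures, which is an assumption about what the paper computes. The function names `pair_counts`, `rand_coefficient`, and `jaccard_coefficient` are hypothetical and do not come from the paper.

```python
from itertools import combinations


def pair_counts(true_labels, pred_labels):
    """Classify every pair of points by whether the two labelings agree.

    Returns (ss, sd, ds, dd):
      ss: same cluster in both labelings
      sd: same in true labels, different in predicted labels
      ds: different in true labels, same in predicted labels
      dd: different in both labelings
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            ss += 1
        elif same_true:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_coefficient(true_labels, pred_labels):
    """Fraction of point pairs on which the two labelings agree."""
    ss, sd, ds, dd = pair_counts(true_labels, pred_labels)
    total = ss + sd + ds + dd
    return (ss + dd) / total if total else 1.0


def jaccard_coefficient(true_labels, pred_labels):
    """Pairs co-clustered in both labelings over pairs co-clustered in either."""
    ss, sd, ds, _ = pair_counts(true_labels, pred_labels)
    denom = ss + sd + ds
    return ss / denom if denom else 1.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    predicted = [0, 0, 0, 1]
    print(rand_coefficient(truth, predicted))     # 0.5
    print(jaccard_coefficient(truth, predicted))  # 0.25
```

Both measures depend only on which points share a cluster, so permuting the cluster IDs leaves them unchanged. The double loop is quadratic in the number of points, so at big-data scale the counts would in practice be derived from a contingency table instead.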
Pages: 74233-74254
Page count: 22
Related papers
50 in total
  • [41] Particle Swarm Optimization with K-means for Simultaneous Feature Selection and Data Clustering
    Prakash, Jay
    Singh, Pramod Kumar
    2015 SECOND INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND MACHINE INTELLIGENCE (ISCMI), 2015, : 74 - 78
  • [42] A novel SVM-RFE based biomedical data processing approach: basic and beyond
    Yin, Zuyu
    Fei, Zhongyang
    Yang, Chengming
    Chen, Ao
    PROCEEDINGS OF THE IECON 2016 - 42ND ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY, 2016, : 7143 - 7148
  • [43] Retraction Note: Entropy and sigmoid based K-means clustering and AGWO for effective big data handling
    Ramdas Vankdothu
    Mohd Abdul Hameed
    Raju Bhukya
    Gaurav Garg
    Multimedia Tools and Applications, 2024, 83 (39) : 87383 - 87383
  • [44] Hepatitis Detection using Random Forest based on SVM-RFE (Recursive Feature Elimination) Feature Selection and SMOTE
    Krisnabayu, Rifky Yunus
    Ridok, Achmad
    Budi, Agung Setia
    PROCEEDINGS OF 2021 INTERNATIONAL CONFERENCE ON SUSTAINABLE INFORMATION ENGINEERING AND TECHNOLOGY, SIET 2021, 2021, : 151 - 156
  • [45] Optimized data fusion for K-means Laplacian clustering
    Yu, Shi
    Liu, Xinhai
    Tranchevent, Leon-Charles
    Glanzel, Wolfgang
    Suykens, Johan A. K.
    De Moor, Bart
    Moreau, Yves
    BIOINFORMATICS, 2011, 27 (01) : 118 - 126
  • [46] Optimized Data Fusion for Kernel k-Means Clustering
    Yu, Shi
    Tranchevent, Leon-Charles
    Liu, Xinhai
    Glanzel, Wolfgang
    Suykens, Johan A. K.
    De Moor, Bart
    Moreau, Yves
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (05) : 1031 - 1039
  • [47] K-SVM: An Effective SVM Algorithm Based on K-means Clustering
    Yao, Yukai
    Liu, Yang
    Yu, Yongqing
    Xu, Hong
    Lv, Weiming
    Li, Zhao
    Chen, Xiaoyun
    JOURNAL OF COMPUTERS, 2013, 8 (10) : 2632 - 2639
  • [48] An Effective K-means Clustering Based SVM Algorithm
    Yao, YuKai
    Liu, Yang
    Li, Zhao
    Chen, XiaoYun
    MEASUREMENT TECHNOLOGY AND ENGINEERING RESEARCHES IN INDUSTRY, PTS 1-3, 2013, 333-335 : 1344 - 1348
  • [49] Design of Intelligent K-Means Based on Spark for Big Data Clustering
    Kusuma, Ilham
    Ma'sum, M. Anwar
    Habibie, Novian
    Jatmiko, Wisnu
    Suhartanto, Heru
    2016 INTERNATIONAL WORKSHOP ON BIG DATA AND INFORMATION SECURITY (IWBIS), 2016, : 89 - 95
  • [50] An Improved Sampling K-means Clustering Algorithm Based on MapReduce
    Zhang Ya-ling
    Wang Ya-nan
    2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2017,