An optimized SVM-RFE based feature selection and weighted entropy K-means approach for big data clustering in mapreduce

Cited by: 2
Authors
Madan, Suman [1 ]
Komalavalli, C. [2 ]
Bhatia, Manjot Kaur [1 ]
Laroiya, Chetna [1 ]
Arora, Monika [3 ]
Affiliations
[1] Jagan Inst Management Studies, Dept Informat Technol, Sect 5, New Delhi, India
[2] Presidency Univ, Sch CSE & IS, Bengaluru, India
[3] Bhagwan Parshuram Inst Technol, Dept CSE, Delhi, India
Keywords
Leader Harris Hawks Optimization; Recursive Feature Elimination; Support Vector Machine; Entropy Weighted Power K-Means Clustering; Competitive Swarm Optimizer
DOI
10.1007/s11042-023-18044-4
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In a digitalized world that generates data at massive scale, efficient big data clustering is necessary. Clustering algorithms play an important role in managing computational complexity and are useful for mining significant information from a dataset. Big data arriving from various sources is processed with clustering algorithms in the MapReduce framework (MRF). However, applying clustering approaches to big data is difficult, mainly because of computation cost and the need for a reasonable running time. Hence, this research introduces Competitive Jaya Leader Harris Hawks Optimization assisted Entropy Weighted Power K-Means Clustering (CJayaLHHO_EWPKMC) for big data clustering. The overall processing of the devised method is carried out in the MapReduce (MR) framework. In the mapper, feature selection is done using Support Vector Machine-Recursive Feature Elimination (SVM-RFE) assisted by Jaya Leader Harris Hawks Optimization (JayaLHHO). In the reducer, big data clustering is performed using the EWPKMC method, whose weights are tuned by the CJayaLHHO algorithm to obtain the clustering outcome. The proposed method is scalable, simple, cost-effective, and able to integrate with other technologies. The experimental results show that the developed method outperforms conventional methods, attaining a clustering accuracy of 0.937, a Jaccard coefficient of 0.913, and a Rand coefficient of 0.912.
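The abstract reports a Jaccard coefficient and a Rand coefficient but does not define them. As a rough illustration only, the sketch below uses the common pair-counting definitions of both measures; whether the paper computes them exactly this way is an assumption, and the function names (`pair_counts`, `rand_coefficient`, `jaccard_coefficient`) are hypothetical.

```python
from itertools import combinations


def pair_counts(labels_true, labels_pred):
    """Classify every pair of samples by whether the two labelings
    put the pair in the same cluster. Returns (ss, sd, ds, dd)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            ss += 1  # together in both labelings
        elif same_true:
            sd += 1  # together in the reference only
        elif same_pred:
            ds += 1  # together in the prediction only
        else:
            dd += 1  # separated in both labelings
    return ss, sd, ds, dd


def rand_coefficient(labels_true, labels_pred):
    """Fraction of pairs on which the two labelings agree."""
    ss, sd, ds, dd = pair_counts(labels_true, labels_pred)
    total = ss + sd + ds + dd
    return (ss + dd) / total if total else 1.0


def jaccard_coefficient(labels_true, labels_pred):
    """Pairs together in both labelings, divided by pairs together in at least one."""
    ss, sd, ds, _ = pair_counts(labels_true, labels_pred)
    denom = ss + sd + ds
    return ss / denom if denom else 1.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1]  # made-up reference labels
    pred = [0, 0, 1, 0]   # made-up cluster assignments
    print(rand_coefficient(truth, pred))     # 0.5
    print(jaccard_coefficient(truth, pred))  # 0.25
```

The loop over all pairs is quadratic in the number of samples, so it is suitable only for small checks; at big-data scale the same counts would normally be derived from a contingency table of the two labelings.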
Pages: 74233-74254
Page count: 22
Related papers
50 in total
  • [1] Optimized big data K-means clustering using MapReduce
    Cui, Xiaoli
    Zhu, Pingfei
    Yang, Xin
    Li, Keqiu
    Ji, Changqing
    JOURNAL OF SUPERCOMPUTING, 2014, 70 (03): : 1249 - 1259
  • [2] Optimized big data K-means clustering using MapReduce
    Xiaoli Cui
    Pingfei Zhu
    Xin Yang
    Keqiu Li
    Changqing Ji
    The Journal of Supercomputing, 2014, 70 : 1249 - 1259
  • [3] Feature Selection for SNP Data Based on SVM-RFE and AGA
    Yang, Xutao
    Wu, Yue
    Jia, Min
    Lei, Zhou
    Liu, Zongtian
    2011 AASRI CONFERENCE ON APPLIED INFORMATION TECHNOLOGY (AASRI-AIT 2011), VOL 1, 2011, : 204 - 208
  • [4] A Hybrid Feature Selection Based on Fisher Score and SVM-RFE for Microarray Data
    Hamla H.
    Ghanem K.
    Informatica (Slovenia), 2024, 48 (01): : 57 - 68
  • [5] Efficient MapReduce Kernel k-Means for Big Data Clustering
    Tsapanos, Nikolaos
    Tefas, Anastasios
    Nikolaidis, Nikolaos
    Pitas, Ioannis
    9TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE (SETN 2016), 2016,
  • [6] A Hybrid Feature Selection Approach by Correlation-based Filters and SVM-RFE
    Zhang, Jing
    Hu, Xuegang
    Li, Peipei
    He, Wei
    Zhang, Yuhong
    Li, Huizong
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 3684 - 3689
  • [7] A NEW FEATURE SELECTION METHOD BASED ON RELIEF AND SVM-RFE
    Fu Ruigang
    Wang Ping
    Gao Yinghui
    Hua Xiaoqiang
    2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 1363 - 1366
  • [8] Feature selection for tumor classification based on improved SVM-RFE
    Li, Hangeng
    Duan, Yanhua
    Li, Qingshou
    Ruan, Xiaogang
    PROGRESS IN INTELLIGENCE COMPUTATION AND APPLICATIONS, PROCEEDINGS, 2007, : 422 - 424
  • [9] Improving enzyme regulatory protein classification by means of SVM-RFE feature selection
    Fernandez-Lozano, Carlos
    Fernandez-Blanco, Enrique
    Dave, Kirtan
    Pedreira, Nieves
    Gestal, Marcos
    Dorado, Julian
    Munteanu, Cristian R.
    MOLECULAR BIOSYSTEMS, 2014, 10 (05) : 1063 - 1071
  • [10] K-Means Parallel Algorithm of Big Data Clustering Based on Mapreduce PCAM Method
    Li, Yongyi
    Yang, Zhongqiang
    Han, Kaixu
    Engineering Intelligent Systems, 2021, 29 (06): : 411 - 418