An optimized SVM-RFE based feature selection and weighted entropy K-means approach for big data clustering in mapreduce

Cited by: 2

Authors:

Madan, Suman [1]

Komalavalli, C. [2]

Bhatia, Manjot Kaur [1]

Laroiya, Chetna [1]

Arora, Monika [3]

Affiliations:

[1] Jagan Inst Management Studies, Dept Informat Technol, Sect 5, New Delhi, India

[2] Presidency Univ, Sch CSE & IS, Bangaluru, India

[3] Bhagwan Parshuram Inst Technol, Dept CSE, Delhi, India

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2024 / Vol. 83 / Issue 30

Keywords:

Leader Harris Hawks Optimization; Recursive Feature Elimination; Support Vector Machine; Entropy Weighted Power K-Means Clustering; Competitive Swarm Optimizer

DOI:

10.1007/s11042-023-18044-4

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

In a digitalized world that generates data at massive scale, efficient big data clustering is essential. Clustering algorithms play an important role in managing computational complexity and in mining significant information from a dataset. Big data arriving from various sources is commonly processed in the MapReduce framework (MRF) with the help of clustering algorithms. Applying clustering to big data is nevertheless difficult, chiefly because of the computational cost and the need to finish in reasonable time. This research therefore introduces Competitive Jaya Leader Harris Hawks Optimization assisted Entropy Weighted Power K-Means Clustering (CJayaLHHO_EWPKMC) for big data clustering. The entire method runs in the MapReduce (MR) framework. In the mapper, feature selection is performed using Support Vector Machine-Recursive Feature Elimination (SVM-RFE) assisted by Jaya Leader Harris Hawks Optimization (JayaLHHO). In the reducer, clustering is performed with the EWPKMC method, whose weights are tuned by the CJayaLHHO algorithm to produce the final clustering. The proposed method is scalable, simple, cost-effective, and able to integrate with other technologies. Experimental results show that it outperforms conventional methods, attaining a clustering accuracy of 0.937, a Jaccard coefficient of 0.913, and a Rand coefficient of 0.912.
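The abstract reports a Jaccard coefficient and a Rand coefficient but does not define them. As a minimal sketch, assuming the standard pair-counting definitions (the paper's exact formulation is not given here), both can be computed by comparing, for every pair of points, whether a reference labeling and a predicted clustering agree on placing the pair together. The function names and the toy labels below are illustrative only and are not taken from the paper.

```python
from itertools import combinations


def pair_counts(true_labels, pred_labels):
    """Classify every pair of points by whether the two labelings group it together.

    Returns (ss, sd, ds, dd):
      ss: same group in both labelings
      sd: same in the reference labeling, different in the prediction
      ds: different in the reference labeling, same in the prediction
      dd: different in both
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            ss += 1
        elif same_true:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_coefficient(true_labels, pred_labels):
    """Fraction of pairs on which the two labelings agree (ss and dd)."""
    ss, sd, ds, dd = pair_counts(true_labels, pred_labels)
    total = ss + sd + ds + dd
    return (ss + dd) / total if total else 1.0


def jaccard_coefficient(true_labels, pred_labels):
    """Like the Rand coefficient, but pairs separated in both labelings are ignored."""
    ss, sd, ds, _ = pair_counts(true_labels, pred_labels)
    denom = ss + sd + ds
    return ss / denom if denom else 1.0


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]   # hypothetical reference labels
    pred = [0, 0, 1, 1, 1, 1]    # hypothetical clustering output
    print(rand_coefficient(truth, pred))     # 10 of 15 pairs agree
    print(jaccard_coefficient(truth, pred))  # 4 of 9 relevant pairs agree
```

Both values lie in [0, 1], with 1 meaning the clustering matches the reference labeling exactly. The pairwise loop is quadratic in the number of points, so at big data scale the same counts would normally be derived from a contingency table of label co-occurrences instead.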

Pages: 74233-74254

Number of pages: 22
