An optimized SVM-RFE based feature selection and weighted entropy K-means approach for big data clustering in mapreduce

Cited by: 2

Authors:

Madan, Suman [1]

Komalavalli, C. [2]

Bhatia, Manjot Kaur [1]

Laroiya, Chetna [1]

Arora, Monika [3]

Affiliations:

[1] Jagan Inst Management Studies, Dept Informat Technol, Sect 5, New Delhi, India

[2] Presidency Univ, Sch CSE & IS, Bangaluru, India

[3] Bhagwan Parshuram Inst Technol, Dept CSE, Delhi, India

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2024 / Volume 83 / Issue 30

Keywords:

Leader Harris Hawks Optimization; Recursive Feature Elimination; Support Vector Machine; Entropy Weighted Power K-Means Clustering; Competitive Swarm Optimizer

DOI:

10.1007/s11042-023-18044-4

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

In a digitalized world that generates data at massive scale, efficient big data clustering is necessary. The clustering algorithm plays an important role in resolving the computational complexity. Big data arriving from various sources is processed in the MapReduce framework (MRF) with the help of clustering algorithms. Moreover, the clustering algorithm is useful for mining significant information from the dataset. However, applying a clustering approach to big data is difficult, chiefly because of computation cost and the need to finish in reasonable time. Hence, this research introduces the Competitive Jaya Leader Harris Hawks Optimization assisted Entropy Weighted Power K-Means Clustering (CJayaLHHO_EWPKMC) for big data clustering. The overall processing of the devised method is carried out in the MapReduce (MR) framework. In the mapper, feature selection is done using Support Vector Machine-Recursive Feature Elimination (SVM-RFE) assisted Jaya Leader Harris Hawks Optimization (JayaLHHO). In the reducer, big data clustering is performed using the EWPKMC method, in which the weight of EWPKMC is updated with the CJayaLHHO algorithm to obtain the clustering outcome. The proposed method is scalable, simple, cost-effective, and able to integrate with other technologies. The experimental results show that the developed method outperforms conventional methods, attaining a clustering accuracy of 0.937, a Jaccard coefficient of 0.913, and a Rand coefficient of 0.912.
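
The abstract reports a Jaccard coefficient and a Rand coefficient but does not define them. As a minimal sketch, assuming the standard pair-counting definitions (the paper may compute them differently), the two scores can be obtained by comparing a predicted clustering with reference labels over all pairs of samples. The function names and the toy labels below are illustrative and do not come from the paper.

```python
from itertools import combinations


def pair_counts(truth, pred):
    """Classify every pair of samples by whether the two labelings group it together.

    Returns (ss, sd, ds, dd):
      ss: same cluster in both labelings
      sd: same in truth, different in pred
      ds: different in truth, same in pred
      dd: different in both
    """
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            ss += 1
        elif same_truth:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_coefficient(truth, pred):
    """Fraction of pairs on which the two labelings agree."""
    ss, sd, ds, dd = pair_counts(truth, pred)
    total = ss + sd + ds + dd
    return (ss + dd) / total if total else 1.0


def jaccard_coefficient(truth, pred):
    """Agreement on co-clustered pairs, ignoring pairs separated in both labelings."""
    ss, sd, ds, _ = pair_counts(truth, pred)
    denom = ss + sd + ds
    return ss / denom if denom else 1.0


# Toy labels (hypothetical, for illustration only).
truth = [0, 0, 1, 1, 2, 2]
pred = [1, 1, 0, 0, 0, 2]
print(rand_coefficient(truth, pred))     # 0.8
print(jaccard_coefficient(truth, pred))  # 0.4
```

Both scores depend only on which samples share a cluster, so relabeling the clusters does not change them. The pairwise loop is quadratic in the number of samples; on large data the same counts would normally be derived from a contingency table instead.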


Pages: 74233-74254

Page count: 22

