Research on the Application of Random Forest-based Feature Selection Algorithm in Data Mining Experiments

Authors
Wang, Huan [1]
Affiliation
[1] Southwest Forestry Univ, Coll Big Data & Intelligence Engn, Kunming 650224, Yunnan, Peoples R China
Keywords
Random forest; SVM; machine learning; big data; feature selection; best-first search; rough set theory
DOI
10.14569/IJACSA.2023.0141054
Chinese Library Classification
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
High-dimensional big data presents substantial challenges for Machine Learning (ML) algorithms, mainly because the curse of dimensionality leads to computational inefficiency and an increased risk of overfitting. Various dimensionality reduction and Feature Selection (FS) techniques have been developed to alleviate these challenges. Random Forest (RF), a widely used Ensemble Learning Method (ELM), is recognized for its high accuracy and robustness, and it also has a lesser-known capability for effective FS. Although specialized RF models have been designed for FS, they often struggle with computational efficiency on large datasets. To address this, the study proposes a novel Feature Selection Model (FSM) integrated with data reduction techniques, termed Dynamic Correlated Regularized Random Forest (DCRRF). The architecture operates in four phases: preprocessing, Feature Reduction (FR) using Best-First Search with Rough Set Theory (BFS-RST), FS through DCRRF, and feature efficacy assessment using a Support Vector Machine (SVM) classifier. Evaluated on four gene expression datasets, the proposed model outperforms existing RF-based methods in computational efficiency and classification accuracy. The study thus introduces a robust and efficient approach to feature selection in high-dimensional big-data scenarios.
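The abstract names rough set theory as the basis of the feature reduction phase but gives no detail. As a rough illustration only, the sketch below computes the standard rough-set dependency degree (the fraction of samples whose equivalence class under a feature subset has a single decision label) and uses it in a greedy forward search in the style of QuickReduct. This is not the paper's BFS-RST or DCRRF procedure: the greedy search is a simpler stand-in for best-first search, and the function names `dependency_degree` and `greedy_reduct`, as well as the toy discretized data, are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Sequence

Row = Sequence[Hashable]


def dependency_degree(rows: Sequence[Row], labels: Sequence[Hashable],
                      features: Iterable[int]) -> float:
    """Rough-set dependency: share of samples lying in the positive region."""
    features = list(features)
    # Group samples into equivalence classes by their values on the chosen features.
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        classes[tuple(row[f] for f in features)].append(label)
    # A class is in the positive region when all of its samples share one label.
    consistent = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows: Sequence[Row], labels: Sequence[Hashable]) -> List[int]:
    """Greedy forward search (a simplification, not true best-first search)."""
    n_features = len(rows[0])
    target = dependency_degree(rows, labels, range(n_features))
    selected: List[int] = []
    best = 0.0
    while best < target:
        candidates = [f for f in range(n_features) if f not in selected]
        scores = {f: dependency_degree(rows, labels, selected + [f])
                  for f in candidates}
        top = max(scores, key=scores.get)
        if scores[top] <= best:
            # No single feature improves the score; the greedy search stalls here.
            break
        selected.append(top)
        best = scores[top]
    return selected


# Hypothetical toy data: three discretized features, binary decision label.
rows = [
    ("high", "low", "a"),
    ("high", "high", "a"),
    ("low", "low", "b"),
    ("low", "high", "a"),
]
labels = ["pos", "pos", "neg", "neg"]

reduct = greedy_reduct(rows, labels)
print("selected feature indices:", reduct)
```

In this toy example the first feature alone determines the label, so the search stops after one step. Real gene expression data would need discretization before such a measure applies, which is an assumption of this sketch and not something the abstract specifies.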
Pages: 505-518
Page count: 14