Research on the Application of Random Forest-based Feature Selection Algorithm in Data Mining Experiments

Cited by: 0
Authors
Wang, Huan [1]
Affiliations
[1] Southwest Forestry Univ, Coll Big Data & Intelligence Engn, Kunming 650224, Yunnan, Peoples R China
Keywords
Random forest; SVM; machine learning; big data; feature selection; best-first search; rough set theory
DOI
10.14569/IJACSA.2023.0141054
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
High-dimensional big data presents substantial challenges for Machine Learning (ML) algorithms, mainly because the curse of dimensionality leads to computational inefficiency and an increased risk of overfitting. Various dimensionality reduction and Feature Selection (FS) techniques have been developed to alleviate these challenges. Random Forest (RF), a widely used Ensemble Learning Method (ELM), is recognized for its high accuracy and robustness, and it also has a lesser-known capability for effective FS. While specialized RF models have been designed for FS, they often struggle with computational efficiency on large datasets. To address these challenges, this study proposes a novel Feature Selection Model (FSM) integrated with data reduction techniques, termed Dynamic Correlated Regularized Random Forest (DCRRF). The architecture operates in four phases: preprocessing, Feature Reduction (FR) using Best-First Search with Rough Set Theory (BFS-RST), FS through DCRRF, and feature efficacy assessment using a Support Vector Machine (SVM) classifier. Evaluated on four gene expression datasets, the proposed model outperforms existing RF-based methods in computational efficiency and classification accuracy. This study introduces a robust and efficient approach to feature selection in high-dimensional big-data scenarios.
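The four phases named in the abstract can be illustrated, very roughly, as a scikit-learn pipeline. This is a sketch under stated assumptions and not the paper's method: scikit-learn ships neither BFS-RST nor DCRRF, so a univariate filter (`SelectKBest`) stands in for the feature reduction phase and plain random forest importances (`SelectFromModel`) stand in for the feature selection phase. The synthetic data and the feature counts (100 and 20) are likewise assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a gene expression matrix: few samples, many features.
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=15,
    n_redundant=10, random_state=0,
)

pipeline = Pipeline([
    # Phase 1: preprocessing (here, standardizing each feature).
    ("preprocess", StandardScaler()),
    # Phase 2: feature reduction. ASSUMPTION: a univariate F-test filter
    # replaces BFS-RST, which is not available in scikit-learn.
    ("reduce", SelectKBest(f_classif, k=100)),
    # Phase 3: feature selection. ASSUMPTION: ordinary random forest
    # importances replace DCRRF; threshold=-inf keeps exactly the top 20.
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0),
        max_features=20, threshold=-np.inf,
    )),
    # Phase 4: assess the selected features with an SVM classifier.
    ("classify", SVC(kernel="linear")),
])

# Cross-validating the whole pipeline refits selection inside each fold,
# which avoids leaking test-fold information into the selected features.
scores = cross_val_score(pipeline, X, y, cv=5)

# Fit once on all data to inspect how many features survive phase 3.
pipeline.fit(X, y)
n_selected = int(pipeline.named_steps["select"].get_support().sum())
print("mean CV accuracy:", scores.mean(), "features kept:", n_selected)
```

Keeping the selection steps inside the pipeline, rather than selecting features once on the full dataset, matters most in exactly the setting the abstract targets, where features far outnumber samples.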
Pages: 505-518
Number of pages: 14
Related papers
50 in total
  • [11] Application of Random Forest Algorithm on Feature Subset Selection and Classification and Regression
    Jaiswal, Jitendra Kumar
    Samikannu, Rita
    2017 2ND WORLD CONGRESS ON COMPUTING AND COMMUNICATION TECHNOLOGIES (WCCCT), 2017, : 65 - 68
  • [12] Robustness of Random Forest-based gene selection methods
    Kursa, Miron Bartosz
    BMC BIOINFORMATICS, 2014, 15
  • [14] Causal inference in the presence of missing data using a random forest-based matching algorithm
    Hillis, Tristan
    Guarcello, Maureen A.
    Levine, Richard A.
    Fan, Juanjuan
    STAT, 2021, 10 (01):
  • [15] An Improved Random Forest Based on Feature Selection and Feature Weighting for Case Retrieval in CBR Systems: Application to Medical Data
    Tarchoune, Ilhem
    Djebbar, Akila
    Merouani, Hayet Farida
    Hadji, Doha
    INTERNATIONAL JOURNAL OF SOFTWARE INNOVATION, 2022, 10 (01)
  • [16] Classification Performance Improvement Using Random Subset Feature Selection Algorithm for Data Mining
    Lakshmipadmaja, D.
    Vishnuvardhan, B.
    BIG DATA RESEARCH, 2018, 12 : 1 - 12
  • [17] Feature Selection Algorithm based on Random Forest applied to Sleep Apnea Detection
    Deviaene, Margot
    Testelmans, Dries
    Borzee, Pascal
    Buyse, Bertien
    Van Huffel, Sabine
    Varon, Carolina
    2019 41ST ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2019, : 2580 - 2583
  • [18] Framework for efficient feature selection in genetic algorithm based data mining
    Sikora, Riyaz
    Piramuthu, Selwyn
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2007, 180 (02) : 723 - 737
  • [19] A Bolasso based consistent feature selection enabled random forest classification algorithm: An application to credit risk assessment
    Arora, Nisha
    Kaur, Pankaj Deep
    APPLIED SOFT COMPUTING, 2020, 86 (86)
  • [20] A random forest-based algorithm for data-intensive spatial interpolation in crop yield mapping
    Mariano, Cordoba
    Monica, Balzarini
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2021, 184