UniBFS: A novel uniform-solution-driven binary feature selection algorithm for high-dimensional data

Cited by: 1
Authors
Ahadzadeh, Behrouz [1]
Abdar, Moloud [2]
Foroumandi, Mahdieh [3]
Safara, Fatemeh [4]
Khosravi, Abbas [2]
Garcia, Salvador [5]
Suganthan, Ponnuthurai Nagaratnam [6]
Affiliations
[1] Islamic Azad Univ, Dept Elect Comp & IT Engn, Qazvin Branch, Qazvin, Iran
[2] Deakin Univ, Inst Intelligent Syst Res & Innovat IISRI, Geelong, Australia
[3] Univ Tehran, Sch Elect & Comp Engn, Tehran, Iran
[4] Islamic Azad Univ, Dept Comp Engn, Islamshahr Branch, Islamshahr, Iran
[5] Univ Granada, Dept Comp Sci & Artificial Intelligence, Granada, Spain
[6] Qatar Univ, Coll Engn, KINDI Ctr Comp Res, Doha, Qatar
Keywords
High-dimensional data classification; Evolutionary algorithms for selection; Swarm algorithms for selection; Binary feature selection algorithm
DOI
10.1016/j.swevo.2024.101715
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection (FS) is a crucial technique in machine learning and data mining, serving a variety of purposes such as simplifying model construction, facilitating knowledge discovery, improving computational efficiency, and reducing memory consumption. Despite its importance, the constantly increasing search space of high-dimensional datasets poses significant challenges to FS methods, including the "curse of dimensionality," susceptibility to local optima, and high computational and memory costs. To overcome these challenges, this study develops a new FS algorithm named Uniform-solution-driven Binary Feature Selection (UniBFS). UniBFS exploits the inherent characteristic of binary algorithms, namely binary coding, to search the entire problem space, identifying relevant features while avoiding irrelevant ones. To improve the effectiveness and efficiency of UniBFS, the paper also presents a Redundant Features Elimination (RFE) algorithm. RFE performs a local search in a very small subspace of the solutions obtained by UniBFS at different stages and removes redundant features that do not increase classification accuracy. Moreover, the study proposes a hybrid algorithm that combines UniBFS with two filter-based FS methods, ReliefF and Fisher, to identify pertinent features during the global search phase. The proposed algorithms are evaluated on 30 high-dimensional datasets ranging from 2,000 to 54,676 dimensions, and their effectiveness and efficiency are compared with state-of-the-art techniques, demonstrating their superiority.
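The abstract describes RFE only at a high level: a local search that drops selected features whose removal does not lower classification accuracy. The sketch below illustrates that general idea and is not the authors' algorithm. The function name `eliminate_redundant`, the single greedy pass, the rule of keeping at least one feature, and the stand-in scorer `toy_accuracy` are all assumptions made for illustration; in practice the scorer would be a classifier's validation accuracy on the masked feature set.

```python
from typing import Callable, List


def eliminate_redundant(mask: List[int],
                        score: Callable[[List[int]], float]) -> List[int]:
    """One greedy pass over a binary feature mask (hypothetical helper).

    A selected feature is dropped when removing it does not lower the score.
    """
    best = list(mask)
    best_score = score(best)
    for i, bit in enumerate(mask):
        if bit == 0:
            continue  # feature is not selected, nothing to remove
        trial = list(best)
        trial[i] = 0
        if sum(trial) == 0:
            continue  # assumption: always keep at least one feature
        trial_score = score(trial)
        if trial_score >= best_score:
            best, best_score = trial, trial_score
    return best


def toy_accuracy(mask: List[int]) -> float:
    """Stand-in for classification accuracy: only features 0 and 3 matter."""
    useful = (0, 3)
    return sum(mask[i] for i in useful) / len(useful)


print(eliminate_redundant([1, 1, 1, 1, 0], toy_accuracy))  # [1, 0, 0, 1, 0]
```

Here features 1 and 2 are removed because the toy score stays at 1.0 without them, while features 0 and 3 are kept because removing either lowers it.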
Pages: 24