Feature selection and computational optimization in high-dimensional microarray cancer datasets via InfoGain-modified bat algorithm

Cited by: 4
Authors
Hambali, Moshood A. [1 ]
Oladele, Tinuke O. [2 ]
Adewole, Kayode S. [2 ]
Sangaiah, Arun Kumar [3 ]
Gao, Wei [4 ]
Affiliations
[1] Fed Univ Wukari, Dept Comp Sci, Wukari, Taraba State, Nigeria
[2] Univ Ilorin, Dept Comp Sci, Ilorin, Kwara State, Nigeria
[3] Natl Yunlin Univ Sci & Technol, Touliu 633102, Yunlin, Taiwan
[4] Yunnan Normal Univ, Sch Informat Sci & Technol, Kunming, Yunnan, Peoples R China
Keywords
Feature selection; Binary bat algorithm; Information gain; Cancer classification; Microarray data; Random forest; Computational optimization; ANT COLONY OPTIMIZATION; PARTIAL LEAST-SQUARES; GENE SELECTION; TUMOR CLASSIFICATION; MARKOV BLANKET; EXPRESSION; ECHOLOCATION; PREDICTION; SEARCH;
DOI
10.1007/s11042-022-13532-5
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Achieving satisfactory cancer classification accuracy with the complete set of genes remains a great challenge because of the high dimensionality, small sample size, and noise in gene expression data. Feature reduction is a critical and sensitive step in classification, particularly for heterogeneous multimedia data. One of the major challenges in cancer studies is identifying informative genes among the thousands available in microarray data. Traditional feature selection algorithms scale poorly to large search spaces such as those of microarray data. An effective feature selection algorithm is therefore required to find the most significant subset of genes by removing non-predictive genes from the dataset without compromising the accuracy of the classification algorithm. This study proposes an Information Gain - Modified Bat Algorithm (InfoGain-MBA) feature selection model for selecting relevant and informative features from high-dimensional microarray cancer datasets, and evaluates the approach with four classifiers: C4.5, Decision Tree, Random Forest, and classification and regression tree (CART). The results show that the proposed approach is promising for the classification of microarray cancer data. Random Forest achieved 100% accuracy with few genes on all seven datasets used. Further investigations were also conducted to determine the optimal threshold for each dataset.
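The abstract does not give implementation details, so the following is only a minimal sketch of the information-gain filter stage, not the authors' method. It assumes expression values have already been discretized, and the function names (`entropy`, `information_gain`, `select_by_threshold`), the toy data, and the threshold value are all hypothetical. The modified bat algorithm stage is not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def select_by_threshold(columns, labels, threshold):
    """Return indices of features whose information gain exceeds the threshold."""
    return [i for i, col in enumerate(columns) if information_gain(col, labels) > threshold]

# Made-up toy data: three discretized "genes" measured on six samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
columns = [
    ["high", "high", "high", "low", "low", "low"],  # separates the classes perfectly
    ["high", "low", "high", "low", "high", "low"],  # only weakly related to the class
    ["low", "low", "low", "low", "low", "low"],     # constant, carries no information
]
selected = select_by_threshold(columns, labels, threshold=0.5)  # assumed threshold
```

In a pipeline of the kind the abstract describes, the genes that survive this filter would then be passed to a wrapper search (here, the bat algorithm) that evaluates candidate subsets with a classifier. The threshold is a tunable parameter, which matches the abstract's remark that an optimal threshold was sought per dataset.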
Pages: 36505 - 36549
Page count: 45
Related papers
50 in total
  • [31] Clustering high-dimensional data via feature selection
    Liu, Tianqi
    Lu, Yu
    Zhu, Biqing
    Zhao, Hongyu
    BIOMETRICS, 2023, 79 (02) : 940 - 950
  • [32] Architectural optimization and feature learning for high-dimensional time series datasets
    Colgan, Robert E.
    Yan, Jingkai
    Marka, Zsuzsa
    Bartos, Imre
    Marka, Szabolcs
    Wright, John N.
    PHYSICAL REVIEW D, 2023, 107 (02)
  • [33] A GRASP algorithm for fast hybrid (filter-wrapper) feature subset selection in high-dimensional datasets
    Bermejo, Pablo
    Gamez, Jose A.
    Puerta, Jose M.
    PATTERN RECOGNITION LETTERS, 2011, 32 (05) : 701 - 711
  • [34] Monte Carlo Tree Search-Based Recursive Algorithm for Feature Selection in High-Dimensional Datasets
    Chaudhry, Muhammad Umar
    Yasir, Muhammad
    Asghar, Muhammad Nabeel
    Lee, Jee-Hyong
    ENTROPY, 2020, 22 (10) : 1 - 15
  • [35] Distance-based mutual congestion feature selection with genetic algorithm for high-dimensional medical datasets
    Hossein Nematzadeh
    Joseph Mani
    Zahra Nematzadeh
    Ebrahim Akbari
    Radziah Mohamad
    Neural Computing and Applications, 2025, 37 (8) : 6217 - 6232
  • [36] An Asymmetric Chaotic Competitive Swarm Optimization Algorithm for Feature Selection in High-Dimensional Data
    Pichai, Supailin
    Sunat, Khamron
    Chiewchanwattana, Sirapat
    SYMMETRY-BASEL, 2020, 12 (11) : 1 - 13
  • [37] Spatial bound whale optimization algorithm: an efficient high-dimensional feature selection approach
    Jingwei Too
    Majdi Mafarja
    Seyedali Mirjalili
    Neural Computing and Applications, 2021, 33 : 16229 - 16250
  • [38] Spatial bound whale optimization algorithm: an efficient high-dimensional feature selection approach
    Too, Jingwei
    Mafarja, Majdi
    Mirjalili, Seyedali
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (23) : 16229 - 16250
  • [39] An adaptively balanced grey wolf optimization algorithm for feature selection on high-dimensional classification
    Wang, Jing
    Lin, Dakun
    Zhang, Yuanzi
    Huang, Shiguo
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 114
  • [40] A particle swarm optimization based multiobjective memetic algorithm for high-dimensional feature selection
    Juanjuan Luo
    Dongqing Zhou
    Lingling Jiang
    Huadong Ma
    Memetic Computing, 2022, 14 : 77 - 93