Feature selection and computational optimization in high-dimensional microarray cancer datasets via InfoGain-modified bat algorithm

Cited by: 4
Authors
Hambali, Moshood A. [1]
Oladele, Tinuke O. [2]
Adewole, Kayode S. [2]
Sangaiah, Arun Kumar [3]
Gao, Wei [4]
Affiliations
[1] Fed Univ Wukari, Dept Comp Sci, Wukari, Taraba State, Nigeria
[2] Univ Ilorin, Dept Comp Sci, Ilorin, Kwara State, Nigeria
[3] Natl Yunlin Univ Sci & Technol, Touliu 633102, Yunlin, Taiwan
[4] Yunnan Normal Univ, Sch Informat Sci & Technol, Kunming, Yunnan, Peoples R China
Keywords
Feature selection; Binary bat algorithm; Information gain; Cancer classification; Microarray data; Random forest; Computational optimization; ANT COLONY OPTIMIZATION; PARTIAL LEAST-SQUARES; GENE SELECTION; TUMOR CLASSIFICATION; MARKOV BLANKET; EXPRESSION; ECHOLOCATION; PREDICTION; SEARCH
DOI
10.1007/s11042-022-13532-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Achieving satisfactory cancer classification accuracy with the complete set of genes remains a major challenge because gene expression data are high-dimensional, have small sample sizes, and contain noise. Feature reduction is a critical and sensitive step in the classification task, especially for heterogeneous multimedia data. One of the major difficulties in cancer studies is identifying informative genes among the thousands available in microarray data. Traditional feature selection algorithms fail to scale to large feature spaces such as microarray data. An effective feature selection algorithm is therefore required to find the most significant subset of genes by removing non-predictive genes from the dataset without compromising the accuracy of the classification algorithm. This study proposes an Information Gain-Modified Bat Algorithm (InfoGain-MBA) feature selection model for selecting relevant and informative features from high-dimensional microarray cancer datasets, and evaluates the approach with four classifiers: C4.5, Decision Tree, Random Forest, and classification and regression tree (CART). The results show that the proposed approach is promising for the classification of microarray cancer data. Random Forest achieved 100% accuracy with few genes on all seven datasets used. Further investigations were also conducted to determine the optimal threshold for each dataset.
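The two-stage idea in the abstract (an information-gain filter followed by a bat-algorithm search over the surviving genes) can be illustrated with a minimal sketch. It is not the authors' InfoGain-MBA: the abstract does not specify the modifications, so this uses a basic binary bat search. Everything named here is an assumption made for illustration: the function names `info_gain`, `filter_genes`, `loo_accuracy`, and `binary_bat`, the median binarization, the tanh transfer function, the loudness and pulse-rate constants, the 0.01 size penalty, and the leave-one-out 1-nearest-neighbour fitness that stands in for the paper's classifiers.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    """Information gain of one gene, binarized at its median (an assumption)."""
    cut = sorted(column)[len(column) // 2]
    split = {}
    for v, y in zip(column, labels):
        split.setdefault(v >= cut, []).append(y)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in split.values())
    return entropy(labels) - remainder

def filter_genes(X, y, threshold):
    """Stage 1: keep indices of genes whose information gain exceeds threshold."""
    return [j for j, col in enumerate(zip(*X)) if info_gain(col, y) > threshold]

def loo_accuracy(X, y, genes):
    """Stand-in fitness: leave-one-out 1-nearest-neighbour accuracy on `genes`."""
    if not genes:
        return 0.0
    hits = 0
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((xi[g] - X[k][g]) ** 2 for g in genes))
        hits += y[nearest] == y[i]
    return hits / len(X)

def binary_bat(X, y, candidates, n_bats=8, n_iter=20, seed=0):
    """Stage 2: basic binary bat search over the filtered genes (hypothetical settings)."""
    d = len(candidates)
    if d == 0:
        return []
    rng = random.Random(seed)

    def fitness(bits):
        genes = [g for g, b in zip(candidates, bits) if b]
        # Reward accuracy and lightly penalize larger subsets.
        return loo_accuracy(X, y, genes) - 0.01 * len(genes) / d

    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_bats)]
    vel = [[0.0] * d for _ in range(n_bats)]
    best = max(pos, key=fitness)[:]
    loudness, pulse_rate = 0.9, 0.5  # fixed here for simplicity
    for _ in range(n_iter):
        for i in range(n_bats):
            freq = rng.random()  # pulse frequency drawn from [0, 1)
            cand = pos[i][:]
            for j in range(d):
                vel[i][j] += (pos[i][j] - best[j]) * freq
                # V-shaped transfer: |tanh(v)| is the probability of flipping the bit.
                if rng.random() < abs(math.tanh(vel[i][j])):
                    cand[j] = 1 - cand[j]
                # Occasional local move: copy the bit from the current best bat.
                if rng.random() > pulse_rate:
                    cand[j] = best[j]
            if fitness(cand) >= fitness(pos[i]) and rng.random() < loudness:
                pos[i] = cand
            if fitness(pos[i]) > fitness(best):
                best = pos[i][:]
    return [g for g, b in zip(candidates, best) if b]
```

In this sketch, `filter_genes(X, y, threshold)` would be called first on a list of expression rows `X` and class labels `y`, and its output passed as `candidates` to `binary_bat`. The `threshold` argument plays the role of the per-dataset threshold the abstract mentions, though how the authors tuned it is not stated here.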
Pages: 36505-36549
Page count: 45
Related papers
50 in total
  • [1] Feature selection and computational optimization in high-dimensional microarray cancer datasets via InfoGain-modified bat algorithm
    Moshood A. Hambali
    Tinuke O. Oladele
    Kayode S. Adewole
    Arun Kumar Sangaiah
    Wei Gao
    Multimedia Tools and Applications, 2022, 81 : 36505 - 36549
  • [2] A Nested Genetic Algorithm for feature selection in high-dimensional cancer Microarray datasets
    Sayed, Sabah
    Nassef, Mohammad
    Badr, Amr
    Farag, Ibrahim
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 121 : 233 - 243
  • [3] Feature selection in high-dimensional microarray cancer datasets using an improved equilibrium optimization approach
    Balakrishnan, Kulanthaivel
    Dhanalakshmi, Ramasamy
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (28):
  • [4] Fractional-order binary bat algorithm for feature selection on high-dimensional microarray data
    Esfandiari A.
    Farivar F.
    Khaloozadeh H.
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 (06) : 7453 - 7467
  • [5] High-dimensional feature selection for genomic datasets
    Afshar, Majid
    Usefi, Hamid
    KNOWLEDGE-BASED SYSTEMS, 2020, 206
  • [6] Distributed feature selection: A hesitant fuzzy correlation concept for microarray high-dimensional datasets
    Ebrahimpour, Mohammad Kazem
    Eftekhari, Mahdi
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2018, 173 : 51 - 64
  • [7] Multitasking Feature Selection Using a Clonal Selection Algorithm for High-Dimensional Microarray Data
    Wang, Yi
    Luo, Dan
    Yao, Jian
    ELECTRONICS, 2024, 13 (23):
  • [8] Evolutionary binary feature selection using adaptive ebola optimization search algorithm for high-dimensional datasets
    Oyelade, Olaide N. N.
    Agushaka, Jeffrey O. O.
    Ezugwu, Absalom E. E.
    PLOS ONE, 2023, 18 (03):
  • [9] An Adaptive Cooperative Coevolutionary Algorithm for Parallel Feature Selection in High-Dimensional Datasets
    Firouznia, Marjan
    Trunfio, Giuseppe A.
    30TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2022), 2022, : 211 - 218
  • [10] Improved PSO for feature selection on high-dimensional datasets
    Tran, Binh (binh.tran@ecs.vuw.ac.nz), 1600, Springer Verlag (8886):