Feature selection and computational optimization in high-dimensional microarray cancer datasets via InfoGain-modified bat algorithm

Cited by: 4
Authors
Hambali, Moshood A. [1]
Oladele, Tinuke O. [2]
Adewole, Kayode S. [2]
Sangaiah, Arun Kumar [3]
Gao, Wei [4]
Affiliations
[1] Fed Univ Wukari, Dept Comp Sci, Wukari, Taraba State, Nigeria
[2] Univ Ilorin, Dept Comp Sci, Ilorin, Kwara State, Nigeria
[3] Natl Yunlin Univ Sci & Technol, Touliu 633102, Yunlin, Taiwan
[4] Yunnan Normal Univ, Sch Informat Sci & Technol, Kunming, Yunnan, Peoples R China
Keywords
Feature selection; Binary bat algorithm; Information gain; Cancer classification; Microarray data; Random forest; Computational optimization; ANT COLONY OPTIMIZATION; PARTIAL LEAST-SQUARES; GENE SELECTION; TUMOR CLASSIFICATION; MARKOV BLANKET; EXPRESSION; ECHOLOCATION; PREDICTION; SEARCH
DOI
10.1007/s11042-022-13532-5
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Achieving satisfactory cancer classification accuracy with the complete set of genes remains a great challenge because gene expression data are high-dimensional, have small sample sizes, and contain noise. Feature reduction is a critical and sensitive step in the classification task, most importantly in heterogeneous multimedia data. One of the major challenges in cancer studies is identifying informative genes among the thousands available in microarray data. Traditional feature selection algorithms fail to scale to large feature spaces such as microarray data. Therefore, an effective feature selection algorithm is required to find the most significant subset of genes by removing non-predictive genes from the dataset without compromising the accuracy of the classification algorithm. The study proposed an Information Gain - Modified Bat Algorithm (InfoGain-MBA) feature selection model for selecting relevant and informative features from high-dimensional microarray cancer datasets and evaluated the approach with four classifiers: C4.5, Decision Tree, Random Forest, and classification and regression tree (CART). The results show that the proposed approach is promising for the classification of microarray cancer data. Random Forest achieved 100% accuracy with few genes on all seven datasets used. Further investigations were also conducted to determine the optimal information gain threshold for each dataset.
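The information gain filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bat algorithm stage and the classifiers are not reproduced, the function names (`entropy`, `info_gain`, `select_by_threshold`), the toy data, and the threshold value of 0.3 are all hypothetical, and the sketch assumes gene expression values have already been discretized.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """Information gain of a discrete feature with respect to the labels."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def select_by_threshold(columns, labels, threshold):
    """Return indices of features whose information gain meets the threshold."""
    return [i for i, col in enumerate(columns)
            if info_gain(col, labels) >= threshold]

# Toy data (hypothetical): 3 discretized "genes" measured on 4 samples.
labels = ["tumor", "tumor", "normal", "normal"]
genes = [
    ["high", "high", "low", "low"],   # separates the two classes perfectly
    ["high", "low", "high", "low"],   # carries no class information
    ["high", "high", "high", "low"],  # partially informative
]
# The threshold 0.3 is an arbitrary illustrative value, not from the paper.
selected = select_by_threshold(genes, labels, threshold=0.3)
```

In a pipeline like the one the abstract describes, the genes surviving such a filter would presumably be passed to the wrapper search (here, the modified bat algorithm), which is why the choice of threshold per dataset matters.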
Pages: 36505 - 36549
Page count: 45
Related papers
50 in total
  • [41] A particle swarm optimization based multiobjective memetic algorithm for high-dimensional feature selection
    Luo, Juanjuan
    Zhou, Dongqing
    Jiang, Lingling
    Ma, Huadong
    MEMETIC COMPUTING, 2022, 14 (01) : 77 - 93
  • [42] Extremely High-Dimensional Feature Selection via Feature Generating Samplings
    Li, Shutao
    Wei, Dan
    IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44 (06) : 737 - 747
  • [43] Markov Blanket: Efficient Strategy For Feature Subset Selection Method For High Dimensional Microarray Cancer Datasets
    Passi, Kalpdrum
    Nour, Abdala
    Jain, Chakresh Kumar
    2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017, : 1864 - 1871
  • [44] Evolutionary Multitasking for Feature Selection in High-Dimensional Classification via Particle Swarm Optimization
    Chen, Ke
    Xue, Bing
    Zhang, Mengjie
    Zhou, Fengyu
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2022, 26 (03) : 446 - 460
  • [45] Gene selection via improved nuclear reaction optimization algorithm for cancer classification in high-dimensional data
    Abd El-Mageed, Amr A.
    Elkhouli, Ahmed E.
    Abohany, Amr A.
    Gafar, Mona
    JOURNAL OF BIG DATA, 2024, 11 (01)
  • [46] Gene selection via improved nuclear reaction optimization algorithm for cancer classification in high-dimensional data
    Amr A. Abd El-Mageed
    Ahmed E. Elkhouli
    Amr A. Abohany
    Mona Gafar
    Journal of Big Data, 11
  • [47] Analytical and Experimental Study of Filter Feature Selection Algorithms for High-dimensional Datasets
    Pino, Adrian
    Morell, Carlos
    PROCEEDINGS OF THE FOURTH INTERNATIONAL WORKSHOP ON KNOWLEDGE DISCOVERY, KNOWLEDGE MANAGEMENT AND DECISION SUPPORT (EUREKA-2013), 2013, 51 : 339 - 349
  • [48] A GA-BASED FEATURE SELECTION AND ENSEMBLE LEARNING FOR HIGH-DIMENSIONAL DATASETS
    Xia, Pei-Yong
    Ding, Xiang-Qian
    Jiang, Bai-Ning
    PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 7 - +
  • [49] A New Evolutionary Multitasking Algorithm for High-Dimensional Feature Selection
    Liu, Ping
    Xu, Bangxin
    Xu, Wenwen
    IEEE ACCESS, 2024, 12 : 89856 - 89872
  • [50] A hybrid Artificial Immune optimization for high-dimensional feature selection
    Zhu, Yongbin
    Li, Wenshan
    Li, Tao
    KNOWLEDGE-BASED SYSTEMS, 2023, 260