A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data

Cited by: 5
Authors
Alromema, Nashwan [1]
Syed, Asif Hassan [1]
Khan, Tabrej [2]
Affiliations
[1] King Abdulaziz Univ, Fac Comp & Informat Technol Rabigh FCITR, Dept Comp Sci, Jeddah 22254, Saudi Arabia
[2] King Abdulaziz Univ, Fac Comp & Informat Technol Rabigh FCITR, Dept Informat Syst, Jeddah 22254, Saudi Arabia
Keywords
primary breast tumor; gene-biomarkers; hybrid-feature selection approach; filter-based fs; two-tailed unpaired t-test; meta-heuristics techniques; supervised machine learning classifiers; breast tumor prediction; FEATURE-SELECTION ALGORITHM; CANCER; PROTEIN; MAPK; OPTIMIZATION; BIOMARKER; RISK; ENAH;
DOI
10.3390/diagnostics13040708
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
The high dimensionality and sparsity of microarray gene expression data make it challenging to analyze the data and screen an optimal subset of genes as predictors of breast cancer (BC). This study proposes a novel hybrid, sequential Feature Selection (FS) framework that combines minimum Redundancy-Maximum Relevance (mRMR), a two-tailed unpaired t-test, and meta-heuristics to screen an optimal set of gene biomarkers as predictors of BC. The proposed framework identified a set of three optimal gene biomarkers, namely MAPK1, APOBEC3B, and ENAH. In addition, state-of-the-art supervised Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Neural Net (NN), Naive Bayes (NB), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), and Logistic Regression (LR), were used to test the predictive capability of the selected gene biomarkers and to select the most effective breast cancer diagnostic model, that is, the one with higher values of the performance metrics. The XGBoost-based model performed best, with an accuracy of 0.976 ± 0.027, an F1-score of 0.974 ± 0.030, and an AUC of 0.961 ± 0.035 when tested on an independent test dataset. The classification system based on the screened gene biomarkers efficiently distinguishes primary breast tumors from normal breast samples.
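The abstract names a two-tailed unpaired t-test as one filter stage of the FS framework. The following is only a minimal sketch of that single idea, not the authors' implementation: it ranks genes by the absolute value of a two-sample t-statistic, which is the ordering a two-tailed test implies. The Welch (unequal-variance) form, the function names `welch_t` and `rank_genes`, the gene names, and the synthetic expression values are all assumptions made for illustration. The mRMR and meta-heuristic stages are not shown, and no p-values are computed.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Unpaired two-sample t-statistic (Welch form, assumed here)."""
    var_a = statistics.variance(a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(b)
    std_err = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def rank_genes(tumor, normal):
    """Return gene names sorted by |t|, largest first.

    Sorting on the absolute value treats up- and down-regulation
    alike, matching a two-tailed test.
    """
    scores = {g: abs(welch_t(tumor[g], normal[g])) for g in tumor}
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic data: hypothetical gene names, 30 samples per class.
rng = random.Random(0)
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
# Only gene_a is given a mean shift between the two classes.
shift = {"gene_a": 3.0, "gene_b": 0.0, "gene_c": 0.0, "gene_d": 0.0}
normal = {g: [rng.gauss(0.0, 1.0) for _ in range(30)] for g in genes}
tumor = {g: [rng.gauss(shift[g], 1.0) for _ in range(30)] for g in genes}

ranking = rank_genes(tumor, normal)
print(ranking)
```

In a fuller pipeline, the top-ranked genes from a filter like this would be passed to the later selection stages and then to the classifiers listed in the abstract; how the paper chains these stages in detail is not stated here.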
Pages: 31
Related papers
50 in total
  • [41] Gene expression data classification using topology and machine learning models
    Dey, Tamal K.
    Mandal, Sayan
    Mukherjee, Soham
    BMC BIOINFORMATICS, 2022, 22 (SUPPL 10)
  • [42] Hybrid Feature Selection Algorithm mRMR-ICA for Cancer Classification from Microarray Gene Expression Data
    Wang, Shuaiqun
    Kong, Wei
    Aorigele
    Deng, Jin
    Gao, Shangce
    Zeng, Weiming
    COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2018, 21 (06) : 420 - 430
  • [43] BINARY CLASSIFICATION OF CANCER MICROARRAY GENE EXPRESSION DATA USING EXTREME LEARNING MACHINES
    Arunkumar, C.
    Ramakrishnan, S.
    2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (IEEE ICCIC), 2014, : 83 - 86
  • [44] Learning Bayesian classifiers from gene-expression microarray data
    Bosin, A
    Dessì, N
    Liberati, D
    Pes, B
    FUZZY LOGIC AND APPLICATIONS, 2006, 3849 : 297 - 304
  • [45] Spatial and geometric learning for classification of breast tumors from multi-center ultrasound images: a hybrid learning approach
    Ru, Jintao
    Zhu, Zili
    Shi, Jialin
    BMC MEDICAL IMAGING, 2024, 24 (01):
  • [46] Physically grounded approach for estimating gene expression from microarray data
    McMullen, Patrick D.
    Morimoto, Richard I.
    Amaral, Luis A. Nunes
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2010, 107 (31) : 13690 - 13695
  • [47] Active Learning Using Fuzzy k-NN for Cancer Classification from Microarray Gene Expression Data
    Halder, Anindya
    Dey, Samrat
    Kumar, Ansuman
    ADVANCES IN COMMUNICATION AND COMPUTING, 2015, 347 : 103 - 113
  • [48] SVM-ABC based cancer microarray (gene expression) hybrid method for data classification
    Gulande, Punam
    Awale, R. N.
    COMPUTATIONAL INTELLIGENCE, 2023, 39 (06) : 1054 - 1072
  • [49] A Machine Learning Approach to Crater Classification from Topographic Data
    Liu, Qiangyi
    Cheng, Weiming
    Yan, Guangjian
    Zhao, Yunliang
    Liu, Jianzhong
    REMOTE SENSING, 2019, 11 (21)
  • [50] A Hybrid Multiple Indefinite Kernel Learning Framework for Disease Classification from Gene Expression Data
    Swetha, S.
    Srinivasan, G. N.
    Dayananda, P.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (06) : 844 - 855