A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data

Cited by: 5
Authors
Alromema, Nashwan [1]
Syed, Asif Hassan [1]
Khan, Tabrej [2]
Affiliations
[1] King Abdulaziz Univ, Fac Comp & Informat Technol Rabigh FCITR, Dept Comp Sci, Jeddah 22254, Saudi Arabia
[2] King Abdulaziz Univ, Fac Comp & Informat Technol Rabigh FCITR, Dept Informat Syst, Jeddah 22254, Saudi Arabia
Keywords
primary breast tumor; gene-biomarkers; hybrid-feature selection approach; filter-based FS; two-tailed unpaired t-test; meta-heuristics techniques; supervised machine learning classifiers; breast tumor prediction; FEATURE-SELECTION ALGORITHM; CANCER; PROTEIN; MAPK; OPTIMIZATION; BIOMARKER; RISK; ENAH
DOI
10.3390/diagnostics13040708
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
The high dimensionality and sparsity of microarray gene expression data make it challenging to analyze the data and screen for the optimal subset of genes as predictors of breast cancer (BC). The authors propose a novel hybrid, sequential Feature Selection (FS) framework that combines minimum Redundancy-Maximum Relevance (mRMR), a two-tailed unpaired t-test, and meta-heuristics to screen the optimal set of gene biomarkers as predictors of BC. The proposed framework identified three optimal gene biomarkers: MAPK1, APOBEC3B, and ENAH. In addition, state-of-the-art supervised Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Neural Net (NN), Naive Bayes (NB), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), and Logistic Regression (LR), were used to test the predictive capability of the selected gene biomarkers and to select the most effective breast cancer diagnostic model, i.e., the one with the highest performance metrics. The XGBoost-based model performed best, with an accuracy of 0.976 ± 0.027, an F1-score of 0.974 ± 0.030, and an AUC of 0.961 ± 0.035 on an independent test dataset. The classification system based on the screened gene biomarkers efficiently distinguishes primary breast tumors from normal breast samples.
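As a rough illustration of the filter stage mentioned in the abstract, the sketch below ranks genes by the absolute value of an unpaired t statistic between tumor and normal samples. It is not the authors' pipeline: the data are synthetic, the function names `welch_t` and `rank_genes` are hypothetical, the Welch (unequal-variance) form of the statistic is an assumption, and genes are ranked by |t| (the two-tailed view) rather than by computed p-values. The mRMR and meta-heuristic stages are not shown.

```python
import random
import statistics
from math import sqrt


def welch_t(a, b):
    """Unpaired (Welch) t statistic for two independent samples."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / sqrt(
        var_a / len(a) + var_b / len(b)
    )


def rank_genes(tumor, normal):
    """Return gene names sorted by |t|, largest first.

    tumor / normal: dict mapping gene name -> list of expression values.
    """
    scores = {g: abs(welch_t(tumor[g], normal[g])) for g in tumor}
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic data (not from the paper): only gene "G0" is shifted in tumors.
rng = random.Random(0)
genes = [f"G{i}" for i in range(20)]
normal = {g: [rng.gauss(0.0, 1.0) for _ in range(30)] for g in genes}
tumor = {
    g: [rng.gauss(5.0 if g == "G0" else 0.0, 1.0) for _ in range(30)]
    for g in genes
}

ranking = rank_genes(tumor, normal)
print("Top-ranked gene:", ranking[0])
```

In a real analysis the top-ranked genes would then be passed on to the later selection stages and to the classifiers, with selection performed only on training data to avoid leakage into the test set.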
Pages: 31
Related papers
50 in total
  • [21] A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization
    Sharbaf, Fatemeh Vafaee
    Mosafer, Sara
    Moattar, Mohammad Hossein
    GENOMICS, 2016, 107 (06) : 231 - 238
  • [22] A comparative study of different machine learning methods on microarray gene expression data
    Mehdi Pirooznia
    Jack Y Yang
    Mary Qu Yang
    Youping Deng
    BMC Genomics, 9
  • [23] A comparative study of different machine learning methods on microarray gene expression data
    Pirooznia, Mehdi
    Yang, Jack Y.
    Yang, Mary Qu
    Deng, Youping
    BMC GENOMICS, 2008, 9 (Suppl 1)
  • [24] Deep learning techniques for cancer classification using microarray gene expression data
    Gupta, Surbhi
    Gupta, Manoj K.
    Shabaz, Mohammad
    Sharma, Ashutosh
    FRONTIERS IN PHYSIOLOGY, 2022, 13
  • [25] Deep Learning Enabled Microarray Gene Expression Classification for Data Science Applications
    Malibari, Areej A.
    Alshehri, Reem M.
    Al-Wesabi, Fahd N.
    Negm, Noha
    Al Duhayyim, Mesfer
    Hilal, Anwer Mustafa
    Yaseen, Ishfaq
    Motwakel, Abdelwahed
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 73 (02) : 4277 - 4290
  • [26] Cancer Classification Based on Microarray Gene Expression Data Using Deep Learning
    Guillen, Pablo
    Ebalunode, Jerry
    2016 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE & COMPUTATIONAL INTELLIGENCE (CSCI), 2016, : 1403 - 1405
  • [27] Multicategory classification using an Extreme Learning Machine for Microarray gene expression cancer diagnosis
    Zhang, Runxuan
    Huang, Guang-Bin
    Sundararajan, Narasimhan
    Saratchandran, P.
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2007, 4 (03) : 485 - 495
  • [28] Gene selection and classification for cancer microarray data based on machine learning and similarity measures
    Liu, Qingzhong
    Sung, Andrew H.
    Chen, Zhongxue
    Liu, Jianzhong
    Chen, Lei
    Qiao, Mengyu
    Wang, Zhaohui
    Huang, Xudong
    Deng, Youping
    BMC GENOMICS, 2011, 12
  • [29] Gene selection and classification for cancer microarray data based on machine learning and similarity measures
    Qingzhong Liu
    Andrew H Sung
    Zhongxue Chen
    Jianzhong Liu
    Lei Chen
    Mengyu Qiao
    Zhaohui Wang
    Xudong Huang
    Youping Deng
    BMC Genomics, 12
  • [30] Classifier fusion to predict breast cancer tumors based on microarray gene expression data
    Raza, M
    Gondal, I
    Green, D
    Coppel, RL
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 4, PROCEEDINGS, 2005, 3684 : 866 - 874