A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data

Cited by: 5
Authors
Alromema, Nashwan [1]
Syed, Asif Hassan [1]
Khan, Tabrej [2]
Affiliations
[1] King Abdulaziz Univ, Fac Comp & Informat Technol Rabigh FCITR, Dept Comp Sci, Jeddah 22254, Saudi Arabia
[2] King Abdulaziz Univ, Fac Comp & Informat Technol Rabigh FCITR, Dept Informat Syst, Jeddah 22254, Saudi Arabia
Keywords
primary breast tumor; gene-biomarkers; hybrid-feature selection approach; filter-based FS; two-tailed unpaired t-test; meta-heuristics techniques; supervised machine learning classifiers; breast tumor prediction; FEATURE-SELECTION ALGORITHM; CANCER; PROTEIN; MAPK; OPTIMIZATION; BIOMARKER; RISK; ENAH
DOI
10.3390/diagnostics13040708
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
The high dimensionality and sparsity of microarray gene expression data make it challenging to analyze the data and to screen for the optimal subset of genes as predictors of breast cancer (BC). The present study proposes a novel hybrid, sequential Feature Selection (FS) framework that combines minimum Redundancy-Maximum Relevance (mRMR), a two-tailed unpaired t-test, and meta-heuristics to screen the optimal set of gene biomarkers as predictors of BC. The proposed framework identified a set of three optimal gene biomarkers, namely MAPK1, APOBEC3B, and ENAH. In addition, state-of-the-art supervised Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Neural Net (NN), Naive Bayes (NB), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), and Logistic Regression (LR), were used to test the predictive capability of the selected gene biomarkers and to select the most effective breast cancer diagnostic model, that is, the one with higher values of the performance metrics. The XGBoost-based model was the best performer, with an accuracy of 0.976 ± 0.027, an F1-score of 0.974 ± 0.030, and an AUC of 0.961 ± 0.035 when tested on an independent test dataset. The classification system based on the screened gene biomarkers efficiently distinguishes primary breast tumors from normal breast samples.
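The filter stage mentioned in the abstract can be illustrated with a minimal sketch. It ranks genes by the absolute value of an unpaired two-sample t-statistic between tumor and normal samples. Everything here is an assumption made for illustration: the function names `welch_t` and `rank_genes`, the gene names, the toy values, and the use of Welch's unequal-variance form of the statistic. The record does not say which variant, thresholds, or software the authors used, and the mRMR and meta-heuristic stages are not shown.

```python
import math
import statistics


def welch_t(a, b):
    """Unpaired two-sample t-statistic (Welch's form, unequal variances).

    Assumes each group has at least two samples and non-zero pooled spread;
    a constant gene in both groups would cause a division by zero.
    """
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def rank_genes(expr, labels, top_k):
    """Return the top_k genes with the largest |t| between the two classes.

    expr: dict mapping gene name -> list of expression values, one per sample.
    labels: list of class labels (1 = tumor, 0 = normal) in sample order.
    Using |t| treats over- and under-expression alike, as a two-tailed test does.
    """
    scores = {}
    for gene, values in expr.items():
        tumor = [v for v, y in zip(values, labels) if y == 1]
        normal = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(tumor, normal))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: 4 tumor samples followed by 4 normal samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expr = {
    "GENE_A": [5.1, 5.3, 4.9, 5.2, 2.0, 2.2, 1.9, 2.1],  # strongly up in tumor
    "GENE_B": [3.0, 3.4, 2.6, 3.1, 3.2, 2.9, 3.3, 2.7],  # no clear difference
    "GENE_C": [1.0, 1.4, 0.9, 1.2, 2.0, 2.5, 1.8, 2.3],  # down in tumor
}
selected = rank_genes(expr, labels, top_k=2)
print(selected)
```

In a real pipeline the statistic would normally be converted to a p-value and corrected for multiple testing before any gene is kept; this sketch only ranks genes.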
Pages: 31
Related papers
50 in total
  • [1] Gene selection from microarray data for cancer classification - a machine learning approach
    Wang, Y
    Tetko, IV
    Hall, MA
    Frank, E
    Facius, A
    Mayer, KFX
    Mewes, HW
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2005, 29 (01) : 37 - 46
  • [2] Gene ranking from microarray data for cancer classification: A machine learning approach
    Ruiz, Roberto
    Pontes, Beatriz
    Giraldez, Raul
    Aguilar-Ruiz, Jesus S.
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 2, PROCEEDINGS, 2006, 4252 : 1272 - 1280
  • [3] A Hybrid Approach for Biomarker Discovery from Microarray Gene Expression Data for Cancer Classification
    Peng, Yanxiong
    Li, Wenyuan
    Liu, Ying
    CANCER INFORMATICS, 2006, 2 : 301 - 311
  • [4] An efficient approach for classification of gene expression microarray data
    Sreepada, Rama Syamala
    Vipsita, Swati
    Mohapatra, Puspanjali
    2014 FOURTH INTERNATIONAL CONFERENCE OF EMERGING APPLICATIONS OF INFORMATION TECHNOLOGY (EAIT), 2014: 344 - 348
  • [5] Learning microarray gene expression data by hybrid discriminant analysis
    Lu, Yijuan
    Tian, Qi
    Sanchez, Maribel
    Neary, Jennifer
    Liu, Feng
    Wang, Yufeng
    IEEE MULTIMEDIA, 2007, 14 (04) : 22 - 31
  • [6] Gene reduction and machine learning algorithms for cancer classification based on microarray gene expression data: A comprehensive review
    Osama, Sarah
    Shaban, Hassan
    Ali, Abdelmgeid A.
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 213
  • [7] A hybrid feature selection approach for microarray gene expression data
    Tan, Feng
    Fu, Xuezheng
    Wang, Hao
    Zhang, Yanqing
    Bourgeois, Anu
    COMPUTATIONAL SCIENCE - ICCS 2006, PT 2, PROCEEDINGS, 2006, 3992 : 678 - 685
  • [8] Tumor classification based on gene microarray data and hybrid learning method
    Liu, J
    Zhou, HB
    2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003: 2275 - 2280
  • [9] Unsupervised Machine Learning Approach for Gene Expression Microarray Data Using Soft Computing Technique
    Rana, Madhurima
    Vijayeeta, Prachi
    Kar, Utsav
    Das, Madhabananda
    Mishra, B. S. P.
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING, NETWORKING AND INFORMATICS (ICACNI 2015), VOL 1, 2016, 43 : 497 - 506
  • [10] Classification of breast cancer using microarray gene expression data: A survey
    Abd-Elnaby, Muhammed
    Alfonse, Marco
    Roushdy, Mohamed
    JOURNAL OF BIOMEDICAL INFORMATICS, 2021, 117