Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer

Cited by: 11
Authors
Bostanci, Erkan [1]
Kocak, Engin [2]
Unal, Metehan [1]
Guzel, Mehmet Serdar [1]
Acici, Koray [3]
Asuroglu, Tunc [4]
Affiliations
[1] Ankara Univ, Fac Engn, Dept Comp Engn, TR-06830 Ankara, Turkiye
[2] Univ Hlth Sci, Fac Gulhane Pharm, Dept Analyt Chem, TR-06018 Ankara, Turkiye
[3] Ankara Univ, Fac Engn, Dept Artificial Intelligence & Data Engn, TR-06830 Ankara, Turkiye
[4] Tampere Univ, Fac Med & Hlth Technol, Tampere 33720, Finland
Keywords
transcriptomics; RNA-seq; machine learning; deep learning; classification; cancer prediction; exRNA; CLASSIFICATION; AGREEMENT; HEALTH;
DOI
10.3390/s23063080
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Data from omics studies have been used for the prediction and classification of various diseases in biomedical and bioinformatics research. In recent years, Machine Learning (ML) algorithms have been applied in many fields related to healthcare systems, especially for disease prediction and classification tasks. Integrating molecular omics data with ML algorithms offers a valuable opportunity to evaluate clinical data. RNA sequencing (RNA-seq) analysis has emerged as the gold standard for transcriptomics analysis and is now widely used in clinical research. In the present work, RNA-seq data of extracellular vesicles (EV) from healthy individuals and colon cancer patients are analyzed. The aim is to develop models for the prediction and classification of colon cancer stages. Five canonical ML classifiers and three Deep Learning (DL) classifiers are used to predict colon cancer in an individual from processed RNA-seq data. The data classes are formed on the basis of both colon cancer stage and cancer presence (healthy or cancer). The canonical ML classifiers, namely k-Nearest Neighbor (kNN), Logistic Model Tree (LMT), Random Tree (RT), Random Committee (RC), and Random Forest (RF), are tested with both forms of the data. For comparison with the canonical ML models, One-Dimensional Convolutional Neural Network (1-D CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM) DL models are also used. The hyper-parameters of the DL models are optimized with a genetic algorithm (GA), a meta-heuristic optimization method. Among the canonical ML algorithms, the best accuracy in cancer prediction, 97.33%, is obtained with RC, LMT, and RF, while RT and kNN reach 95.33%. The best accuracy in cancer stage classification, 97.33%, is achieved with RF, followed by LMT, RC, kNN, and RT with 96.33%, 96%, 94.66%, and 94%, respectively.
Among the DL algorithms, the best accuracy in cancer prediction, 97.67%, is obtained with 1-D CNN; BiLSTM and LSTM reach 94.33% and 93.67%, respectively. In cancer stage classification, the best accuracy, 98%, is achieved with BiLSTM; 1-D CNN and LSTM reach 97% and 94.33%, respectively. The results show that neither family is uniformly better: canonical ML and DL models can each outperform the other, depending on the number of features.
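The abstract states that the DL hyper-parameters are tuned with a genetic algorithm but gives no details. The following is a minimal sketch of that general idea, not the authors' implementation. The search space (`filters`, `kernel_size`, `learning_rate`), the `toy_fitness` function, and the selection, crossover, and mutation settings are all hypothetical. In a real run the fitness function would train a model and return its validation accuracy.

```python
import random

# Hypothetical search space: the abstract does not list the tuned hyper-parameters.
SEARCH_SPACE = {
    "filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
    "learning_rate": [1e-4, 1e-3, 1e-2],
}


def toy_fitness(params):
    """Stand-in for validation accuracy (higher is better, maximum is 0)."""
    score = 0.0
    score -= abs(params["filters"] - 64) / 128
    score -= abs(params["kernel_size"] - 5) / 10
    score -= 0.0 if params["learning_rate"] == 1e-3 else 0.5
    return score


def random_individual(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: rng.choice([a[name], b[name]]) for name in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2):
    """Resample each gene from the search space with probability `rate`."""
    return {
        name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
        for name, value in individual.items()
    }


def genetic_search(fitness, pop_size=12, generations=15, elite=2, seed=0):
    """Evolve a population of configurations and return the fittest one."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        # Elitism: the top `elite` individuals survive unchanged.
        population = population[:elite] + children
    return max(population, key=fitness)


best = genetic_search(toy_fitness)
print(best, toy_fitness(best))
```

Because the top individuals are carried over unchanged each generation, the best fitness never decreases. That property matters when each fitness evaluation is an expensive training run.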
Pages: 28