Application of Feature Selection and Deep Learning for Cancer Prediction Using DNA Methylation Markers

Cited by: 5
Authors
Gomes, Rahul [1]
Paul, Nijhum [2]
He, Nichol [1]
Huber, Aaron Francis [1]
Jansen, Rick J. [2, 3, 4, 5]
Affiliations
[1] Univ Wisconsin, Dept Comp Sci, 133 Phillips Sci Hall,101 Roosevelt Ave, Eau Claire, WI 54701 USA
[2] North Dakota State Univ, Dept Publ Hlth, 640S Aldevron Tower,1455 14th Ave N, Fargo, ND 58102 USA
[3] North Dakota State Univ, Genom Phen & Bioinformat Program, 640S Aldevron Tower,1455 14th Ave N, Fargo, ND 58102 USA
[4] North Dakota State Univ, Ctr Immunizat Res & Educ CIRE, 640S Aldevron Tower,1455 14th Ave N, Fargo, ND 58102 USA
[5] North Dakota State Univ, Ctr Diagnost & Therapeut Strategies Pancreat Canc, 640S Aldevron Tower,1455 14th Ave N, Fargo, ND 58102 USA
Funding
U.S. National Science Foundation;
Keywords
DNA methylation; deep learning; breast cancer; TCGA;
DOI
10.3390/genes13091557
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
DNA methylation is a process that can affect gene accessibility and therefore gene expression. This study proposes a machine learning pipeline for predicting breast cancer and identifying the significant genes that contribute to the prediction. It uses breast cancer methylation data from The Cancer Genome Atlas (TCGA), specifically the TCGA-BRCA dataset. Feature engineering techniques are used to reduce data volume and make deep learning scalable. A comparative analysis of the proposed approach on Illumina 27K and 450K methylation data shows that deep learning methodologies for cancer prediction can be coupled with feature selection models to enhance prediction accuracy. Prediction using 450K methylation markers can be accomplished in less than 13 s with an accuracy of 98.75%. Of the 685 genes in the feature-selected 27K dataset, 578 were mapped to Ensembl Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in five biological processes and one molecular function. Of the 1572 genes in the feature-selected 450K dataset, 1290 were mapped to Ensembl Gene IDs. This reduced set was significantly (FDR < 0.05) enriched in 95 biological processes and 17 molecular functions. Seven oncogenes/tumor suppressor genes were common to the 27K and 450K feature-selected gene sets: RTN4IP1, MYO18B, ANP32A, BRF1, SETBP1, NTRK1, and IGF2R. Our bioinformatics deep learning workflow, which incorporates imputation and data balancing methods, identifies important methylation markers related to functionally important genes in breast cancer with high accuracy relative to deep learning or statistical models alone.
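The abstract reports enrichment at FDR < 0.05 but does not say which correction procedure was used. As a rough illustration only, the sketch below assumes the common Benjamini-Hochberg procedure; the function names and the term labels in the usage example are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used purely for illustration; the abstract does not
    name the FDR procedure behind its "FDR < 0.05" threshold.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_terms(term_p_values, fdr=0.05):
    """Keep the terms whose adjusted p-value falls below the FDR threshold."""
    terms = list(term_p_values)
    q_values = benjamini_hochberg([term_p_values[t] for t in terms])
    return {t: q for t, q in zip(terms, q_values) if q < fdr}
```

For example, `significant_terms({"term_a": 0.001, "term_b": 0.2, "term_c": 0.03})` keeps `term_a` and `term_c`, because their adjusted values fall below 0.05 while `term_b` does not.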
Pages: 22