Genome-wide association studies of ischemic stroke based on interpretable machine learning

Cited by: 0
Authors
Nikoli, Stefan [1 ]
Ignatov, Dmitry I. [1 ]
Khvorykh, Gennady, V [2 ]
Limborska, Svetlana A. [2 ]
Khrunin, Andrey, V [2 ]
Affiliations
[1] HSE Univ, Lab Models & Methods Computat Pragmat, Dept Data Anal & Artificial Intelligence, Moscow, Russia
[2] Natl Res Ctr Kurchatov Inst, Moscow, Russia
Funding
Russian Science Foundation;
Keywords
Genome-wide association studies; Interpretable machine learning; Ischemic stroke; Illuminating druggable genome; XGBoost; Interpretable neural network TabNet; SNP ranking; SNP importance; OXIDATIVE STRESS; DISEASE; RISK; GENE; PROTEINS; LOCI;
DOI
10.7717/peerj-cs.2454
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite the identification of several dozen genetic loci associated with ischemic stroke (IS), the genetic basis of this disease remains largely unexplored. We present the results of genome-wide association studies (GWAS) based on classical statistical testing and machine learning algorithms (logistic regression, gradient boosting on decision trees, and the tabular deep learning model TabNet). To build a consensus among the results obtained by the different techniques, a Pareto-optimal solution was proposed and applied. The methods were applied to real genotypic data from affected and healthy individuals of European ancestry obtained from the Database of Genotypes and Phenotypes (5,581 individuals, 883,749 single nucleotide polymorphisms). In total, 131 genes were identified as candidates for association with the onset of IS. UBQLN1, TRPS1, and MUSK have previously been described as associated with the course of IS in model animals. ACOT11, which takes part in fatty acid metabolism, was shown for the first time to be associated with IS. The identified genes were compared with genes from the Illuminating the Druggable Genome project. The product of GPR26, a G protein-coupled receptor, can be considered a therapeutic target for stroke prevention. The approaches presented here can be used to reprocess GWAS datasets for other diseases.
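The abstract does not spell out how the Pareto-optimal consensus is computed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes each method assigns every SNP a score where higher means more important, and it keeps the SNPs whose score vectors are not dominated by any other SNP. The function names, SNP identifiers, and score values are all hypothetical.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as high as b in every score and strictly higher in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the SNP ids whose score vectors no other SNP dominates."""
    front = []
    for snp, vec in scores.items():
        dominated = any(
            dominates(other, vec) for name, other in scores.items() if name != snp
        )
        if not dominated:
            front.append(snp)
    return sorted(front)


# Made-up scores per SNP, one per method (higher = more important), e.g.
# (-log10 p-value, gradient-boosting importance, TabNet importance).
toy_scores = {
    "snp_a": (7.2, 0.10, 0.30),
    "snp_b": (3.1, 0.40, 0.20),
    "snp_c": (2.0, 0.05, 0.10),
    "snp_d": (7.2, 0.10, 0.25),
}

consensus = pareto_front(toy_scores)
print(consensus)
```

In this toy input, snp_c and snp_d are each dominated by snp_a, while snp_a and snp_b each win on a different method, so both remain in the consensus set. The pairwise comparison is quadratic in the number of SNPs; a dataset of the size quoted above would presumably need a pre-filtering step or a faster front-finding algorithm.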
Pages: 26
Related papers
50 in total
  • [1] Machine Learning in Genome-Wide Association Studies
    Szymczak, Silke
    Biernacka, Joanna M.
    Cordell, Heather J.
    Gonzalez-Recio, Oscar
    Koenig, Inke R.
    Zhang, Heping
    Sun, Yan V.
    GENETIC EPIDEMIOLOGY, 2009, 33 : S51 - S57
  • [2] Machine learning approaches to genome-wide association studies
    Enoma, David O.
    Bishung, Janet
    Abiodun, Theresa
    Ogunlana, Olubanke
    Osamor, Victor Chukwudi
    JOURNAL OF KING SAUD UNIVERSITY SCIENCE, 2022, 34 (04)
  • [3] Editorial: Machine Learning in Genome-Wide Association Studies
    Hu, Ting
    Darabos, Christian
    Urbanowicz, Ryan
    FRONTIERS IN GENETICS, 2020, 11
  • [4] Leveraging machine learning to advance genome-wide association studies
    Dagasso, Gabrielle
    Yan, Yan
    Wang, Lipu
    Li, Longhai
    Kutcher, Randy
    Zhang, Wentao
    Jin, Lingling
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2021, 25 (1-2) : 17 - 36
  • [5] Machine Learning to Advance Human Genome-Wide Association Studies
    Sigala, Rafaella E.
    Lagou, Vasiliki
    Shmeliov, Aleksey
    Atito, Sara
    Kouchaki, Samaneh
    Awais, Muhammad
    Prokopenko, Inga
    Mahdi, Adam
    Demirkan, Ayse
    GENES, 2024, 15 (01)
  • [6] The search for genetic risk factors of ischemic stroke with the genome-wide association study and machine learning methods
    Khvorykh, Gennady
    Rogacheva, Margarita
    Sharypov, Ruslan
    Khrunin, Andrey
    Dyakonov, Aleksandr
    Limborska, Svetlana
    Fedorov, Alexei
    BMC BIOINFORMATICS, 2020, 21 (SUPPL 20):
  • [7] Genome-Wide Association Studies of 3 Distinct Recovery Phenotypes in Mild Ischemic Stroke
    Aldridge, Chad M.
    Braun, Robynne
    Lohse, Keith
    de Havenon, Adam
    Cole, John W.
    Cramer, Steven C.
    Lindgren, Arne G.
    Keene, Keith L.
    Hsu, Fang-Chi
    Worrall, Bradford B.
    NEUROLOGY, 2024, 102 (03) : e208011
  • [8] Genome-Wide Association Analysis of Ischemic Stroke in Young Adults
    Cheng, Yu-Ching
    O'Connell, Jeffrey R.
    Cole, John W.
    Stine, O. Colin
    Dueker, Nicole
    McArdle, Patrick F.
    Sparks, Mary J.
    Shen, Jess
    Laurie, Cathy C.
    Nelson, Sarah
    Doheny, Kimberly F.
    Ling, Hua
    Pugh, Elizabeth W.
    Brott, Thomas G.
    Brown, Robert D., Jr.
    Meschia, James F.
    Nalls, Michael
    Rich, Stephen S.
    Worrall, Bradford
    Anderson, Christopher D.
    Biffi, Alessandro
    Cortellini, Lynelle
    Furie, Karen L.
    Rost, Natalia S.
    Rosand, Jonathan
    Manolio, Teri A.
    Kittner, Steven J.
    Mitchell, Braxton D.
    G3-GENES GENOMES GENETICS, 2011, 1 (06) : 505 - 513
  • [9] Wellcome Trust Genome-Wide Association Study of Ischemic Stroke
    Markus, Hugh S.
    STROKE, 2013, 44 (06) : S20 - S22
  • [10] Revisiting genome-wide association studies from statistical modelling to machine learning
    Sun, Shanwen
    Dong, Benzhi
    Zou, Quan
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (04)