Transcriptome prediction performance across machine learning models and diverse ancestries

Cited by: 17
Authors
Okoro, Paul C. [1]
Schubert, Ryan [2]
Guo, Xiuqing [3, 4]
Johnson, W. Craig [5]
Rotter, Jerome I. [3, 4]
Hoeschele, Ina [6, 7, 8]
Liu, Yongmei [9]
Im, Hae Kyung [10]
Luke, Amy [11]
Dugas, Lara R. [11, 12]
Wheeler, Heather E. [1, 13, 14]
Affiliations
[1] Loyola Univ Chicago, Program Bioinformat, Chicago, IL 60660 USA
[2] Loyola Univ Chicago, Dept Math & Stat, Chicago, IL USA
[3] Harbor UCLA Med Ctr, Inst Translat Genom & Populat Sci, Lundquist Inst, Torrance, CA 90509 USA
[4] Harbor UCLA Med Ctr, Dept Pediat, Torrance, CA 90509 USA
[5] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[6] Virginia Tech, Fralin Life Sci Inst, Blacksburg, VA USA
[7] Virginia Tech, Dept Stat, Blacksburg, VA USA
[8] Wake Forest Sch Med, Winston Salem, NC 27101 USA
[9] Duke Univ, Sch Med, Dept Med, Durham, NC 27706 USA
[10] Univ Chicago, Dept Med, Sect Genet Med, 5841 S Maryland Ave, Chicago, IL 60637 USA
[11] Loyola Univ Chicago, Parkinson Sch Hlth Sci & Publ Hlth, Dept Publ Hlth Sci, Maywood, IL USA
[12] Univ Cape Town, Fac Hlth Sci, Dept Human Biol, Cape Town, South Africa
[13] Loyola Univ Chicago, Dept Biol, Chicago, IL 60660 USA
[14] Loyola Univ Chicago, Dept Comp Sci, Chicago, IL 60660 USA
Keywords
GENOME-WIDE ASSOCIATION; GENE-EXPRESSION; VARIABLE SELECTION; COMPLEX TRAITS; REGRESSION; CETP; STRATIFICATION; REGULARIZATION; INFERENCE; HDL;
DOI
10.1016/j.xhgg.2020.100019
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions, such as the elastic net (EN). To potentially further optimize imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA), comprising individuals of African, Hispanic, and European ancestries, and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS), comprising individuals of African ancestries. We show that prediction performance is highest when the training and testing populations share similar ancestries, regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K-nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show that including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.
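As a rough illustration of the train-then-test design described in the abstract, the sketch below fits a K-nearest-neighbor regressor to simulated SNP dosages and scores it on a separate simulated cohort. It is not the authors' pipeline: the data are synthetic, the helper names (`knn_predict`, `pearson_r`, `simulate`) are hypothetical, and the use of Pearson correlation between predicted and observed expression as the performance measure is an assumption, since the abstract does not name a metric.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict expression for genotype vector x as the mean expression
    of its k nearest training samples (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def pearson_r(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def simulate(n, n_snps, rng):
    """Simulate n samples of SNP dosages (0, 1, or 2) and the expression
    of one gene driven by the first two SNPs plus Gaussian noise.
    Effect sizes and noise level are arbitrary illustrative values."""
    X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n)]
    y = [1.0 * row[0] - 0.5 * row[1] + rng.gauss(0, 0.3) for row in X]
    return X, y


rng = random.Random(0)                      # fixed seed for reproducibility
train_X, train_y = simulate(200, 5, rng)    # stands in for a training cohort
test_X, test_y = simulate(100, 5, rng)      # stands in for a test cohort

predicted = [knn_predict(train_X, train_y, x, k=5) for x in test_X]
r = pearson_r(predicted, test_y)
print(f"Predicted vs. observed expression, Pearson r = {r:.2f}")
```

Here both cohorts are drawn from the same simulated distribution, which loosely mirrors the matched-ancestry case; modeling ancestry differences (for example, different allele frequencies in the test cohort) would require extending `simulate`.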
Pages: 14