Computationally Efficient Demographic History Inference from Allele Frequencies with Supervised Machine Learning

Authors
Tran, Linh N. [1, 2]
Sun, Connie K. [2]
Struck, Travis J. [2]
Sajan, Mathews [2]
Gutenkunst, Ryan N. [2]
Affiliations
[1] Univ Arizona, Genet Grad Interdisciplinary Program, Tucson, AZ 85721 USA
[2] Univ Arizona, Dept Mol & Cellular Biol, Tucson, AZ 85721 USA
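Funding
US National Institutes of Health
Keywords
population genomics; demographic history inference; machine learning; POPULATION-GENETICS; SPECTRUM
DOI
10.1093/molbev/msae077
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract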
Inferring the past demographic history of natural populations from genomic data is a central concern in many studies across research fields. Our group previously developed dadi, a widely used demographic history inference method based on the allele frequency spectrum (AFS) and maximum composite-likelihood optimization. However, dadi's optimization procedure can be computationally expensive. Here, we present donni (demography optimization via neural network inference), a new inference method based on dadi that is more efficient while maintaining comparable inference accuracy. For each dadi-supported demographic model, donni simulates the expected AFS for a range of model parameters, then trains a set of Mean Variance Estimation neural networks on the simulated AFS. The trained networks can then be used to instantaneously infer the model parameters from future genomic data summarized by an AFS. We demonstrate that for many demographic models, donni can infer some parameters, such as population size changes, very well and other parameters, such as migration rates and times of demographic events, fairly well. Importantly, donni provides both parameter and confidence interval estimates from an input AFS with accuracy comparable to parameters inferred by dadi's likelihood optimization, while bypassing its long and computationally intensive evaluation process. donni's performance demonstrates that supervised machine learning algorithms may be a promising avenue for developing more sustainable and computationally efficient demographic history inference methods.
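The two ingredients named in the abstract, an AFS as input and Mean Variance Estimation (MVE) as the training target, can be illustrated with a minimal NumPy sketch. This is not donni's code: the function names below are hypothetical, the AFS is built from a toy list of per-site derived allele counts, and the loss is the standard Gaussian negative log-likelihood commonly used for MVE networks, which is assumed here rather than taken from the record.

```python
import numpy as np


def allele_frequency_spectrum(derived_counts, n_chromosomes):
    """Count how many sites carry 0, 1, ..., n derived alleles (hypothetical helper)."""
    counts = np.asarray(derived_counts, dtype=int)
    return np.bincount(counts, minlength=n_chromosomes + 1)


def gaussian_nll(y_true, mean, log_var):
    """Mean Gaussian negative log-likelihood, a standard MVE training loss.

    An MVE network outputs a mean and a (log) variance per target; this loss
    penalizes both a wrong mean and a miscalibrated variance.
    """
    y_true = np.asarray(y_true, dtype=float)
    mean = np.asarray(mean, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    squared_error = (y_true - mean) ** 2
    per_sample = 0.5 * (np.log(2.0 * np.pi) + log_var + squared_error / np.exp(log_var))
    return float(np.mean(per_sample))


def confidence_interval(mean, log_var, z=1.96):
    """Symmetric interval mean +/- z * sigma (z = 1.96 is ~95% under a Gaussian)."""
    mean = np.asarray(mean, dtype=float)
    sigma = np.exp(0.5 * np.asarray(log_var, dtype=float))
    return mean - z * sigma, mean + z * sigma


# Toy data: derived allele counts at six sites, sampled from 4 chromosomes.
afs = allele_frequency_spectrum([1, 1, 2, 0, 3, 1], n_chromosomes=4)

# Pretend network outputs for one parameter (illustrative values, not results).
lower, upper = confidence_interval(mean=[1.0], log_var=[0.0])
```

In this sketch the predicted variance does double duty: it weights the squared error during training and, once trained, yields the interval around each parameter estimate.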
Pages: 11