Extending approximate Bayesian computation with supervised machine learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest

Cited by: 98
Authors
Collin, Francois-David [1]
Durif, Ghislain [1]
Raynal, Louis [1]
Lombaert, Eric [2]
Gautier, Mathieu [3]
Vitalis, Renaud [3]
Marin, Jean-Michel [1]
Estoup, Arnaud [3]
Affiliations
[1] Univ Montpellier, CNRS, UMR 5149, IMAG, Montpellier, France
[2] Univ Cote Azur, CNRS, INRAE, ISA, Sophia Antipolis, France
[3] Univ Montpellier, CBGP, CIRAD, INRAE, Inst Agro, Montpellier, France
Keywords
approximate Bayesian computation; demographic history; model or scenario selection; parameter estimation; pool-sequencing; population genetics; random forest; SNP; supervised machine learning; MODEL CHOICE; ABC
DOI
10.1111/1755-0998.13413
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
Simulation-based methods such as approximate Bayesian computation (ABC) are well-adapted to the analysis of complex scenarios of populations and species genetic history. In this context, supervised machine learning (SML) methods provide attractive statistical solutions to conduct efficient inferences about scenario choice and parameter estimation. The Random Forest methodology (RF) is a powerful ensemble of SML algorithms used for classification or regression problems. Random Forest allows conducting inferences at a low computational cost, without preliminary selection of the relevant components of the ABC summary statistics, and bypassing the derivation of ABC tolerance levels. We have implemented a set of RF algorithms to process inferences using simulated data sets generated from an extended version of the population genetic simulator implemented in DIYABC v2.1.0. The resulting computer package, named DIYABC Random Forest v1.0, integrates two functionalities into a user-friendly interface: the simulation under custom evolutionary scenarios of different types of molecular data (microsatellites, DNA sequences or SNPs) and RF treatments including statistical tools to evaluate the power and accuracy of inferences. We illustrate the functionalities of DIYABC Random Forest v1.0 for both scenario choice and parameter estimation through the analysis of pseudo-observed and real data sets corresponding to pool-sequencing and individual-sequencing SNP data sets. Because of the properties inherent to the implemented RF methods and the large feature vector (including various summary statistics and their linear combinations) available for SNP data, DIYABC Random Forest v1.0 can efficiently contribute to the analysis of large SNP data sets to make inferences about complex population genetic histories.
Pages: 2598-2613
Number of pages: 16
Related papers
31 in total
  • [1] Using Approximate Bayesian Computation to infer sex ratios from acoustic data
    Lehnen, Lisa
    Schorcht, Wigbert
    Karst, Inken
    Biedermann, Martin
    Kerth, Gerald
    Puechmaille, Sebastien J.
    PLOS ONE, 2018, 13 (06)
  • [2] Reconstructing the demographic history of orang-utans using Approximate Bayesian Computation
    Nater, Alexander
    Greminger, Maja P.
    Arora, Natasha
    van Schaik, Carel P.
    Goossens, Benoit
    Singleton, Ian
    Verschoor, Ernst J.
    Warren, Kristin S.
    Kruetzen, Michael
    MOLECULAR ECOLOGY, 2015, 24 (02) : 310 - 327
  • [3] Computationally Efficient Demographic History Inference from Allele Frequencies with Supervised Machine Learning
    Tran, Linh N.
    Sun, Connie K.
    Struck, Travis J.
    Sajan, Mathews
    Gutenkunst, Ryan N.
    MOLECULAR BIOLOGY AND EVOLUTION, 2024, 41 (05)
  • [4] DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data
    Cornuet, Jean-Marie
    Pudlo, Pierre
    Veyssier, Julien
    Dehne-Garcia, Alexandre
    Gautier, Mathieu
    Leblois, Raphael
    Marin, Jean-Michel
    Estoup, Arnaud
    BIOINFORMATICS, 2014, 30 (08) : 1187 - 1189
  • [5] Estimating demographic parameters from large-scale population genomic data using Approximate Bayesian Computation
    Li, Sen
    Jakobsson, Mattias
    BMC GENETICS, 2012, 13
  • [6] Estimating demographic parameters from large-scale population genomic data using Approximate Bayesian Computation
    Sen Li
    Mattias Jakobsson
    BMC Genetics, 13
  • [7] Understanding the recent colonization history of a plant pathogenic fungus using population genetic tools and Approximate Bayesian Computation
    Barres, B.
    Carlier, J.
    Seguin, M.
    Fenouillet, C.
    Cilas, C.
    Ravigne, V.
    HEREDITY, 2012, 109 (05) : 269 - 279
  • [8] Understanding the recent colonization history of a plant pathogenic fungus using population genetic tools and Approximate Bayesian Computation
    B Barrès
    J Carlier
    M Seguin
    C Fenouillet
    C Cilas
    V Ravigné
    Heredity, 2012, 109 : 269 - 279
  • [9] Model choice using Approximate Bayesian Computation and Random Forests: analyses based on model grouping to make inferences about the genetic history of Pygmy human populations
    Estoup, Arnaud
    Raynal, Louis
    Verdu, Paul
    Marin, Jean-Michel
    JOURNAL OF THE SFDS, 2018, 159 (03) : 167 - 190
  • [10] Using natural history to guide supervised machine learning for cryptic species delimitation with genetic data
    Derkarabetian, Shahan
    Starrett, James
    Hedin, Marshal
    FRONTIERS IN ZOOLOGY, 2022, 19 (01)