Extending approximate Bayesian computation with supervised machine learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest

Cited by: 98
Authors
Collin, Francois-David [1]
Durif, Ghislain [1]
Raynal, Louis [1]
Lombaert, Eric [2]
Gautier, Mathieu [3]
Vitalis, Renaud [3]
Marin, Jean-Michel [1]
Estoup, Arnaud [3]
Affiliations
[1] Univ Montpellier, CNRS, UMR 5149, IMAG, Montpellier, France
[2] Univ Cote Azur, CNRS, INRAE, ISA, Sophia Antipolis, France
[3] Univ Montpellier, CBGP, CIRAD, INRAE, Inst Agro, Montpellier, France
Keywords
approximate Bayesian computation; demographic history; model or scenario selection; parameter estimation; pool-sequencing; population genetics; random forest; SNP; supervised machine learning; MODEL CHOICE; ABC
DOI
10.1111/1755-0998.13413
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Simulation-based methods such as approximate Bayesian computation (ABC) are well-adapted to the analysis of complex scenarios of populations and species genetic history. In this context, supervised machine learning (SML) methods provide attractive statistical solutions to conduct efficient inferences about scenario choice and parameter estimation. The Random Forest methodology (RF) is a powerful ensemble of SML algorithms used for classification or regression problems. Random Forest allows conducting inferences at a low computational cost, without preliminary selection of the relevant components of the ABC summary statistics, and bypassing the derivation of ABC tolerance levels. We have implemented a set of RF algorithms to process inferences using simulated data sets generated from an extended version of the population genetic simulator implemented in DIYABC v2.1.0. The resulting computer package, named DIYABC Random Forest v1.0, integrates two functionalities into a user-friendly interface: the simulation under custom evolutionary scenarios of different types of molecular data (microsatellites, DNA sequences or SNPs) and RF treatments including statistical tools to evaluate the power and accuracy of inferences. We illustrate the functionalities of DIYABC Random Forest v1.0 for both scenario choice and parameter estimation through the analysis of pseudo-observed and real data sets corresponding to pool-sequencing and individual-sequencing SNP data sets. Because of the properties inherent to the implemented RF methods and the large feature vector (including various summary statistics and their linear combinations) available for SNP data, DIYABC Random Forest v1.0 can efficiently contribute to the analysis of large SNP data sets to make inferences about complex population genetic histories.
Pages: 2598-2613
Number of pages: 16