INCORPORATING COMPOSITIONAL HETEROGENEITY INTO LIE MARKOV MODELS FOR PHYLOGENETIC INFERENCE

Cited by: 1
Authors
Hannaford, Naomi E. [1 ]
Heaps, Sarah E. [1 ]
Nye, Tom M. W. [1 ]
Williams, Tom A. [2 ]
Embley, T. Martin [3 ]
Affiliations
[1] Newcastle Univ, Sch Math Stat & Phys, Newcastle Upon Tyne, Tyne & Wear, England
[2] Univ Bristol, Sch Biol Sci, Bristol, Avon, England
[3] Newcastle Univ, Inst Cell & Mol Biosci, Newcastle Upon Tyne, Tyne & Wear, England
Source
ANNALS OF APPLIED STATISTICS | 2020 / Vol. 14 / No. 04
Funding
Engineering and Physical Sciences Research Council (UK);
Keywords
Compositional heterogeneity; Lie Markov models; phylogenetics; rooting; MAXIMUM-LIKELIHOOD; DNA-SEQUENCES; MITOCHONDRIAL; NONSTATIONARY; BIASES; ROOT;
DOI
10.1214/20-AOAS1369
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees. Substitutions in sequences are modelled through a continuous-time Markov process, characterised by an instantaneous rate matrix, which standard models assume is time-reversible and stationary. These assumptions are biologically questionable and induce a likelihood function which is invariant to a tree's root position. This hampers inference because a tree's biological interpretation depends critically on where it is rooted. Relaxing both assumptions, we introduce a model whose likelihood can distinguish between rooted trees. The model is nonstationary with step changes in the instantaneous rate matrix at each speciation event. Exploiting recent theoretical work, each rate matrix belongs to a nonreversible family of Lie Markov models. These models are closed under matrix multiplication, so our extension offers the conceptually appealing property that a tree and all its subtrees could have arisen from the same family of nonstationary models. We adopt a Bayesian approach, describe an MCMC algorithm for posterior inference and provide software. The biological insight that our model can provide is illustrated through an analysis in which nonreversible but stationary and nonstationary but reversible models cannot identify a plausible root.
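The root-invariance point in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' model or software: it uses a hypothetical two-leaf tree, hand-picked 4-state rate matrices, and made-up helper names (`transition_matrix`, `pair_distribution`, `root_shift_effect`). It slides the root along the path between the two leaves, keeping the total path length fixed, and measures how much the joint distribution of the leaf states changes. Under a reversible, stationary model the change should be zero up to rounding; under a nonreversible model, or when the root composition differs from the stationary one, it should not be.

```python
# Minimal sketch (illustrative only): two leaves joined at a root, 4 states.

def matmul(A, B):
    """Plain-Python product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(Q, t, terms=40):
    """exp(Q t) by a truncated Taylor series; adequate for the small rates used here."""
    n = len(Q)
    Qt = [[q * t for q in row] for row in Q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (Q t)^k / k!
        term = [[v / k for v in row] for row in matmul(term, Qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def with_diagonal(off):
    """Set each diagonal entry so that the row sums to zero."""
    n = len(off)
    return [[off[i][j] if i != j else -sum(off[i][k] for k in range(n) if k != i)
             for j in range(n)] for i in range(n)]

def pair_distribution(root_dist, Q, t1, t2):
    """Joint distribution of the two leaf states, summing over the root state."""
    P1, P2 = transition_matrix(Q, t1), transition_matrix(Q, t2)
    n = len(Q)
    return [[sum(root_dist[r] * P1[r][x] * P2[r][y] for r in range(n))
             for y in range(n)] for x in range(n)]

def root_shift_effect(root_dist, Q):
    """Largest change in the joint distribution when the root slides along the
    path between the leaves (total path length held at 0.6)."""
    A = pair_distribution(root_dist, Q, 0.1, 0.5)
    B = pair_distribution(root_dist, Q, 0.3, 0.3)
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

pi = [0.1, 0.2, 0.3, 0.4]   # assumed stationary composition for the reversible case
uniform = [0.25] * 4

# Reversible: rate from i to j equals pi[j] (an F81-style matrix), stationary at pi.
Q_rev = with_diagonal([[pi[j] for j in range(4)] for _ in range(4)])
# Nonreversible: faster substitutions around the cycle 0 -> 1 -> 2 -> 3 -> 0;
# its columns also sum to zero, so the uniform distribution is stationary.
Q_cyc = with_diagonal([[1.0 if j == (i + 1) % 4 else 0.1 for j in range(4)]
                       for i in range(4)])

diff_reversible_stationary = root_shift_effect(pi, Q_rev)   # root position invisible
diff_nonreversible = root_shift_effect(uniform, Q_cyc)      # root position matters
diff_nonstationary = root_shift_effect(uniform, Q_rev)      # root composition != pi
```

Comparing the three `diff_*` values shows why relaxing either assumption lets the data carry information about the root. The model summarised above goes further by letting the rate matrix change at each speciation event, which this two-leaf sketch does not attempt.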
Pages: 1964-1983
Page count: 20