INCORPORATING COMPOSITIONAL HETEROGENEITY INTO LIE MARKOV MODELS FOR PHYLOGENETIC INFERENCE

Times cited: 1
Authors
Hannaford, Naomi E. [1 ]
Heaps, Sarah E. [1 ]
Nye, Tom M. W. [1 ]
Williams, Tom A. [2 ]
Embley, T. Martin [3 ]
Affiliations
[1] Newcastle Univ, Sch Math Stat & Phys, Newcastle Upon Tyne, Tyne & Wear, England
[2] Univ Bristol, Sch Biol Sci, Bristol, Avon, England
[3] Newcastle Univ, Inst Cell & Mol Biosci, Newcastle Upon Tyne, Tyne & Wear, England
Source
ANNALS OF APPLIED STATISTICS | 2020 / Vol. 14 / No. 04
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Compositional heterogeneity; Lie Markov models; phylogenetics; rooting; MAXIMUM-LIKELIHOOD; DNA-SEQUENCES; MITOCHONDRIAL; NONSTATIONARY; BIASES; ROOT;
DOI
10.1214/20-AOAS1369
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees. Substitutions in sequences are modelled through a continuous-time Markov process, characterised by an instantaneous rate matrix, which standard models assume is time-reversible and stationary. These assumptions are biologically questionable and induce a likelihood function which is invariant to a tree's root position. This hampers inference because a tree's biological interpretation depends critically on where it is rooted. Relaxing both assumptions, we introduce a model whose likelihood can distinguish between rooted trees. The model is nonstationary with step changes in the instantaneous rate matrix at each speciation event. Exploiting recent theoretical work, each rate matrix belongs to a nonreversible family of Lie Markov models. These models are closed under matrix multiplication, so our extension offers the conceptually appealing property that a tree and all its subtrees could have arisen from the same family of nonstationary models. We adopt a Bayesian approach, describe an MCMC algorithm for posterior inference and provide software. The biological insight that our model can provide is illustrated through an analysis in which nonreversible but stationary and nonstationary but reversible models cannot identify a plausible root.
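The root-invariance point in the abstract can be checked numerically on a toy case. The following is a minimal sketch, not the authors' model or software: it assumes a two-leaf tree, a single site with four states, and two hypothetical rate matrices (a reversible one built from made-up exchangeabilities `s` and frequencies `pi`, and a cyclic nonreversible one with a uniform stationary distribution). All function names are illustrative. Sliding the root along the path between the leaves should leave the likelihood unchanged under the reversible, stationary model and change it under the nonreversible one.

```python
# Toy illustration only: NOT the authors' model or software.
# All names, rates and frequencies below are hypothetical.

def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(q * t) by scaling and squaring a truncated Taylor series."""
    n = len(q)
    scale = t / 2 ** squarings
    small = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term now holds small^k / k!
        term = [[v / k for v in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def reversible_rate_matrix(pi, s):
    """Build Q with q_ij = s_ij * pi_j (s symmetric), so detailed balance holds for pi."""
    n = len(pi)
    q = [[s[i][j] * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    return q


def cyclic_rate_matrix(forward, backward, n=4):
    """Nonreversible Q (when forward != backward) with a uniform stationary distribution."""
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q[i][(i + 1) % n] = forward
        q[i][(i - 1) % n] = backward
        q[i][i] = -(forward + backward)
    return q


def two_leaf_likelihood(q, root_dist, x, y, t1, t2):
    """Likelihood of states x and y at two leaves, with the root at distances t1 and t2."""
    p1, p2 = mat_exp(q, t1), mat_exp(q, t2)
    return sum(root_dist[r] * p1[r][x] * p2[r][y] for r in range(len(q)))


pi = [0.1, 0.2, 0.3, 0.4]
s = [[0.0, 1.0, 2.0, 0.5],
     [1.0, 0.0, 1.5, 3.0],
     [2.0, 1.5, 0.0, 1.0],
     [0.5, 3.0, 1.0, 0.0]]
q_rev = reversible_rate_matrix(pi, s)
q_nonrev = cyclic_rate_matrix(forward=1.0, backward=0.2)
uniform = [0.25] * 4

# Slide the root along the single path of total length 1 joining the two leaves.
for t1 in (0.1, 0.5, 0.9):
    rev = two_leaf_likelihood(q_rev, pi, 0, 1, t1, 1.0 - t1)
    nonrev = two_leaf_likelihood(q_nonrev, uniform, 0, 1, t1, 1.0 - t1)
    print(f"t1={t1:.1f}  reversible={rev:.6f}  nonreversible={nonrev:.6f}")
```

In both cases the root distribution is the stationary distribution of the rate matrix, so only reversibility differs between the two columns. The model described in the abstract additionally lets the rate matrix change at each speciation event, which this sketch does not attempt.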
Pages: 1964 - 1983
Page count: 20