INCORPORATING COMPOSITIONAL HETEROGENEITY INTO LIE MARKOV MODELS FOR PHYLOGENETIC INFERENCE

Times cited: 1
Authors
Hannaford, Naomi E. [1]
Heaps, Sarah E. [1]
Nye, Tom M. W. [1]
Williams, Tom A. [2]
Embley, T. Martin [3]
Affiliations
[1] Newcastle Univ, Sch Math Stat & Phys, Newcastle Upon Tyne, Tyne & Wear, England
[2] Univ Bristol, Sch Biol Sci, Bristol, Avon, England
[3] Newcastle Univ, Inst Cell & Mol Biosci, Newcastle Upon Tyne, Tyne & Wear, England
Source
ANNALS OF APPLIED STATISTICS | 2020 / Vol. 14 / No. 4
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Compositional heterogeneity; Lie Markov models; phylogenetics; rooting; MAXIMUM-LIKELIHOOD; DNA-SEQUENCES; MITOCHONDRIAL; NONSTATIONARY; BIASES; ROOT;
DOI
10.1214/20-AOAS1369
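Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code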
020208 ; 070103 ; 0714 ;
Abstract
Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees. Substitutions in sequences are modelled through a continuous-time Markov process, characterised by an instantaneous rate matrix, which standard models assume is time-reversible and stationary. These assumptions are biologically questionable and induce a likelihood function which is invariant to a tree's root position. This hampers inference because a tree's biological interpretation depends critically on where it is rooted. Relaxing both assumptions, we introduce a model whose likelihood can distinguish between rooted trees. The model is nonstationary with step changes in the instantaneous rate matrix at each speciation event. Exploiting recent theoretical work, each rate matrix belongs to a nonreversible family of Lie Markov models. These models are closed under matrix multiplication, so our extension offers the conceptually appealing property that a tree and all its subtrees could have arisen from the same family of nonstationary models. We adopt a Bayesian approach, describe an MCMC algorithm for posterior inference and provide software. The biological insight that our model can provide is illustrated through an analysis in which nonreversible but stationary and nonstationary but reversible models cannot identify a plausible root.
Pages: 1964-1983
Page count: 20