What Do We Gain When Tolerating Loss? The Information Bottleneck Wrings Out Recombination

Authors
Narechania, Apurva [1, 2]
Bobo, Dean [1, 3]
DeSalle, Rob [1]
Mathema, Barun [4]
Kreiswirth, Barry [5]
Planet, Paul J. [1, 6, 7]
Affiliations
[1] Amer Museum Nat Hist, Inst Comparat Genom, New York, NY 10024 USA
[2] Univ Copenhagen, Globe Inst, Sect Hologen, Copenhagen, Denmark
[3] Columbia Univ, Dept Ecol Evolut & Environm Biol, New York, NY USA
[4] Columbia Univ, Mailman Sch Publ Hlth, Dept Epidemiol, New York, NY USA
[5] Hackensack Meridian Hlth, Ctr Discovery & Innovat, Nutley, NJ USA
[6] Childrens Hosp Philadelphia, Div Infect Dis, Philadelphia, PA 19104 USA
[7] Univ Penn, Perelman Sch Med, Dept Pediat, Philadelphia, PA 19104 USA
Keywords
microbial evolution; recombination; information theory; STAPHYLOCOCCUS-AUREUS; GENOME; ALIGNMENT; DIVERGENCE; SEQUENCE; USA300; TOOL;
DOI
10.1093/molbev/msaf029
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Most microbes have the capacity to acquire genetic material from their environment. Recombination of foreign DNA yields genomes that are, at least in part, incongruent with the vertical history of their species. Dominant approaches for detecting these transfers are phylogenetic, requiring a painstaking series of analyses including alignment and tree reconstruction. But these methods do not scale. Here, we propose an unsupervised, alignment-free, and tree-free technique based on the sequential information bottleneck, an optimization procedure designed to extract some portion of relevant information from one random variable conditioned on another. In our case, this joint probability distribution tabulates occurrence counts of k-mers against their genomes of origin with the expectation that recombination will create a strong signal that unifies certain sets of co-occurring k-mers. We conceptualize the technique as a rate-distortion problem, measuring distortion in the relevance information as k-mers are compressed into clusters based on their co-occurrence in the source genomes. The result is fast, model-free, lossy compression of k-mers into learned groups of shared genome sequence, differentiating recombined elements from the vertically inherited core. We show that the technique yields a new recombination measure based purely on information, divorced from any biases and limitations inherent to alignment and phylogeny.
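The quantities the abstract names can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not perform the sequential information bottleneck optimization; it only tabulates a k-mer-by-genome joint distribution and measures how much relevance information a given clustering of k-mers retains. All function names, the toy sequences, and the hand-picked cluster assignment are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def kmer_counts(seq, k):
    """Count every overlapping k-mer in one sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def joint_distribution(genomes, k):
    """Normalize k-mer occurrence counts into p(kmer, genome)."""
    counts = {name: kmer_counts(seq, k) for name, seq in genomes.items()}
    total = sum(sum(c.values()) for c in counts.values())
    return {(kmer, name): n / total
            for name, c in counts.items() for kmer, n in c.items()}


def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution keyed by (x, y)."""
    px, py = Counter(), Counter()
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


def compress(joint, assignment):
    """Merge k-mers into clusters, giving p(cluster, genome)."""
    merged = Counter()
    for (x, y), p in joint.items():
        merged[(assignment[x], y)] += p
    return dict(merged)


# Toy input (hypothetical): two "genomes" with disjoint k-mer content.
genomes = {"g1": "AAAA", "g2": "CCCC"}
joint = joint_distribution(genomes, k=2)

full = mutual_information(joint)                                   # I(X;Y)
kept = mutual_information(compress(joint, {"AA": 0, "CC": 1}))     # I(T;Y)
lost = mutual_information(compress(joint, {"AA": 0, "CC": 0}))     # I(T;Y)
```

In this toy case, keeping the two k-mers in separate clusters preserves all of the relevance information, while merging them into one cluster discards it. The gap between I(X;Y) and I(T;Y) is the distortion that a rate-distortion framing would track as clusters are merged.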
Pages: 13