Dashing: fast and accurate genomic distances with HyperLogLog

Cited by: 54
Authors
Baker, Daniel N. [1]
Langmead, Ben [1]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, 3400 N Charles St, Baltimore, MD 21218 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Sketch data structures; HyperLogLog; Metagenomics; Alignment; Sequencing; Genomic distance; Database
DOI
10.1186/s13059-019-1875-0
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in 6 minutes. Dashing is open source and available at https://github.com/dnbaker/dashing.
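To illustrate the idea in the abstract, the following is a minimal Python sketch of how HyperLogLog registers can be used to estimate set similarity. It is not Dashing's implementation: it assumes the classic HyperLogLog estimator with a linear-counting correction for small counts, a plain inclusion-exclusion estimate for the intersection (rather than the specialized union and intersection estimators the abstract refers to), and `hashlib.blake2b` as a stand-in hash. The names `HyperLogLog`, `kmers`, and `jaccard_estimate` are hypothetical, and `kmers` does not canonicalize reverse complements.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash of the item; blake2b is an assumed stand-in hash function.
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - self.p)                  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def union(self, other):
        # The sketch of a set union is the register-wise maximum.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def cardinality(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)          # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)        # linear counting for small counts
        return raw


def kmers(seq, k):
    """Return the set of k-length substrings of seq (no canonicalization)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_estimate(a, b):
    """Estimate |A n B| / |A u B| by inclusion-exclusion on sketch cardinalities."""
    union = a.union(b).cardinality()
    if union == 0:
        return 0.0
    inter = a.cardinality() + b.cardinality() - union
    return max(0.0, inter) / union
```

In this sketch, each genome would be summarized by adding every element of `kmers(sequence, k)` to its own `HyperLogLog`, after which `jaccard_estimate` compares two summaries without revisiting the sequences. Inclusion-exclusion is the simplest way to obtain an intersection size, but its error grows when the true overlap is small relative to the union.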
Pages: 12