Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support

Cited by: 1
Authors
Erbert, Marius [1]
Rechner, Steffen [1]
Mueller-Hannemann, Matthias [1]
Affiliations
[1] Univ Halle Wittenberg, Inst Comp Sci, Halle, Germany
Source
ALGORITHMS IN BIOINFORMATICS | 2016 / Vol. 9838
DOI
10.1007/978-3-319-43681-4_12
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
A basic task in bioinformatics is the counting of k-mers in genome strings. The k-mer counting problem is to build a histogram of all substrings of length k in a given genome sequence. We present the open source k-mer counting software Gerbil that has been designed for the efficient counting of k-mers for k >= 32. Given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. While existing k-mer counting tools suffer from excessive memory resource consumption or degrading performance for large k, Gerbil is able to efficiently support large k without much loss of performance. Our software implements a two-disk approach. In the first step, DNA reads are loaded from disk and distributed to temporary files that are stored at a working disk. In a second step, the temporary files are read again, split into k-mers and counted via a hash table approach. In addition, Gerbil can optionally use GPUs to accelerate the counting step. For large k, we outperform state-of-the-art open source k-mer counting tools by up to a factor of 4 for large genome data sets.
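The two-step scheme described in the abstract can be illustrated with a minimal Python sketch. This is not Gerbil's implementation. It rests on several assumptions: in-memory lists stand in for the temporary files on the working disk, reads are already split into k-mers during the first step so that each distinct k-mer lands in exactly one bucket, a CRC32 checksum is used as the bucket function, and reverse complements are not merged. The function names `distribute`, `count_buckets`, and `count_kmers` are hypothetical.

```python
import zlib
from collections import Counter


def distribute(reads, k, num_buckets):
    """Step 1 (sketch): route every k-mer of every read to one bucket.

    The buckets are plain lists standing in for temporary files.
    Hashing the k-mer guarantees that equal k-mers share a bucket.
    """
    buckets = [[] for _ in range(num_buckets)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            index = zlib.crc32(kmer.encode("ascii")) % num_buckets
            buckets[index].append(kmer)
    return buckets


def count_buckets(buckets):
    """Step 2 (sketch): count each bucket separately with a hash table."""
    counts = {}
    for bucket in buckets:
        # Only one bucket's hash table is built at a time, which is
        # what bounds the working memory in a disk-based counter.
        table = Counter(bucket)
        # Buckets hold disjoint k-mer sets, so no key is overwritten.
        counts.update(table)
    return counts


def count_kmers(reads, k, num_buckets=4):
    """Build the k-mer histogram for a collection of reads."""
    return count_buckets(distribute(reads, k, num_buckets))
```

For example, `count_kmers(["ACGTACGT"], 4)` counts `ACGT` twice and `CGTA`, `GTAC`, and `TACG` once each. Because the buckets partition the set of k-mers, the result does not depend on `num_buckets`; only the size of each per-bucket hash table changes.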
Pages: 150 - 161
Number of pages: 12
Related papers
50 in total
  • [41] On Fast and Memory-Efficient Construction of an Antidictionary Array
    Fukae, Hirotada
    Ota, Takahiro
    Morita, Hiroyoshi
    2012 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY PROCEEDINGS (ISIT), 2012,
  • [42] Fast and Memory-Efficient Algorithms for Evacuation Problems
    Schloeter, Miriam
    Skutella, Martin
    PROCEEDINGS OF THE TWENTY-EIGHTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2017, : 821 - 840
  • [43] Fast and memory-efficient minimum spanning tree on the
    Rostrup, Scott
    Srivastava, Shweta
    Singhal, Kishore
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2013, 8 (01) : 21 - 33
  • [44] Fast memory-efficient generalized belief propagation
    Kumar, M. Pawan
    Torr, P. H. S.
    COMPUTER VISION - ECCV 2006, PT 4, PROCEEDINGS, 2006, 3954 : 451 - 463
  • [45] A fast and memory-efficient implementation of the transfer bootstrap
    Lutteropp, Sarah
    Kozlov, Alexey M.
    Stamatakis, Alexandros
    BIOINFORMATICS, 2020, 36 (07) : 2280 - 2281
  • [46] A memory-efficient and fast Huffman decoding algorithm
    Chen, HC
    Wang, YL
    Lan, YF
    INFORMATION PROCESSING LETTERS, 1999, 69 (03) : 119 - 122
  • [47] A Memory-Efficient GPU Method for Hamming and Levenshtein Distance Similarity
    Todd, Andrew
    Nourian, Marziyeh
    Becchi, Michela
    2017 IEEE 24TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2017, : 408 - 418
  • [48] KrakenUniq: confident and fast metagenomics classification using unique k-mer counts
    F. P. Breitwieser
    D. N. Baker
    S. L. Salzberg
    Genome Biology, 19
  • [49] KrakenUniq: confident and fast metagenomics classification using unique k-mer counts
    Breitwieser, F. P.
    Baker, D. N.
    Salzberg, S. L.
    GENOME BIOLOGY, 2018, 19
  • [50] Efficient k-mer Indexing with Application to Mapping-free SNP Genotyping
    Marcolin, Mattia
    Andreace, Francesco
    Comin, Matteo
    BIOINFORMATICS: PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES - VOL 3: BIOINFORMATICS, 2021, : 62 - 70