Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support

Cited by: 1
Authors
Erbert, Marius [1]
Rechner, Steffen [1]
Mueller-Hannemann, Matthias [1]
Affiliation
[1] Univ Halle Wittenberg, Inst Comp Sci, Halle, Germany
Source
ALGORITHMS IN BIOINFORMATICS | 2016 / Vol. 9838
DOI
10.1007/978-3-319-43681-4_12
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
A basic task in bioinformatics is the counting of k-mers in genome strings. The k-mer counting problem is to build a histogram of all substrings of length k in a given genome sequence. We present the open source k-mer counting software Gerbil, which has been designed for the efficient counting of k-mers for k >= 32. Given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. While existing k-mer counting tools suffer from excessive memory consumption or degrading performance for large k, Gerbil is able to efficiently support large k without much loss of performance. Our software implements a two-disk approach. In the first step, DNA reads are loaded from disk and distributed to temporary files that are stored on a working disk. In the second step, the temporary files are read again, split into k-mers, and counted via a hash table approach. In addition, Gerbil can optionally use GPUs to accelerate the counting step. For large k, we outperform state-of-the-art open source k-mer counting tools by up to a factor of 4 for large genome data sets.
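To make the problem statement and the two-step layout concrete, here is a minimal Python sketch. It is not Gerbil's implementation: the function names `count_kmers` and `count_kmers_two_phase`, the `num_buckets` parameter, and the choice to write individual k-mers into bucket files selected by a CRC32 hash are all assumptions made for illustration. The abstract only states that reads are distributed to temporary files and later split into k-mers and counted with a hash table; it does not describe how the distribution is done.

```python
import os
import tempfile
import zlib
from collections import Counter


def count_kmers(sequence, k):
    """Build the histogram of all length-k substrings of one sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_kmers_two_phase(reads, k, num_buckets=4):
    """Count k-mers via temporary bucket files (illustrative assumption,
    not the distribution scheme used by Gerbil)."""
    total = Counter()
    with tempfile.TemporaryDirectory() as workdir:
        paths = [os.path.join(workdir, f"bucket_{b}.txt")
                 for b in range(num_buckets)]

        # Step 1: distribute k-mers to temporary files. Equal k-mers always
        # hash to the same bucket, so each bucket can be counted on its own.
        files = [open(p, "w") for p in paths]
        try:
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    bucket = zlib.crc32(kmer.encode("ascii")) % num_buckets
                    files[bucket].write(kmer + "\n")
        finally:
            for f in files:
                f.close()

        # Step 2: re-read each temporary file and count its k-mers with a
        # hash table that only has to hold one bucket at a time.
        for p in paths:
            bucket_counts = Counter()
            with open(p) as f:
                for line in f:
                    bucket_counts[line.rstrip("\n")] += 1
            total.update(bucket_counts)
    return total
```

For example, `count_kmers("ACGTACGT", 3)` counts `ACG` and `CGT` twice each and `GTA` and `TAC` once each. The two-phase variant returns the same totals as summing `count_kmers` over all reads; the point of the split is that the hash table in the second step is bounded by the size of a single bucket rather than the whole data set.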
Pages: 150-161
Page count: 12