Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support

Cited by: 1
Authors
Erbert, Marius [1]
Rechner, Steffen [1]
Mueller-Hannemann, Matthias [1]
Affiliations
[1] Univ Halle Wittenberg, Inst Comp Sci, Halle, Germany
Source
ALGORITHMS IN BIOINFORMATICS | 2016 / Vol. 9838
DOI
10.1007/978-3-319-43681-4_12
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
A basic task in bioinformatics is the counting of k-mers in genome strings. The k-mer counting problem is to build a histogram of all substrings of length k in a given genome sequence. We present the open source k-mer counting software Gerbil that has been designed for the efficient counting of k-mers for k >= 32. Given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. While existing k-mer counting tools suffer from excessive memory resource consumption or degrading performance for large k, Gerbil is able to efficiently support large k without much loss of performance. Our software implements a two-disk approach. In the first step, DNA reads are loaded from disk and distributed to temporary files that are stored at a working disk. In a second step, the temporary files are read again, split into k-mers and counted via a hash table approach. In addition, Gerbil can optionally use GPUs to accelerate the counting step. For large k, we outperform state-of-the-art open source k-mer counting tools by up to a factor of 4 for large genome data sets.
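The counting problem and the two-step scheme described in the abstract can be illustrated with a minimal Python sketch. This is not Gerbil's implementation: the function names, the in-memory lists standing in for temporary files, and the use of a CRC32 hash to pick a bucket are all assumptions made for illustration. The abstract does not say how reads are assigned to temporary files, and GPU support is not shown.

```python
from collections import Counter
import zlib


def count_kmers(sequence: str, k: int) -> Counter:
    """Histogram of all length-k substrings of `sequence` (the basic problem)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_kmers_two_phase(reads, k: int, num_buckets: int = 4) -> Counter:
    """Two-step counting; buckets are hypothetical stand-ins for temporary files."""
    # Step 1: distribute the data into buckets. Assumption: each k-mer is
    # routed by a hash, so equal k-mers always land in the same bucket.
    buckets = [[] for _ in range(num_buckets)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            buckets[zlib.crc32(kmer.encode()) % num_buckets].append(kmer)

    # Step 2: revisit each bucket and count its k-mers with a hash table.
    # Buckets hold disjoint k-mer sets, so their counts can simply be merged.
    total = Counter()
    for bucket in buckets:
        total.update(Counter(bucket))
    return total


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTAC"]
    print(count_kmers_two_phase(reads, k=3))
```

Because each bucket is counted independently, only one bucket's hash table has to be held in memory at a time, which is the general motivation for partitioning before counting.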
Pages: 150-161
Page count: 12