Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support

Cited by: 1

Authors:

Erbert, Marius [1]

Rechner, Steffen [1]

Mueller-Hannemann, Matthias [1]

Affiliations:

[1] Univ Halle Wittenberg, Inst Comp Sci, Halle, Germany

Source:

ALGORITHMS IN BIOINFORMATICS | 2016 / Vol. 9838

Keywords:

DOI:

10.1007/978-3-319-43681-4_12

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

A basic task in bioinformatics is the counting of k-mers in genome strings. The k-mer counting problem is to build a histogram of all substrings of length k in a given genome sequence. We present the open source k-mer counting software Gerbil that has been designed for the efficient counting of k-mers for k ≥ 32. Given the technology trend towards long reads of next-generation sequencers, support for large k becomes increasingly important. While existing k-mer counting tools suffer from excessive memory resource consumption or degrading performance for large k, Gerbil is able to efficiently support large k without much loss of performance. Our software implements a two-disk approach. In the first step, DNA reads are loaded from disk and distributed to temporary files that are stored on a working disk. In the second step, the temporary files are read again, split into k-mers and counted via a hash table approach. In addition, Gerbil can optionally use GPUs to accelerate the counting step. For large k, we outperform state-of-the-art open source k-mer counting tools by up to a factor of 4 for large genome data sets.
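The counting problem and the two-step idea described in the abstract can be illustrated with a minimal Python sketch. This is not Gerbil's implementation: the function names, the in-memory buckets standing in for temporary files, and the CRC32-based partitioning are all assumptions made for illustration, and reverse complements are not merged.

```python
import zlib
from collections import Counter


def kmers(read, k):
    """Yield every substring of length k in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k):
    """Single-pass histogram of all k-mers using one hash table."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def count_kmers_two_step(reads, k, num_buckets=4):
    """Two-step variant: partition first, then count each partition.

    Hypothetical simplification: buckets are in-memory lists rather than
    temporary files, and k-mers (not reads) are distributed by CRC32.
    """
    # Step 1: distribute k-mers to buckets; equal k-mers share a bucket.
    buckets = [[] for _ in range(num_buckets)]
    for read in reads:
        for kmer in kmers(read, k):
            index = zlib.crc32(kmer.encode("ascii")) % num_buckets
            buckets[index].append(kmer)

    # Step 2: count each bucket with its own small hash table, then merge.
    total = Counter()
    for bucket in buckets:
        total.update(Counter(bucket))
    return total
```

For example, `count_kmers(["ACGTACGT"], 4)["ACGT"]` evaluates to 2, because ACGT occurs at offsets 0 and 4. Partitioning keeps each per-bucket hash table small, which is the general motivation for a distribute-then-count design when the full histogram would not fit in memory.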


Pages: 150 - 161

Page count: 12


[21] From GPU to FPGA: A Pipelined Hierarchical Approach to Fast and Memory-efficient NDN Name Lookup
Li, Yanbiao
Zhang, Dafang
Yu, Xian
Long, Jing
Liang, Wei
2014 IEEE 22ND ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2014), 2014, : 106 - 106
[22] Space-efficient computation of k-mer dictionaries for large values of k
Diego Díaz-Domínguez
Miika Leinonen
Leena Salmela
Algorithms for Molecular Biology, 19
[23] Fast genotyping of known SNPs through approximate k-mer matching
Shajii, Ariya
Yorukoglu, Deniz
Yu, Yun William
Berger, Bonnie
BIOINFORMATICS, 2016, 32 (17) : 538 - 544
[24] GkmExplain: fast and accurate interpretation of nonlinear gapped k-mer SVMs
Shrikumar, Avanti
Prakash, Eva
Kundaje, Anshul
BIOINFORMATICS, 2019, 35 (14) : I173 - I182
[25] Space-efficient computation of k-mer dictionaries for large values of k
Diaz-Dominguez, Diego
Leinonen, Miika
Salmela, Leena
ALGORITHMS FOR MOLECULAR BIOLOGY, 2024, 19 (01)
[26] TopKmer: Parallel High Frequency K-mer Counting on Distributed Memory
Li Mocheng
Chen Zhiguang
Xiao Nong
Liu Yang
Luo Xi
Chen Tao
NETWORK AND PARALLEL COMPUTING, NPC 2022, 2022, 13615 : 96 - 107
[27] GaKCo: A Fast Gapped k-mer String Kernel Using Counting
Singh, Ritambhara
Sekhon, Arshdeep
Kowsari, Kamran
Lanchantin, Jack
Wang, Beilun
Qi, Yanjun
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT I, 2017, 10534 : 356 - 373
[28] KMC 2: fast and resource-frugal k-mer counting
Deorowicz, Sebastian
Kokot, Marek
Grabowski, Szymon
Debudaj-Grabysz, Agnieszka
BIOINFORMATICS, 2015, 31 (10) : 1569 - 1576
[29] stringMLST: a fast k-mer based tool for multilocus sequence typing
Gupta, Anuj
Jordan, I. King
Rishishwar, Lavanya
BIOINFORMATICS, 2017, 33 (01) : 119 - 121
[30] High-frequency k-mer counting at low memory footprint
Li Mocheng
Liu Yang
Xiao Nong
Chen Zhiguang
ELECTRONICS LETTERS, 2022, 58 (25) : 940 - 942
