SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data

Cited by: 66

Authors:

Abuin, Jose M. ^{[1]}

Pichel, Juan C. ^{[1]}

Pena, Tomas F. ^{[1]}

Amigo, Jorge ^{[2,3]}

Affiliations:

[1] Univ Santiago de Compostela, Ctr Invest Tecnoloxias Informac CITIUS, Santiago De Compostela, Spain

[2] Fdn Publ Galega Med Xenom SERGAS, Santiago De Compostela, Spain

[3] Inst Invest Sanitaria Santiago de Compostela, Grp Med Xenom, Santiago De Compostela, Spain

Source:

PLOS ONE | 2016 / Vol. 11 / Issue 05

Keywords:

READ ALIGNMENT; ALIGNER; FORMAT;

DOI:

10.1371/journal.pone.0155461

Chinese Library Classification (CLC) number:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];

Discipline classification code:

07 ; 0710 ; 09 ;

Abstract:

Next-generation sequencing (NGS) technologies have produced a huge amount of genomic data that needs to be analyzed and interpreted. This has a major impact on the DNA sequence alignment process, which nowadays requires mapping billions of small DNA sequences onto a reference genome. As a result, sequence alignment remains the most time-consuming stage in the sequence analysis workflow. To deal with this issue, state-of-the-art aligners take advantage of parallelization strategies. However, the existing solutions show limited scalability and have complex implementations. In this work we introduce SparkBWA, a new tool that exploits the capabilities of a big data technology such as Spark to boost the performance of one of the most widely adopted aligners, the Burrows-Wheeler Aligner (BWA). The design of SparkBWA uses two independent software layers in such a way that no modifications to the original BWA source code are required, which ensures its compatibility with any BWA version (future or legacy). SparkBWA is evaluated in different scenarios, showing noticeable results in terms of performance and scalability. A comparison with other parallel BWA-based aligners validates the benefits of our approach. Finally, an intuitive and flexible API is provided to NGS professionals in order to facilitate the acceptance and adoption of the new tool. The source code of the software described in this paper is publicly available at https://github.com/citiususc/SparkBWA under the GPL3 license.

Pages: 21

