SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data

Cited by: 66
Authors
Abuin, Jose M. [1]
Pichel, Juan C. [1]
Pena, Tomas F. [1]
Amigo, Jorge [2, 3]
Affiliations
[1] Univ Santiago de Compostela, Ctr Invest Tecnoloxias Informac CITIUS, Santiago De Compostela, Spain
[2] Fdn Publ Galega Med Xenom SERGAS, Santiago De Compostela, Spain
[3] Inst Invest Sanitaria Santiago de Compostela, Grp Med Xenom, Santiago De Compostela, Spain
Source
PLOS ONE | 2016 / Volume 11 / Issue 05
Keywords
READ ALIGNMENT; ALIGNER; FORMAT;
DOI
10.1371/journal.pone.0155461
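Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07; 0710; 09;
Abstract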
Next-generation sequencing (NGS) technologies have produced a huge amount of genomic data that needs to be analyzed and interpreted. This has a major impact on the DNA sequence alignment process, which nowadays requires mapping billions of small DNA sequences onto a reference genome. As a result, sequence alignment remains the most time-consuming stage in the sequence analysis workflow. To deal with this issue, state-of-the-art aligners take advantage of parallelization strategies. However, the existing solutions show limited scalability and are complex to implement. In this work we introduce SparkBWA, a new tool that exploits the capabilities of a big data technology such as Spark to boost the performance of one of the most widely adopted aligners, the Burrows-Wheeler Aligner (BWA). The design of SparkBWA uses two independent software layers in such a way that no modifications to the original BWA source code are required, which ensures its compatibility with any BWA version (future or legacy). SparkBWA is evaluated in different scenarios, showing noticeable results in terms of performance and scalability. A comparison with other parallel BWA-based aligners validates the benefits of our approach. Finally, an intuitive and flexible API is provided to NGS professionals in order to facilitate the acceptance and adoption of the new tool. The source code of the software described in this paper is publicly available at https://github.com/citiususc/SparkBWA under a GPL3 license.
Pages: 21