SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data

Cited by: 66
Authors
Abuin, Jose M. [1 ]
Pichel, Juan C. [1 ]
Pena, Tomas F. [1 ]
Amigo, Jorge [2 ,3 ]
Affiliations
[1] Univ Santiago de Compostela, Ctr Invest Tecnoloxias Informac CITIUS, Santiago De Compostela, Spain
[2] Fdn Publ Galega Med Xenom SERGAS, Santiago De Compostela, Spain
[3] Inst Invest Sanitaria Santiago de Compostela, Grp Med Xenom, Santiago De Compostela, Spain
Source
PLOS ONE | 2016 / Vol. 11 / Issue 05
Keywords
READ ALIGNMENT; ALIGNER; FORMAT;
DOI
10.1371/journal.pone.0155461
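Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes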
07 ; 0710 ; 09 ;
Abstract
Next-generation sequencing (NGS) technologies have led to a huge amount of genomic data that need to be analyzed and interpreted. This has a major impact on the DNA sequence alignment process, which nowadays requires mapping billions of small DNA sequences onto a reference genome. As a result, sequence alignment remains the most time-consuming stage in the sequence analysis workflow. To deal with this issue, state-of-the-art aligners take advantage of parallelization strategies. However, the existing solutions show limited scalability and have complex implementations. In this work we introduce SparkBWA, a new tool that exploits the capabilities of a big data technology such as Spark to boost the performance of one of the most widely adopted aligners, the Burrows-Wheeler Aligner (BWA). The design of SparkBWA uses two independent software layers in such a way that no modifications to the original BWA source code are required, which ensures its compatibility with any BWA version (future or legacy). SparkBWA is evaluated in different scenarios, showing noticeable results in terms of performance and scalability. A comparison with other parallel BWA-based aligners validates the benefits of our approach. Finally, an intuitive and flexible API is provided to NGS professionals in order to facilitate the acceptance and adoption of the new tool. The source code of the software described in this paper is publicly available at https://github.com/citiususc/SparkBWA, under a GPL3 license.
Pages: 21