CloudAligner: A fast and full-featured MapReduce based tool for sequence mapping

Cited by: 57
Authors
Nguyen T. [1]
Shi W. [1]
Ruden D. [2]
Affiliations
[1] Computer Science Department, Wayne State University
[2] Institute of Environmental Health Sciences, Wayne State University
Keywords
Cloud Computing; Single Nucleotide Polymorphism; Hadoop Distributed File System; MapReduce Framework; Crossbow
DOI
10.1186/1756-0500-4-171
Abstract
Background: Research in genetics has advanced rapidly in recent years thanks to next-generation sequencing (NGS). However, massively parallel NGS produces enormous amounts of data, which leads to storage, compatibility, scalability, and performance issues. The combination of Cloud Computing and the MapReduce framework, which uses hundreds or thousands of shared computers to map sequencing reads quickly and efficiently to reference genome sequences, appears to be a very promising solution to these issues. Consequently, it has recently been adopted by many organizations, and the initial results are encouraging. However, because these are only initial steps in this direction, the software developed so far lacks primary functions, such as bisulfite and paired-end mapping, that are available in on-site software such as RMAP or BS Seeker. In addition, existing MapReduce-based applications were not designed to process the long reads produced by the most recent second- and third-generation NGS instruments and are therefore inefficient. Finally, most of these tools were developed for Linux with a command-line interface, which makes them difficult to use for the majority of biologists, who have no programming training.

Results: To encourage the use of Cloud technologies in genomics and to prepare for advances in second- and third-generation DNA sequencing, we have built a Hadoop MapReduce-based application, CloudAligner, which achieves higher performance, covers most primary features, is more accurate, and has a user-friendly interface. It was also designed to handle long sequences. The performance gain of CloudAligner over Cloud-based counterparts (35 to 80%) comes mainly from omitting the reduce phase. Compared with local approaches, the performance gain of CloudAligner comes from partitioning both the huge reference genome and the reads and processing the partitions in parallel. The source code of CloudAligner is available at http://cloudaligner.sourceforge.net/ and its web version is at http://mine.cs.wayne.edu:8080/CloudAligner/.

Conclusions: Our results show that CloudAligner is faster than CloudBurst, provides more accurate results than RMAP, and supports various input and output formats. In addition, with its web-based interface, it is easier to use than its counterparts.

© 2011 Nguyen et al; licensee BioMed Central Ltd.
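The abstract attributes CloudAligner's speed-up to two design choices: skipping the reduce phase, and partitioning both the reference genome and the reads so that the pieces can be processed in parallel. The Python sketch below illustrates that general idea under explicitly hypothetical assumptions. It is not CloudAligner's code, which runs on Hadoop MapReduce and is not reproduced here. The function names, the naive Hamming-distance scan, the forward-strand-only matching, the fixed read length, and the thread pool that stands in for Hadoop map slots are all illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import List, Tuple

# Hypothetical types for this sketch only.
Read = Tuple[str, str]          # (read_id, sequence)
Chunk = Tuple[int, str]         # (0-based offset in the reference, sequence)
Hit = Tuple[str, int, int]      # (read_id, 0-based reference position, mismatches)


def partition_reference(ref: str, chunk_len: int, read_len: int) -> List[Chunk]:
    """Cut the reference into chunks whose cores are chunk_len bases long,
    each padded with read_len - 1 extra bases so that an alignment starting
    anywhere in the core fits entirely inside the chunk."""
    pad = read_len - 1
    return [(start, ref[start:start + chunk_len + pad])
            for start in range(0, len(ref), chunk_len)]


def partition_reads(reads: List[Read], batch_size: int) -> List[List[Read]]:
    """Cut the read set into fixed-size batches."""
    return [reads[i:i + batch_size] for i in range(0, len(reads), batch_size)]


def map_task(chunk: Chunk, batch: List[Read], chunk_len: int, max_mm: int) -> List[Hit]:
    """One hypothetical map task: scan one reference chunk with one batch of reads.

    Only alignments that start inside the chunk's core [0, chunk_len) are
    reported, so two overlapping chunks never emit the same hit and no
    reduce step is needed to remove duplicates."""
    offset, seq = chunk
    hits: List[Hit] = []
    for read_id, read in batch:
        n_starts = min(chunk_len, len(seq) - len(read) + 1)
        for i in range(n_starts):  # empty range if the chunk is shorter than the read
            mismatches = sum(a != b for a, b in zip(read, seq[i:i + len(read)]))
            if mismatches <= max_mm:
                hits.append((read_id, offset + i, mismatches))
    return hits


def map_only_job(ref: str, reads: List[Read], chunk_len: int,
                 batch_size: int, max_mm: int, workers: int = 4) -> List[Hit]:
    """Run one map task per (reference chunk, read batch) pair, with no reduce phase."""
    read_len = len(reads[0][1])
    if any(len(seq) != read_len for _, seq in reads):
        raise ValueError("this sketch assumes fixed-length reads")
    tasks = list(product(partition_reference(ref, chunk_len, read_len),
                         partition_reads(reads, batch_size)))
    # Tasks are independent; a thread pool stands in for Hadoop map slots.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(lambda t: map_task(t[0], t[1], chunk_len, max_mm), tasks))
    # Each task's output is already final; sorting is only for a stable printout.
    return sorted(hit for out in outputs for hit in out)


if __name__ == "__main__":
    ref = "ACGTACGTTTGACCAGTACGTA"
    reads = [("r1", "ACGTA"), ("r2", "GACCA"), ("r3", "TTTTT")]
    for hit in map_only_job(ref, reads, chunk_len=6, batch_size=2, max_mm=1):
        print(hit)
```

In this sketch, each chunk is padded with read_len - 1 bases, and a task reports only alignments that start in its own core. As a result, every reference position is scanned by exactly one task per read batch, and each task can write its output directly. Because no two tasks emit the same hit, there is nothing left for a shuffle or reduce step to merge.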