A de novo next generation genomic sequence assembler based on string graph and MapReduce cloud computing framework

Cited by: 17
Authors
Chang, Yu-Jung [1]
Chen, Chien-Chih [1,2]
Chen, Chuen-Liang [2]
Ho, Jan-Ming [1]
Affiliations
[1] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
[2] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 10764, Taiwan
Source
BMC GENOMICS | 2012 / Vol. 13
Keywords
Sequencing Error; Coverage Depth; Graph Construction; Position Weight Matrix; MapReduce Framework
DOI
10.1186/1471-2164-13-S7-S28
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: State-of-the-art high-throughput sequencers, e.g., the Illumina HiSeq series, generate sequencing reads longer than 150 bp, with up to 600 Gbp of data per run. These sequencers produce longer reads at greater sequencing depth than previous technologies. Two major challenges arise in using high-throughput technology for de novo assembly of genomes. First, the amount of physical memory may be insufficient to store the data structures of the assembly algorithm, even on systems with high-end multicore processors. Second, the graph-theoretical model used to capture intersection relationships of the reads may contain structural defects that are not well managed by existing assembly algorithms.

Results: We developed CloudBrush, a distributed genome assembler based on string graphs and the MapReduce framework. The assembler includes a novel edge-adjustment algorithm that detects structural defects by examining the neighboring reads of a specific read for sequencing errors and adjusts the edges of the string graph if necessary. CloudBrush was evaluated against the GAGE benchmarks to compare its assembly quality with that of other assemblers. The results show that our assemblies have a moderate N50 and a low misassembly rate for misjoins and indels of > 5 bp. In addition, we introduced two measures, precision and recall, to assess how faithfully contigs align to target genomes. Compared with the assembly tools used in the GAGE benchmarks, CloudBrush produces contigs with high precision and recall. We also verified the effectiveness of the edge-adjustment algorithm using simulated datasets and ran CloudBrush on a nematode dataset using a commercial cloud. The CloudBrush assembler is available at https://github.com/ice91/CloudBrush.
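The abstract reports assembly quality in terms of N50. As a rough illustration only, the sketch below computes N50 under its common textbook definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the sample lengths are hypothetical and are not taken from CloudBrush or from the GAGE evaluation scripts, which may use a variant normalized by the reference genome size.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty).

    Assumes the common definition based on total assembly length,
    not on a known reference genome size.
    """
    # Walk the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the running sum covers half the assembly.
        if 2 * running >= total:
            return length
    return 0


# Hypothetical contig lengths in bp (illustrative data only).
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))  # prints 70: 80 + 70 = 150 is half of the 300 bp total
```

A larger N50 indicates that more of the assembly sits in long contigs, but it says nothing about correctness, which is why the abstract pairs it with misassembly rates and with precision and recall.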
Pages: 17