Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges

Cited by: 79
Authors
El-Metwally, Sara [1]
Hamza, Taher [1]
Zakaria, Magdi [1]
Helmy, Mohamed [2, 3]
Affiliations
[1] Mansoura Univ, Dept Comp Sci, Fac Comp & Informat, Mansoura, Egypt
[2] Al Azhar Univ, Dept Bot, Fac Agr, Cairo, Egypt
[3] Al Azhar Univ, Fac Agr, Dept Biotechnol, Cairo, Egypt
Keywords
READ ERROR-CORRECTION; SHORT DNA-SEQUENCES; DE-BRUIJN GRAPHS; GENOME SEQUENCE; STRING GRAPH; PAIRED READS; ALGORITHM; TECHNOLOGIES; VELVET; PLATFORMS
DOI
10.1371/journal.pcbi.1003345
CLC number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. Here we discuss them as a framework of four stages for data analysis and processing and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state-of-the-art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.
Pages: 19