Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges

Cited by: 79
Authors
El-Metwally, Sara [1]
Hamza, Taher [1]
Zakaria, Magdi [1]
Helmy, Mohamed [2, 3]
Affiliations
[1] Mansoura Univ, Dept Comp Sci, Fac Comp & Informat, Mansoura, Egypt
[2] Al Azhar Univ, Dept Bot, Fac Agr, Cairo, Egypt
[3] Al Azhar Univ, Fac Agr, Dept Biotechnol, Cairo, Egypt
Keywords
READ ERROR-CORRECTION; SHORT DNA-SEQUENCES; DE-BRUIJN GRAPHS; GENOME SEQUENCE; STRING GRAPH; PAIRED READS; ALGORITHM; TECHNOLOGIES; VELVET; PLATFORMS;
DOI
10.1371/journal.pcbi.1003345
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., their high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. We discuss them as a framework of four stages for data analysis and processing and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that current assemblers face in the next-generation environment to determine the current state of the art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.
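The four stages named in the abstract can be illustrated with a toy sketch. This is not the authors' method or any real assembler: the function names (`preprocess`, `build_graph`, `simplify`, `spell_contigs`, `assemble`), the parameters (`k`, `min_count`, `min_len`), and the specific filters chosen for each stage are assumptions made purely for illustration. The graph used here is a de Bruijn graph (one of the keywords listed above), and the sketch ignores reverse complements, paired reads, and isolated cycles.

```python
from collections import Counter, defaultdict

def preprocess(reads, k):
    """Stage 1 (toy): drop reads shorter than k or with non-ACGT symbols."""
    return [r for r in reads if len(r) >= k and set(r) <= set("ACGT")]

def build_graph(reads, k):
    """Stage 2 (toy): de Bruijn graph stored as k-mer edges with counts.
    Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix."""
    edges = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            edges[r[i:i + k]] += 1
    return edges

def simplify(edges, min_count):
    """Stage 3 (toy): discard k-mers seen fewer than min_count times."""
    return {kmer: c for kmer, c in edges.items() if c >= min_count}

def spell_contigs(edges):
    """Walk non-branching paths and spell one contig per path.
    Isolated cycles are skipped in this sketch."""
    out, indeg = defaultdict(list), Counter()
    for kmer in edges:
        out[kmer[:-1]].append(kmer[1:])
        indeg[kmer[1:]] += 1
    nodes = set(out) | set(indeg)

    def is_through(n):
        # Exactly one way in and one way out: safe to extend through.
        return indeg[n] == 1 and len(out.get(n, [])) == 1

    contigs = []
    for n in sorted(nodes):
        if is_through(n):
            continue
        for nxt in out.get(n, []):
            path = n + nxt[-1]
            while is_through(nxt):
                nxt = out[nxt][0]
                path += nxt[-1]
            contigs.append(path)
    return contigs

def assemble(reads, k=4, min_count=1, min_len=5):
    """Run the four toy stages in order."""
    reads = preprocess(reads, k)
    edges = simplify(build_graph(reads, k), min_count)
    # Stage 4 (toy): postprocessing keeps only contigs of at least min_len.
    return [c for c in spell_contigs(edges) if len(c) >= min_len]

reads = ["ATGGCGT", "GCGTGCA", "ATGNCGT"]  # the last read is filtered out
print(assemble(reads))  # ['ATGGCGTGCA']
```

Keeping each stage as a separate function mirrors the stage-by-stage framing of the abstract; a real assembler would replace every one of these placeholders with far more elaborate logic.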
Pages: 19