Sequence assembly using next generation sequencing data —challenges and solutions

Authors
CHIN Francis Y.L.
LEUNG Henry C.M.
YIU S.M.
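Affiliation
[1] Department of Computer Science, The University of Hong Kong
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
Q78 [Genetic engineering];
Discipline classification codes
071007 ; 0836 ; 090102 ;
Abstract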
Sequence assembly is an important step in bioinformatics studies. With the help of next generation sequencing (NGS) technology, high-throughput DNA fragments (reads) can be randomly sampled from a DNA or RNA molecule. However, because the positions from which the reads are sampled are unknown, an assembly process is required to combine overlapping reads and reconstruct the original DNA or RNA sequence. Compared with traditional Sanger sequencing, NGS offers higher throughput, but its reads are shorter and have a higher error rate, which introduces several problems in assembly. Moreover, paired-end reads, which contain more information than single-end reads, can be sampled. Existing assemblers cannot fully utilize this information and fail to assemble longer contigs. In this article, we revisit the major problems of assembling NGS reads from genomic, transcriptomic, metagenomic and metatranscriptomic data. We also describe our IDBA package for solving these problems. The IDBA package adopts several novel ideas in assembly, including the use of multiple k values, local assembly and progressive depth removal. Compared with existing assemblers, IDBA performs better on many simulated and real sequencing datasets.
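The abstract's point about using multiple k values can be illustrated with a toy de Bruijn graph. The sketch below is not the IDBA implementation; the genome, the reads, and all function names are invented for illustration. It only shows the general trade-off: a small k leaves branches at repeats, while a large k loses k-mers wherever adjacent reads overlap by fewer than k-1 bases.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all substrings of length k in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor, i.e. ambiguous extensions."""
    return sorted(node for node, nxt in graph.items() if len(nxt) > 1)


def missing_kmers(genome, reads, k):
    """k-mers of the genome that no read contains, i.e. gaps in the graph."""
    seen = {kmer for read in reads for kmer in kmers(read, k)}
    return sorted(set(kmers(genome, k)) - seen)


# Toy data (invented): "GTC" occurs twice in the genome, and the first two
# reads overlap by only 3 bases.
GENOME = "AAGTCCGTCAT"
READS = ["AAGTCC", "TCCGTC", "CGTCAT"]

for k in (3, 4, 5):
    graph = de_bruijn(READS, k)
    print(k, branching_nodes(graph), missing_kmers(GENOME, READS, k))
```

On this toy input, k = 3 and k = 4 cover every k-mer of the genome but leave a branch at the repeated "GTC", whereas k = 5 removes the branch but misses the 5-mer "GTCCG" that spans the short overlap. Combining information from several k values is one way to get the benefits of both.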
Pages: 1140-1148
Page count: 9
Source: Science China Life Sciences, 2014, 57 (11): 1140-1148