Sequence assembly using next generation sequencing data —challenges and solutions

Authors
CHIN Francis Y.L.
LEUNG Henry C.M.
YIU S.M.
Affiliation
[1] Department of Computer Science, The University of Hong Kong
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
Q78 [Genetic engineering];
Discipline classification codes
071007 ; 0836 ; 090102 ;
Abstract
Sequence assembly is an important step in bioinformatics studies. With next generation sequencing (NGS) technology, high-throughput DNA fragments (reads) can be randomly sampled from DNA or RNA molecules. However, because the positions from which the reads are sampled are unknown, an assembly process is required to combine overlapping reads and reconstruct the original DNA or RNA sequence. Compared with traditional Sanger sequencing, NGS offers higher throughput, but its reads are shorter and its error rate is higher, which introduces several problems in assembly. Moreover, paired-end reads, which contain more information than single-end reads, can be sampled; existing assemblers cannot fully utilize this information and fail to assemble longer contigs. In this article, we revisit the major problems of assembling NGS reads from genomic, transcriptomic, metagenomic and metatranscriptomic data. We also describe our IDBA package for solving these problems. The IDBA package adopts several novel ideas in assembly, including the use of multiple k values, local assembly and progressive depth removal. Compared with existing assemblers, IDBA performs better on many simulated and real sequencing datasets.
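To make the idea of combining overlapping reads at several k values concrete, here is a minimal Python sketch. It is not the IDBA implementation: it assumes a plain de Bruijn graph built from k-mers, and the function names `de_bruijn_edges` and `count_branches` and the toy reads are hypothetical, chosen only for illustration. It shows how the number of nodes and of ambiguous (branching) nodes can change as k varies, which is one reason an assembler might look at more than one k.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read; reads shorter than k add nothing.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def count_branches(graph):
    """Count nodes with more than one successor (ambiguous extensions)."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


if __name__ == "__main__":
    # Hypothetical toy reads; real NGS reads are longer and contain errors.
    reads = ["ATGCGTACG", "GCGTACGTT", "TACGTTGCG", "GTTGCGTAA"]
    for k in (4, 6, 8):
        graph = de_bruijn_edges(reads, k)
        print(f"k={k}: nodes={len(graph)}, branching nodes={count_branches(graph)}")
```

In this simplified picture, a small k tends to merge repeated substrings into shared nodes and so creates branches, while a large k needs longer overlaps between reads and can leave the graph disconnected.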
Pages: 1140-1148
Number of pages: 9