LoRDEC: accurate and efficient long read error correction

Citations: 526
Authors
Salmela, Leena [1, 2]
Rivals, Eric [3, 4, 5]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, FI-00014 Helsinki, Finland
[2] Univ Helsinki, Helsinki Inst Informat Technol, FI-00014 Helsinki, Finland
[3] LIRMM, F-34095 Montpellier 5, France
[4] CNRS, Inst Biol Computat, F-34095 Montpellier 5, France
[5] Univ Montpellier, F-34095 Montpellier 5, France
Funding
Academy of Finland;
Keywords
BASIC LOCAL ALIGNMENT; GENOME ASSEMBLIES;
DOI
10.1093/bioinformatics/btu538
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: PacBio single molecule real-time sequencing is a third-generation sequencing technique producing long reads, with comparatively lower throughput and higher error rate. Errors include numerous indels and complicate downstream analyses such as mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping short reads onto long reads provides sufficient coverage to eliminate up to 99% of errors, but at the expense of prohibitive running times and considerable amounts of disk and memory space.
Results: We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads, and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. In comparison, LoRDEC is at least six times faster and requires at least 93% less memory or disk space than available tools, while achieving comparable accuracy.
Pages: 3506-3514
Number of pages: 9
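
The idea summarized in the abstract (build a de Bruijn graph from the short reads, then replace each erroneous stretch of a long read with a path through that graph) can be illustrated with a toy sketch. This is not the LoRDEC implementation: the function names (`solid_kmers`, `bridge`, `correct_read`), the `min_count` and `max_extra` parameters, the use of a plain Python set in place of a succinct graph structure, and the exhaustive bounded path search are all assumptions made for illustration, and the search is exponential in the worst case.

```python
from collections import Counter


def solid_kmers(short_reads, k, min_count=2):
    """Return k-mers seen at least min_count times; they act as graph nodes."""
    counts = Counter(r[i:i + k] for r in short_reads
                     for i in range(len(r) - k + 1))
    return {kmer for kmer, c in counts.items() if c >= min_count}


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def bridge(graph, source, target, region, max_extra=3):
    """Find the sequence between two solid k-mers that is closest to `region`.

    Explores every path from `source` to `target` of bounded length
    (a simplification chosen for this sketch) and keeps the one whose
    inner sequence has the smallest edit distance to the weak region.
    """
    k = len(source)
    limit = len(region) + max_extra + k  # max bases appended after source
    best = None
    stack = [(source, "")]
    while stack:
        node, spelled = stack.pop()
        if node == target and len(spelled) >= k:
            inner = spelled[:-k]  # drop the target k-mer itself
            d = edit_distance(inner, region)
            if best is None or d < best[0]:
                best = (d, inner)
        if len(spelled) < limit:
            for base in "ACGT":
                nxt = node[1:] + base  # implicit de Bruijn edge
                if nxt in graph:
                    stack.append((nxt, spelled + base))
    return None if best is None else best[1]


def correct_read(read, graph, k):
    """Replace weak regions lying between solid k-mers of a long read."""
    solid_pos = [i for i in range(len(read) - k + 1) if read[i:i + k] in graph]
    if not solid_pos:
        return read  # nothing to anchor on; leave the read unchanged
    prev = solid_pos[0]
    out = read[:prev + k]
    for pos in solid_pos[1:]:
        if pos == prev + 1:            # consecutive solid k-mers: copy one base
            out += read[pos + k - 1]
        elif pos >= prev + k:          # weak region between two anchors
            region = read[prev + k:pos]
            inner = bridge(graph, read[prev:prev + k], read[pos:pos + k], region)
            out += (region if inner is None else inner) + read[pos:pos + k]
        else:                          # overlapping anchor: skip it
            continue
        prev = pos
    return out + read[prev + k:]       # uncorrected tail, if any


if __name__ == "__main__":
    reference = "ACGTACCGGTTAGC"
    shorts = [reference[:10], reference[4:]] * 2   # accurate short reads
    graph = solid_kmers(shorts, k=4)
    print(correct_read("ACGTACAGGTTAGC", graph, k=4))  # read with a substitution
```

In this sketch only regions flanked by solid k-mers on both sides are corrected; the head and tail of a read are left as they are.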