LoRDEC: accurate and efficient long read error correction

Times cited: 526
Authors
Salmela, Leena [1 ,2 ]
Rivals, Eric [3 ,4 ,5 ]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, FI-00014 Helsinki, Finland
[2] Univ Helsinki, Helsinki Inst Informat Technol, FI-00014 Helsinki, Finland
[3] LIRMM, F-34095 Montpellier 5, France
[4] CNRS, Inst Biol Computat, F-34095 Montpellier 5, France
[5] Univ Montpellier, F-34095 Montpellier 5, France
Funding
Academy of Finland
Keywords
BASIC LOCAL ALIGNMENT; GENOME ASSEMBLIES
DOI
10.1093/bioinformatics/btu538
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: PacBio single-molecule real-time (SMRT) sequencing is a third-generation sequencing technique that produces long reads, albeit with comparatively lower throughput and a higher error rate. The errors include numerous indels, which complicate downstream analyses such as mapping or de novo assembly. A hybrid strategy that exploits the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping short reads onto long reads provides sufficient coverage to eliminate up to 99% of errors, but at the expense of prohibitive running times and considerable disk and memory usage. Results: We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. Compared with available tools, LoRDEC is at least six times faster and requires at least 93% less memory or disk space, while achieving comparable accuracy.
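The correction idea in the abstract can be illustrated with a minimal Python sketch. It is not LoRDEC's implementation: it assumes a plain Python set of k-mers in place of the succinct de Bruijn graph, considers the forward strand only, and leaves the read ends uncorrected. K-mers seen at least `min_count` times in the short reads are treated as "solid". Each gap between two solid k-mers in the long read is taken as an erroneous region, and a bounded depth-first search enumerates graph paths from the left solid k-mer to the right one, keeping the path whose spelled sequence has the smallest edit distance to the region. All function names, `min_count`, `max_branches` and the 2x path-length bound are hypothetical choices for illustration.

```python
# Minimal sketch of hybrid long-read correction with a de Bruijn graph of
# short-read k-mers. Hypothetical names and parameters; not LoRDEC's code.

def build_solid_kmers(short_reads, k, min_count=2):
    """Count k-mers in the short reads and keep those seen >= min_count times."""
    counts = {}
    for read in short_reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
    return {kmer for kmer, c in counts.items() if c >= min_count}


def successors(kmer, solid):
    """Graph edges: solid k-mers overlapping `kmer` by k-1 bases (forward strand only)."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in solid]


def edit_distance(a, b):
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def bridge(src, dst, region, solid, max_branches=1000):
    """Depth-first search for paths src -> dst; return the bases spelled after
    src by the path closest (in edit distance) to `region`, or None."""
    max_len = 2 * len(region)  # hypothetical bound on path length
    best, best_dist = None, None
    stack = [(src, "")]
    explored = 0
    while stack and explored < max_branches:
        node, spelled = stack.pop()
        explored += 1
        if spelled and node == dst:
            d = edit_distance(spelled, region)
            if best_dist is None or d < best_dist:
                best, best_dist = spelled, d
            continue
        if len(spelled) >= max_len:
            continue
        for nxt in successors(node, solid):
            stack.append((nxt, spelled + nxt[-1]))
    return best


def correct(long_read, solid, k):
    """Replace each region between consecutive solid k-mers with the best path."""
    pos = [i for i in range(len(long_read) - k + 1) if long_read[i:i + k] in solid]
    if not pos:
        return long_read
    out = [long_read[:pos[0] + k]]  # head is kept as is in this sketch
    for s, t in zip(pos, pos[1:]):
        region = long_read[s + k:t + k]  # bases after k-mer s, up to the end of k-mer t
        if t == s + 1:
            out.append(region)  # adjacent solid k-mers: one trusted base
            continue
        path = bridge(long_read[s:s + k], long_read[t:t + k], region, solid)
        out.append(path if path is not None else region)
    out.append(long_read[pos[-1] + k:])  # tail is kept as is
    return "".join(out)


if __name__ == "__main__":
    genome = "AACGTTCAGGCATGACCTAGTGGA"
    # Overlapping 12-bp "short reads", each sampled twice so every k-mer is solid.
    reads = [genome[i:i + 12] for i in range(0, len(genome) - 11, 3)] * 2
    solid = build_solid_kmers(reads, k=5)
    long_read = genome[:12] + "C" + genome[12:]  # one spurious inserted base
    print(correct(long_read, solid, k=5))  # AACGTTCAGGCATGACCTAGTGGA
```

In the toy example every 4-mer of the sequence is unique, so the graph is a single path and the search has only one candidate. On real data the graph branches, which is why the sketch caps the number of explored paths and picks the candidate by edit distance.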
Pages: 3506-3514
Number of pages: 9