LoRDEC: accurate and efficient long read error correction

Times cited: 526
Authors
Salmela, Leena [1, 2]
Rivals, Eric [3, 4, 5]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, FI-00014 Helsinki, Finland
[2] Univ Helsinki, Helsinki Inst Informat Technol, FI-00014 Helsinki, Finland
[3] LIRMM, F-34095 Montpellier 5, France
[4] CNRS, Inst Biol Computat, F-34095 Montpellier 5, France
[5] Univ Montpellier, F-34095 Montpellier 5, France
Funding
Academy of Finland
Keywords
BASIC LOCAL ALIGNMENT; GENOME ASSEMBLIES
DOI
10.1093/bioinformatics/btu538
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: PacBio single molecule real-time (SMRT) sequencing is a third-generation sequencing technique that produces long reads, albeit with comparatively lower throughput and a higher error rate. The errors include numerous indels and complicate downstream analyses such as mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping short reads onto long reads provides sufficient coverage to eliminate up to 99% of errors, but at the expense of prohibitive running times and considerable amounts of disk and memory space.
Results: We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. Compared with available tools, LoRDEC is at least six times faster and requires at least 93% less memory or disk space, while achieving comparable accuracy.
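The correction idea in the Results paragraph (a de Bruijn graph built from short-read k-mers, and a path search across each erroneous region of a long read) can be illustrated with a small Python sketch. It is not LoRDEC's implementation. It assumes a plain Python set of k-mers in place of the succinct de Bruijn graph, keeps only k-mers seen at least `min_abundance` times (called solid k-mers here), ignores reverse complements, leaves bases before the first and after the last solid k-mer untouched, and chooses among candidate paths by edit distance to the original region. All function names and the `slack` and `max_paths` parameters are hypothetical.

```python
from collections import Counter


def solid_kmers(short_reads, k, min_abundance):
    """Return the k-mers seen at least `min_abundance` times in the short reads.

    Assumption: a plain Python set stands in for the succinct de Bruijn graph,
    and only the forward strand is indexed (no reverse complements).
    """
    counts = Counter(
        read[i:i + k] for read in short_reads for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, c in counts.items() if c >= min_abundance}


def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def find_paths(graph, source, target, max_len, max_paths):
    """Depth-first enumeration of graph paths from `source` to `target`.

    Each path is returned as the sequence it spells; extension stops at
    `max_len` bases and at most `max_paths` paths are collected.
    """
    k = len(source)
    paths, stack = [], [source]
    while stack and len(paths) < max_paths:
        spelled = stack.pop()
        if len(spelled) > k and spelled.endswith(target):
            paths.append(spelled)
            continue
        if len(spelled) >= max_len:
            continue
        overlap = spelled[len(spelled) - k + 1:]  # last k-1 bases of the current node
        for base in "ACGT":
            if overlap + base in graph:
                stack.append(spelled + base)
    return paths


def correct_long_read(long_read, graph, k, slack=3, max_paths=1000):
    """Replace each weak region between consecutive solid k-mers by a graph path.

    Bases before the first and after the last solid k-mer are kept as-is.
    """
    solid = [i for i in range(len(long_read) - k + 1) if long_read[i:i + k] in graph]
    if not solid:
        return long_read  # no anchor in the graph, nothing to correct
    parts = [long_read[:solid[0] + k]]
    for p, q in zip(solid, solid[1:]):
        if q == p + 1:  # adjacent solid k-mers: no weak region in between
            parts.append(long_read[q + k - 1])
            continue
        region = long_read[p:q + k]  # spans both flanking solid k-mers
        paths = find_paths(graph, long_read[p:p + k], long_read[q:q + k],
                           len(region) + slack, max_paths)
        if paths:
            best = min(paths, key=lambda s: edit_distance(s, region))
            parts.append(best[k:])
        else:
            parts.append(region[k:])  # no bridging path: keep the original bases
    parts.append(long_read[solid[-1] + k:])
    return "".join(parts)


if __name__ == "__main__":
    reference = "AAACCCGGGTTTACGTAGCT"
    # Two error-free short reads (each seen twice) plus one read carrying an error.
    short_reads = [reference[:12], reference[6:]] * 2 + ["AAACCCGTGTTT"]
    graph = solid_kmers(short_reads, k=5, min_abundance=2)
    long_read = "AAACCCGGGATTTACGTAGCT"  # one inserted base
    print(correct_long_read(long_read, graph, k=5))  # AAACCCGGGTTTACGTAGCT
```

In this toy example the k-mers overlapping the inserted base are not solid, and the only graph path between the flanking solid k-mers spells the reference, so the read is restored. On real data the graph branches, which is why the sketch bounds both the path length (`slack`) and the number of enumerated paths (`max_paths`).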
Pages: 3506-3514
Number of pages: 9