Generation and application of pseudo-long reads for metagenome assembly

Cited by: 0
Authors
Sim, Mikang [1]
Lee, Jongin [1]
Wy, Suyeon [1]
Park, Nayoung [1]
Lee, Daehwan [1]
Kwon, Daehong [1]
Kim, Jaebum [1]
Affiliations
[1] Konkuk Univ, Dept Biomed Sci & Engn, 120 Neungdong Ro, Seoul 05029, South Korea
Source
GIGASCIENCE | 2022 / Volume 11
Keywords
next-generation sequencing; metagenomic assembly; pseudo-long read
DOI
Not available
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Metagenomic assembly from high-throughput sequencing data is a powerful way to reconstruct the genomes of microbes in environmental samples without cultivation. However, it is a complex and challenging task, especially when only short reads are available, because a metagenome is a mixture of the genomes of multiple microorganisms. Although long-read sequencing technologies have been developed and are beginning to be used for metagenomic assembly, many metagenomic studies still rely on short reads because generating long reads costs more than generating short reads.
Results: In this study, we present a new method called PLR-GEN. It creates pseudo-long reads from metagenomic short reads based on given reference genome sequences, taking into account the small sequence variations that exist among individual genomes of the same or different species. When applied to a mock community data set from the Human Microbiome Project, PLR-GEN extended 101-bp short reads to pseudo-long reads with an N50 of 33 Kbp and an error rate of 0.4%. Using these pseudo-long reads clearly improved metagenomic assembly in terms of the number of sequences, assembly contiguity, and the prediction of species and genes.
Conclusions: PLR-GEN can generate artificial long-read sequences at no additional sequencing cost, thus aiding a variety of metagenome-based studies.
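The abstract reports read length as an N50 value. For readers unfamiliar with the metric, the following is a minimal sketch of the standard N50 definition: the length L such that sequences of length L or longer contain at least half of all bases. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from PLR-GEN or from the paper's data.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running always reaches total")


# Hypothetical toy lengths in bp (not data from the paper).
toy_lengths = [101, 101, 20_000, 33_000, 45_000]
n50_value = n50(toy_lengths)
print(n50_value)  # prints 33000
```

In this toy set, the two longest sequences (45,000 bp and 33,000 bp) together cover more than half of the 98,202 total bases, so the N50 is 33,000 bp. A few long sequences dominate the metric, which is why it is commonly used as a contiguity measure.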
Pages: 10