Evaluating techniques for metagenome annotation using simulated sequence data

Cited by: 46
Authors
Randle-Boggis, Richard J. [1]
Helgason, Thorunn [1]
Sapp, Melanie [2]
Ashton, Peter D. [1]
Affiliations
[1] Univ York, Dept Biol, York YO10 5DD, N Yorkshire, England
[2] Fera Sci Ltd, York YO41 1LZ, N Yorkshire, England
Keywords
DNA sequencing; metagenomics; metagenome analysis; microbial ecology; sequence annotation; microbial diversity; protein; identification; server; tool
DOI
10.1093/femsec/fiw095
Chinese Library Classification number
Q93 [Microbiology]
Subject classification codes
071005; 100705
Abstract
The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for annotating DNA sequences. The tools, databases and parameters used can have a significant impact on the results: naive choice of these factors can result in a false representation of community composition and function. We use a simulated metagenome to show how different parameters affect annotation accuracy by evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and Megablast. This simulated metagenome allowed the recovery of known organism and function abundances to be quantitatively evaluated, which is not possible for environmental metagenomes. The performance of each program and database varied, e.g. One Codex correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced many false positive annotations. This effect decreased as the taxonomic level investigated increased. Selecting more stringent parameters decreases the annotation sensitivity, but increases precision. Ultimately, there is a trade-off between taxonomic resolution and annotation accuracy. These results should be considered when annotating metagenomes and interpreting results from previous studies.
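The sensitivity/precision trade-off described in the abstract can be illustrated with a small scoring routine. This is a minimal Python sketch, not the authors' evaluation code. It assumes that each simulated read carries its known source lineage, that a tool either reports a lineage (possibly stopping at a higher rank, as lowest-common-ancestor methods do) or leaves the read unassigned, and that a per-read `score` can act as a stringency cutoff. The class `AnnotatedRead`, the function `evaluate`, the taxa and the scores are all hypothetical. Sensitivity and precision are defined here as correct annotations divided by all reads and by annotated reads, respectively, which may differ from the paper's exact definitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AnnotatedRead:
    # Known origin of the simulated read, as rank -> taxon name.
    true_lineage: Dict[str, str]
    # Tool output as rank -> taxon name; None means the read was not annotated.
    predicted_lineage: Optional[Dict[str, str]]
    # Hypothetical per-read confidence (e.g. a bit score) used as a stringency cutoff.
    score: float


def evaluate(reads: List[AnnotatedRead], rank: str,
             min_score: float = 0.0) -> Tuple[float, float]:
    """Return (sensitivity, precision) of annotations at one taxonomic rank.

    sensitivity = correctly annotated reads / all simulated reads
    precision   = correctly annotated reads / reads that received an annotation
    """
    assigned = correct = 0
    for read in reads:
        pred = read.predicted_lineage
        # Reads below the cutoff, or not resolved down to this rank, count as unassigned.
        if pred is None or read.score < min_score or rank not in pred:
            continue
        assigned += 1
        if pred[rank] == read.true_lineage[rank]:
            correct += 1
    sensitivity = correct / len(reads) if reads else 0.0
    precision = correct / assigned if assigned else 0.0
    return sensitivity, precision


# Toy data: four simulated reads with invented taxa and scores.
reads = [
    AnnotatedRead({"phylum": "P1", "genus": "A"}, {"phylum": "P1", "genus": "A"}, 90.0),
    AnnotatedRead({"phylum": "P1", "genus": "B"}, {"phylum": "P1", "genus": "C"}, 40.0),
    AnnotatedRead({"phylum": "P2", "genus": "D"}, {"phylum": "P2", "genus": "D"}, 45.0),
    AnnotatedRead({"phylum": "P2", "genus": "E"}, None, 0.0),
]

print(evaluate(reads, "genus"))                  # (0.5, 0.6666666666666666)
print(evaluate(reads, "genus", min_score=50.0))  # (0.25, 1.0)
print(evaluate(reads, "phylum"))                 # (0.75, 1.0)
```

On these four invented reads, raising the cutoff from 0 to 50 lowers genus-level sensitivity from 0.5 to 0.25 while precision rises from about 0.67 to 1.0, and scoring the same lenient annotations at phylum rather than genus level raises both measures, in line with the pattern the abstract reports.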
Pages: 15
Related papers
  • [1] ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data
    Kieser, Silas
    Brown, Joseph
    Zdobnov, Evgeny M.
    Trajkovski, Mirko
    McCue, Lee Ann
    BMC BIOINFORMATICS, 2020, 21 (01)
  • [2] Protein structure determination using metagenome sequence data
    Ovchinnikov, Sergey
    Park, Hahnbeom
    Varghese, Neha
    Huang, Po-Ssu
    Pavlopoulos, Georgios A.
    Kim, David E.
    Kamisetty, Hetunandan
    Kyrpides, Nikos C.
    Baker, David
    SCIENCE, 2017, 355 (6322) : 294 - 297
  • [3] Annotation of metagenome short reads using proxygenes
    Dalevi, Daniel
    Ivanova, Natalia N.
    Mavromatis, Konstantinos
    Hooper, Sean D.
    Szeto, Ernest
    Hugenholtz, Philip
    Kyrpides, Nikos C.
    Markowitz, Victor M.
    BIOINFORMATICS, 2008, 24 (16) : I7 - I13
  • [4] Metagenome Annotation Using a Distributed Grid of Undergraduate Students
    Hingamp, Pascal
    Brochier, Celine
    Talla, Emmanuel
    Gautheret, Daniel
    Thieffry, Denis
    Herrmann, Carl
    PLOS BIOLOGY, 2008, 6 (11) : 2362 - 2367
  • [5] Protein contact prediction using metagenome sequence data and residual neural networks
    Wu, Qi
    Peng, Zhenling
    Anishchenko, Ivan
    Cong, Qian
    Baker, David
    Yang, Jianyi
    BIOINFORMATICS, 2020, 36 (01) : 41 - 48
  • [6] Graph-based sequence annotation using a data integration approach
    Pesch, Robert
    Lysenko, Artem
    Hindle, Matthew
    Hassani-Pak, Keywan
    Thiele, Ralf
    Rawlings, Christopher
    Koehler, Jacob
    Taubert, Jan
    JOURNAL OF INTEGRATIVE BIOINFORMATICS, 2008, 5 (02)
  • [7] The Viral MetaGenome Annotation Pipeline (VMGAP): An automated tool for the functional annotation of viral Metagenomic shotgun sequencing data
    Lorenzi, Hernan A.
    Hoover, Jeff
    Inman, Jason
    Safford, Todd
    Murphy, Sean
    Kagan, Leonid
    Williamson, Shannon J.
    STANDARDS IN GENOMIC SCIENCES, 2011, 4 (03) : 418 - 429
  • [8] Evading the annotation bottleneck: using sequence similarity to search non-sequence gene data
    Gilchrist, Michael J.
    Christensen, Mikkel B.
    Harland, Richard
    Pollet, Nicolas
    Smith, James C.
    Ueno, Naoto
    Papalopulu, Nancy
    BMC BIOINFORMATICS, 2008, 9 (1) : 442