Evaluating techniques for metagenome annotation using simulated sequence data

Cited by: 46

Authors:

Randle-Boggis, Richard J. [1]

Helgason, Thorunn [1]

Sapp, Melanie [2]

Ashton, Peter D. [1]

Affiliations:

[1] Univ York, Dept Biol, York YO10 5DD, N Yorkshire, England

[2] Fera Sci Ltd, York YO41 1LZ, N Yorkshire, England

Source:

FEMS MICROBIOLOGY ECOLOGY | 2016 / Vol. 92 / Issue 07

Keywords:

DNA sequencing; metagenomics; metagenome analysis; microbial ecology; sequence annotation; MICROBIAL DIVERSITY; PROTEIN; IDENTIFICATION; SERVER; TOOL;

DOI:

10.1093/femsec/fiw095

Chinese Library Classification (CLC) number:

Q93 [Microbiology];

Discipline classification codes:

071005 ; 100705 ;

Abstract:

The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for annotating DNA sequences. The tools, databases and parameters used can have a significant impact on the results: naive choice of these factors can result in a false representation of community composition and function. We use a simulated metagenome to show how different parameters affect annotation accuracy by evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and Megablast. This simulated metagenome allowed the recovery of known organism and function abundances to be quantitatively evaluated, which is not possible for environmental metagenomes. The performance of each program and database varied, e.g. One Codex correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced many false positive annotations. This effect decreased as the taxonomic level investigated increased. Selecting more stringent parameters decreases the annotation sensitivity, but increases precision. Ultimately, there is a trade-off between taxonomic resolution and annotation accuracy. These results should be considered when annotating metagenomes and interpreting results from previous studies.
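The trade-off between sensitivity and precision described in the abstract can be illustrated with a minimal sketch. It assumes that sensitivity is the fraction of all simulated reads assigned to their true taxon, and that precision is the fraction of annotated reads assigned correctly; these definitions, the function name `annotation_metrics`, and the toy read labels are illustrative assumptions, not the authors' published method or data.

```python
from typing import Dict, Optional, Tuple


def annotation_metrics(
    truth: Dict[str, str],
    predicted: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Return (sensitivity, precision) for one taxonomic rank.

    truth maps read IDs to the known source taxon of the simulated read.
    predicted maps read IDs to the annotated taxon, or None if unannotated.
    Both definitions below are assumptions made for this sketch.
    """
    total = len(truth)
    annotated = 0
    correct = 0
    for read_id, true_taxon in truth.items():
        label = predicted.get(read_id)
        if label is None:
            continue  # read left unannotated at this stringency
        annotated += 1
        if label == true_taxon:
            correct += 1
    sensitivity = correct / total if total else 0.0
    precision = correct / annotated if annotated else 0.0
    return sensitivity, precision


# Hypothetical genus-level labels for four simulated reads.
truth = {"r1": "Escherichia", "r2": "Bacillus", "r3": "Bacillus", "r4": "Pseudomonas"}

# Lenient parameters: every read is annotated, one of them wrongly.
lenient = {"r1": "Escherichia", "r2": "Bacillus", "r3": "Clostridium", "r4": "Pseudomonas"}

# Stringent parameters: fewer reads annotated, but none wrongly.
strict = {"r1": "Escherichia", "r2": "Bacillus", "r3": None, "r4": None}

print(annotation_metrics(truth, lenient))
print(annotation_metrics(truth, strict))
```

In this toy case the stringent setting annotates fewer reads, so sensitivity falls while precision rises, which mirrors the direction of the effect the abstract reports.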

Pages: 15
