Evaluating techniques for metagenome annotation using simulated sequence data

Cited by: 46
Authors
Randle-Boggis, Richard J. [1 ]
Helgason, Thorunn [1 ]
Sapp, Melanie [2 ]
Ashton, Peter D. [1 ]
Affiliations
[1] Univ York, Dept Biol, York YO10 5DD, N Yorkshire, England
[2] Fera Sci Ltd, York YO41 1LZ, N Yorkshire, England
Keywords
DNA sequencing; metagenomics; metagenome analysis; microbial ecology; sequence annotation; MICROBIAL DIVERSITY; PROTEIN; IDENTIFICATION; SERVER; TOOL;
DOI
10.1093/femsec/fiw095
Chinese Library Classification
Q93 [Microbiology];
Discipline classification codes
071005 ; 100705 ;
Abstract
The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for annotating DNA sequences. The tools, databases and parameters used can have a significant impact on the results: naive choice of these factors can result in a false representation of community composition and function. We use a simulated metagenome to show how different parameters affect annotation accuracy by evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and Megablast. This simulated metagenome allowed the recovery of known organism and function abundances to be quantitatively evaluated, which is not possible for environmental metagenomes. The performance of each program and database varied, e.g. One Codex correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced many false positive annotations. This effect decreased as the taxonomic level investigated increased. Selecting more stringent parameters decreases the annotation sensitivity, but increases precision. Ultimately, there is a trade-off between taxonomic resolution and annotation accuracy. These results should be considered when annotating metagenomes and interpreting results from previous studies.
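The trade-off between sensitivity and precision described in the abstract can be illustrated with a minimal sketch. It assumes that each simulated read has a known source taxon and that an annotation tool returns either a taxon or no assignment for each read. The function name `annotation_metrics`, the read identifiers, and the metric definitions (sensitivity as correct annotations over all reads, precision as correct annotations over annotated reads) are illustrative assumptions, not the definitions or code used in the paper.

```python
def annotation_metrics(truth, predicted):
    """Score annotations against a simulated metagenome's known origins.

    truth:     dict mapping read ID -> true taxon (known by construction)
    predicted: dict mapping read ID -> assigned taxon, or None if unannotated
    Metric definitions here are assumptions made for illustration.
    """
    # Keep only reads that received an assignment and have a known origin.
    annotated = {
        read: taxon
        for read, taxon in predicted.items()
        if taxon is not None and read in truth
    }
    true_pos = sum(1 for read, taxon in annotated.items() if truth[read] == taxon)
    false_pos = len(annotated) - true_pos

    # Sensitivity: fraction of all simulated reads annotated correctly.
    sensitivity = true_pos / len(truth) if truth else 0.0
    # Precision: fraction of annotated reads whose annotation is correct.
    precision = true_pos / len(annotated) if annotated else 0.0

    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "sensitivity": sensitivity,
        "precision": precision,
    }


# Hypothetical genus-level example with four simulated reads.
truth = {
    "r1": "Bacillus",
    "r2": "Bacillus",
    "r3": "Escherichia",
    "r4": "Pseudomonas",
}
predicted = {
    "r1": "Bacillus",      # correct
    "r2": "Escherichia",   # false positive
    "r3": "Escherichia",   # correct
    "r4": None,            # left unannotated
}

metrics = annotation_metrics(truth, predicted)
print(metrics)
```

Under these assumed definitions, a stricter threshold that leaves more reads unannotated would tend to lower sensitivity while raising precision, which matches the direction of the trade-off the abstract reports.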
Pages: 15