Evaluating techniques for metagenome annotation using simulated sequence data

Cited by: 46
Authors
Randle-Boggis, Richard J. [1]
Helgason, Thorunn [1]
Sapp, Melanie [2]
Ashton, Peter D. [1]
Affiliations
[1] Univ York, Dept Biol, York YO10 5DD, N Yorkshire, England
[2] Fera Sci Ltd, York YO41 1LZ, N Yorkshire, England
Keywords
DNA sequencing; metagenomics; metagenome analysis; microbial ecology; sequence annotation; MICROBIAL DIVERSITY; PROTEIN; IDENTIFICATION; SERVER; TOOL
DOI
10.1093/femsec/fiw095
Chinese Library Classification number
Q93 [Microbiology]
Discipline classification code
071005; 100705
Abstract
The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for annotating DNA sequences. The tools, databases and parameters used can have a significant impact on the results: naive choice of these factors can result in a false representation of community composition and function. We use a simulated metagenome to show how different parameters affect annotation accuracy by evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and Megablast. This simulated metagenome allowed the recovery of known organism and function abundances to be quantitatively evaluated, which is not possible for environmental metagenomes. The performance of each program and database varied, e.g. One Codex correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced many false positive annotations. This effect decreased as the taxonomic level investigated increased. Selecting more stringent parameters decreases the annotation sensitivity, but increases precision. Ultimately, there is a trade-off between taxonomic resolution and annotation accuracy. These results should be considered when annotating metagenomes and interpreting results from previous studies.
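The sensitivity/precision trade-off described in the abstract can be illustrated with a minimal sketch. It assumes that sensitivity is the fraction of all simulated reads annotated correctly and that precision is the fraction of annotated reads that are correct; the paper's exact definitions may differ. The function name `score_annotations`, the read identifiers and the genus assignments are hypothetical toy data, not results from the study.

```python
from collections import namedtuple

Scores = namedtuple("Scores", ["sensitivity", "precision"])


def score_annotations(truth, predicted):
    """Score annotations against the known origin of simulated reads.

    truth:     dict mapping read ID -> true genus (known because the
               metagenome is simulated)
    predicted: dict mapping read ID -> assigned genus, or None / missing
               when the tool left the read unannotated
    """
    true_pos = 0   # reads assigned to the correct genus
    false_pos = 0  # reads assigned to a wrong genus
    for read_id, true_genus in truth.items():
        call = predicted.get(read_id)
        if call is None:
            continue  # unannotated reads lower sensitivity only
        if call == true_genus:
            true_pos += 1
        else:
            false_pos += 1
    annotated = true_pos + false_pos
    sensitivity = true_pos / len(truth) if truth else 0.0
    precision = true_pos / annotated if annotated else 0.0
    return Scores(sensitivity, precision)


# Toy ground truth for four simulated reads (hypothetical values).
truth = {
    "read1": "Escherichia",
    "read2": "Bacillus",
    "read3": "Pseudomonas",
    "read4": "Bacillus",
}

# Lenient parameters: every read is annotated, one of them wrongly.
lenient_calls = {
    "read1": "Escherichia",
    "read2": "Bacillus",
    "read3": "Pseudomonas",
    "read4": "Clostridium",
}

# Stringent parameters: fewer reads are annotated, none wrongly.
strict_calls = {
    "read1": "Escherichia",
    "read2": "Bacillus",
    "read3": None,
    "read4": None,
}

lenient = score_annotations(truth, lenient_calls)
strict = score_annotations(truth, strict_calls)
print("lenient:", lenient)
print("strict: ", strict)
```

In this toy case the stringent calls have lower sensitivity but higher precision than the lenient ones, which is the direction of the trade-off the abstract reports.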
Pages: 15