Metagenome Assembly Validation: Which Metagenome Contigs are Bona Fide?

Authors
Ji, Yan [1]
Li, Yi-Xue [1]
Cai, Yu-Dong [2,4]
Chou, Kuo-Chen [3,4]
Affiliations
[1] Chinese Acad Sci, Shanghai Inst Biol Sci, Key Lab Syst Biol, Shanghai, Peoples R China
[2] Inst Syst Biol, 99 Shang Da Rd, Shanghai 200444, Peoples R China
[3] King Abdulaziz Univ, Ctr Excellence Genom Med Res, Jeddah 21413, Saudi Arabia
[4] Gordon Life Sci Inst, Belmont, MA USA
Keywords
Bona fide contigs; computational method; datasets; metagenome contigs; Metagenomics; simulated metagenome; AMINO-ACID-COMPOSITION; PROTEASE CLEAVAGE SITES; PREDICTING SUBCELLULAR-LOCALIZATION; COUPLED RECEPTOR CLASSES; SUPPORT VECTOR MACHINE; WEB-SERVER; FUNCTIONAL DOMAIN; MEMBRANE-PROTEINS; TOOL; ATTRIBUTES
DOI
10.2174/1574893611308040013
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
In metagenomics, long metagenome contigs can improve both metagenome gene prediction and metagenome sequence binning. Metagenome contigs can also make gene function annotation more accurate, because they provide extensive genomic context. However, because of repetitive sequences within and between genomes, metagenome contigs may be misassembled. It is therefore essential to develop a method to validate them. Here, we propose a computational method to validate metagenome contigs. After realigning the raw sequencing reads onto a contig, we compute a contig-ECDF (empirical cumulative distribution function) and its corresponding reference ECDF using a simulation-based method. Because the reference ECDF is fixed for a given set of parameters, we use the difference between the two to check whether a contig is bona fide: the smaller the difference, the more likely the contig is bona fide. On simulated metagenome datasets, the method showed a good ability to identify misassembled contigs. Applied to a real metagenome dataset sequenced from an in vitro simulated microbial community of known genome composition, the method showed a strong ability to identify bona fide contigs, and small differences between contig-ECDFs and their references were significantly correlated with bona fide contigs. In summary, we developed a computational method to validate metagenome contigs. It assigns each metagenome contig a score, and the smaller the score, the more likely the contig is bona fide. Validation on both simulated and real datasets showed that the method performs well.
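The abstract does not state which quantity the contig-ECDF is computed over or how the "distinction" from the reference is measured, so the following is only a minimal sketch under loudly stated assumptions, not the authors' implementation. It assumes the ECDF is taken over the start positions of reads realigned to the contig, that the reference is the average ECDF of reads placed uniformly at random on a contig of the same length (so it depends only on the number of reads, contig length and read length), and that the score is the largest absolute gap between the two curves (a Kolmogorov-Smirnov-style distance). The function names `ecdf`, `reference_ecdf` and `contig_score` are hypothetical.

```python
import random
from bisect import bisect_right
from typing import List, Sequence


def ecdf(values: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Empirical CDF of `values`, evaluated at each point in `grid`."""
    sorted_vals = sorted(values)
    n = len(sorted_vals)
    return [bisect_right(sorted_vals, x) / n for x in grid]


def reference_ecdf(n_reads: int, contig_len: int, read_len: int,
                   grid: Sequence[float], n_sim: int = 200,
                   seed: int = 0) -> List[float]:
    """ASSUMPTION: the reference is the mean ECDF of read start positions when
    n_reads reads are placed uniformly at random on the contig. It depends only
    on (n_reads, contig_len, read_len), not on the observed alignments."""
    rng = random.Random(seed)
    max_start = contig_len - read_len
    totals = [0.0] * len(grid)
    for _ in range(n_sim):
        starts = [rng.randint(0, max_start) for _ in range(n_reads)]
        for i, value in enumerate(ecdf(starts, grid)):
            totals[i] += value
    return [t / n_sim for t in totals]


def contig_score(read_starts: Sequence[int], contig_len: int, read_len: int,
                 n_grid: int = 100, n_sim: int = 200, seed: int = 0) -> float:
    """Hypothetical contig score: the largest absolute gap between the observed
    ECDF of read start positions and the simulated reference ECDF.
    Smaller scores mean the contig looks more like a bona fide assembly."""
    if not read_starts:
        raise ValueError("need at least one aligned read")
    if contig_len <= read_len or n_grid < 2:
        raise ValueError("contig must be longer than a read; n_grid >= 2")
    max_start = contig_len - read_len
    # Evaluate both curves on the same evenly spaced grid of start positions.
    grid = [max_start * k / (n_grid - 1) for k in range(n_grid)]
    observed = ecdf(read_starts, grid)
    reference = reference_ecdf(len(read_starts), contig_len, read_len,
                               grid, n_sim, seed)
    # Kolmogorov-Smirnov-style sup distance (ASSUMPTION: the paper's
    # "distinction" measure is not specified in the abstract).
    return max(abs(o - r) for o, r in zip(observed, reference))


if __name__ == "__main__":
    even = contig_score(list(range(0, 9900, 10)), contig_len=10_000, read_len=100)
    skewed = contig_score(list(range(0, 4950, 5)), contig_len=10_000, read_len=100)
    print(f"even coverage: {even:.3f}  one-sided coverage: {skewed:.3f}")
```

Under these assumptions, reads spread evenly along a 10 kb contig give a score close to 0, whereas reads that all fall in the first half of the contig give a score of roughly 0.5, flagging that contig as suspicious. The sup distance is used here only because it is simple; an area-based distance between the two curves would fit the same scoring framework.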
Pages: 511-523
Number of pages: 13