Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights

Cited by: 371
Authors
Pasolli, Edoardo [1]
Truong, Duy Tin [1]
Malik, Faizan [2]
Waldron, Levi [2]
Segata, Nicola [1]
Affiliations
[1] Univ Trento, Ctr Integrat Biol, Trento, Italy
[2] CUNY, Grad Sch Publ Hlth & Hlth Policy, New York, NY 10021 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
MULTICATEGORY CLASSIFICATION METHODS; HUMAN GUT MICROBIOME; COMPREHENSIVE EVALUATION; FECAL MICROBIOTA; GENE-EXPRESSION; VALIDATION; PREDICTION; REGRESSION; SELECTION;
DOI
10.1371/journal.pcbi.1004977
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis. 
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
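The two evaluation settings contrasted in the abstract (within-study cross-validation versus transferring a model between studies) differ mainly in how samples are split, and the abstract also reports that adding control samples from other studies to the training set helped. The following is a minimal sketch of those three splitting schemes using only the Python standard library. It is not the MetAML implementation: the function names, the `"control"`/`"case"` labels, and the toy study identifiers are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_idx, test_idx): train on all other studies."""
    by_study = defaultdict(list)
    for i, study in enumerate(study_ids):
        by_study[study].append(i)
    for held_out in sorted(by_study):
        test_idx = by_study[held_out]
        train_idx = [i for i, s in enumerate(study_ids) if s != held_out]
        yield held_out, train_idx, test_idx


def within_study_folds(study_ids, study, k):
    """Yield (train_idx, test_idx) for k-fold cross-validation inside one study."""
    idx = [i for i, s in enumerate(study_ids) if s == study]
    for fold in range(k):
        test_idx = idx[fold::k]  # every k-th sample, starting at `fold`
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        yield train_idx, test_idx


def add_external_controls(train_idx, study_ids, labels, study, control="control"):
    """Return train_idx extended with control samples from every other study."""
    extra = [
        i
        for i, (s, y) in enumerate(zip(study_ids, labels))
        if s != study and y == control
    ]
    return sorted(set(train_idx) | set(extra))


# Toy metadata (hypothetical): three studies, ten samples.
study_ids = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"]
labels = ["control", "case", "control", "case", "control",
          "case", "case", "control", "control", "case"]

if __name__ == "__main__":
    for held_out, train_idx, test_idx in leave_one_study_out(study_ids):
        print("cross-study, test on", held_out, "train:", train_idx, "test:", test_idx)
    for train_idx, test_idx in within_study_folds(study_ids, "A", k=2):
        augmented = add_external_controls(train_idx, study_ids, labels, "A")
        print("within-study A, train:", train_idx, "augmented:", augmented, "test:", test_idx)
```

The index lists can be passed to any classifier. A real analysis would also shuffle and stratify the within-study folds by class label, which this sketch omits for brevity.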
Pages: 26