Supervised machine learning for microbiomics: Bridging the gap between current and best practices

Authors
Dudek, Natasha Katherine [1]
Chakhvadze, Mariami [2]
Kobakhidze, Saba [2, 3]
Kantidze, Omar [2]
Gankin, Yuriy [1]
Affiliations
[1] Quantori, Cambridge, MA 02142 USA
[2] Quantori, Tbilisi, Georgia
[3] Free Univ Tbilisi, Tbilisi, Georgia
Keywords
Microbiome; Machine learning; Microbiomics; Bioinformatics; Artificial intelligence; Health; Diagnosis; Bias
DOI
10.1016/j.mlwa.2024.100607
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML) is poised to drive innovations in clinical microbiomics, such as in disease diagnostics and prognostics. However, the successful implementation of ML in these domains necessitates the development of reproducible, interpretable models that meet the rigorous performance standards set by regulatory agencies. This study aims to identify key areas in need of improvement in current ML practices within microbiomics, with a focus on bridging the gap between existing methodologies and the requirements for clinical application. To do so, we analyze 100 peer-reviewed articles from 2021 to 2022. Within this corpus, datasets have a median size of 161.5 samples, with over one-third containing fewer than 100 samples, signaling a high potential for overfitting. Limited demographic data further raises concerns about generalizability and fairness, with 24% of studies omitting participants' country of residence, and attributes like race/ethnicity, education, and income rarely reported (11%, 2%, and 0%, respectively). Methodological issues are also common; for instance, for 86% of studies we could not confidently rule out test set omission and data leakage, suggesting a strong potential for inflated performance estimates across the literature. Reproducibility is a concern, with 78% of studies abstaining from sharing their ML code publicly. Based on this analysis, we provide guidance to avoid common pitfalls that can hinder model performance, generalizability, and trustworthiness. An interactive tutorial on applying ML to microbiomics data accompanies the discussion, to help establish and reinforce best practices within the community.
Pages: 13