Supervised machine learning for microbiomics: Bridging the gap between current and best practices

Cited by: 0
Authors
Dudek, Natasha Katherine [1]
Chakhvadze, Mariami [2]
Kobakhidze, Saba [2, 3]
Kantidze, Omar [2]
Gankin, Yuriy [1]
Affiliations
[1] Quantori, Cambridge, MA 02142 USA
[2] Quantori, Tbilisi, Georgia
[3] Free Univ Tbilisi, Tbilisi, Georgia
Source
Keywords
Microbiome; Machine learning; Microbiomics; Bioinformatics; ARTIFICIAL-INTELLIGENCE; HEALTH; DIAGNOSIS; BIAS
DOI
10.1016/j.mlwa.2024.100607
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML) is poised to drive innovations in clinical microbiomics, such as in disease diagnostics and prognostics. However, the successful implementation of ML in these domains necessitates the development of reproducible, interpretable models that meet the rigorous performance standards set by regulatory agencies. This study aims to identify key areas in need of improvement in current ML practices within microbiomics, with a focus on bridging the gap between existing methodologies and the requirements for clinical application. To do so, we analyze 100 peer-reviewed articles from 2021 to 2022. Within this corpus, datasets have a median size of 161.5 samples, with over one-third containing fewer than 100 samples, signaling a high potential for overfitting. Limited demographic data further raises concerns about generalizability and fairness, with 24% of studies omitting participants' country of residence, and attributes like race/ethnicity, education, and income rarely reported (11%, 2%, and 0%, respectively). Methodological issues are also common; for instance, for 86% of studies we could not confidently rule out test set omission and data leakage, suggesting a strong potential for inflated performance estimates across the literature. Reproducibility is a concern, with 78% of studies abstaining from sharing their ML code publicly. Based on this analysis, we provide guidance to avoid common pitfalls that can hinder model performance, generalizability, and trustworthiness. An interactive tutorial on applying ML to microbiomics data accompanies the discussion, to help establish and reinforce best practices within the community.
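The test set omission and data leakage concerns raised in the abstract can be illustrated with a minimal sketch. It is not the authors' tutorial code: the toy data, the helper names (`train_test_split`, `fit_scaler`, `apply_scaler`, `fit_centroids`, `predict`), and the nearest-centroid classifier are all assumptions chosen for brevity. The point it tries to show is only the ordering: hold out a test set first, then estimate preprocessing statistics from the training rows alone.

```python
import random
from statistics import mean, pstdev


def train_test_split(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle indices once and hold out a test set that is never used for fitting."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(round(len(idx) * test_fraction)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (
        [samples[i] for i in train_idx],
        [samples[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def fit_scaler(train_rows):
    """Estimate per-feature mean and standard deviation from training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against zero variance
    return means, stds


def apply_scaler(rows, scaler):
    """Standardize rows with statistics that were fitted elsewhere."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fit_centroids(rows, labels):
    """Hypothetical stand-in model: one mean vector per class."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict(rows, centroids):
    """Assign each row to the class with the closest centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda lab: dist2(r, centroids[lab])) for r in rows]


# Toy data (an assumption): 60 samples, 4 features, two classes.
rng = random.Random(42)
X = [[rng.gauss(5.0 if i % 2 else 3.0, 1.0) for _ in range(4)] for i in range(60)]
y = [i % 2 for i in range(60)]

# 1. Split first, so the test rows cannot influence any fitted quantity.
X_train, X_test, y_train, y_test = train_test_split(X, y)

# 2. Fit preprocessing and model on the training rows only.
scaler = fit_scaler(X_train)
model = fit_centroids(apply_scaler(X_train, scaler), y_train)

# 3. Evaluate once on the held-out rows, reusing the training-set scaler.
y_pred = predict(apply_scaler(X_test, scaler), model)
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Fitting the scaler on all 60 rows before splitting would let test-set statistics leak into training, which is one of the ways performance estimates can become inflated.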
Pages: 13
Related papers
50 in total
  • [21] Bridging the postpartum gap: best practices for training of obstetrical patient navigators
    Yee, Lynn M.
    Williams, Brittney
    Green, Hannah M.
    Carmona-Barrera, Viridiana
    Diaz, Laura
    Davis, Ka'Derricka
    Kominiarek, Michelle A.
    Feinglass, Joe
    Zera, Chloe A.
    Grobman, William A.
    AMERICAN JOURNAL OF OBSTETRICS AND GYNECOLOGY, 2021, 225 (02) : 138 - 152
  • [23] Towards machine learning guided by best practices
    Mojica-Hanke, Anamaria
    2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS, ICSE-COMPANION, 2023, : 240 - 244
  • [24] Machine Learning in CNC Machining: Best Practices
    von Hahn, Tim
    Mechefske, Chris K.
    MACHINES, 2022, 10 (12)
  • [25] Best practices in machine learning for chemistry comment
    Artrith, Nongnuch
    Butler, Keith T.
    Coudert, Francois-Xavier
    Han, Seungwu
    Isayev, Olexandr
    Jain, Anubhav
    Walsh, Aron
    NATURE CHEMISTRY, 2021, 13 (06) : 505 - 508
  • [26] Interpretable machine learning for dermatological disease detection: Bridging the gap between accuracy and explainability
    Nasir, Yusra
    Kadian, Karuna
    Sharma, Arun
    Dwivedi, Vimal
    Computers in Biology and Medicine, 2024, 179
  • [27] Bridging the gap between clinical-omics and machine learning to improve cancer treatment
    Moon, Chang In
    Jia, Byron
    Zhang, Bing
    CANCER RESEARCH, 2023, 83 (07)
  • [28] Editorial: Understanding and bridging the gap between neuromorphic computing and machine learning, volume II
    Deng, Lei
    Tang, Huajin
    Roy, Kaushik
    FRONTIERS IN COMPUTATIONAL NEUROSCIENCE, 2024, 18
  • [29] Machine Learning for Bridging the Gap between Density Functional Theory and Coupled Cluster Energies
    Ruth, Marcel
    Gerbig, Dennis
    Schreiner, Peter R.
    JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 2023, 19 (15) : 4912 - 4920
  • [30] Transfer-Learning: Bridging the Gap between Real and Simulation Data for Machine Learning in Injection Molding
    Tercan, Hasan
    Guajardo, Alexandro
    Heinisch, Julian
    Thiele, Thomas
    Hopmann, Christian
    Meisen, Tobias
    51ST CIRP CONFERENCE ON MANUFACTURING SYSTEMS, 2018, 72 : 185 - 190