Supervised machine learning for microbiomics: Bridging the gap between current and best practices

Authors
Dudek, Natasha Katherine [1]
Chakhvadze, Mariami [2]
Kobakhidze, Saba [2, 3]
Kantidze, Omar [2]
Gankin, Yuriy [1]
Affiliations
[1] Quantori, Cambridge, MA 02142 USA
[2] Quantori, Tbilisi, Georgia
[3] Free Univ Tbilisi, Tbilisi, Georgia
Keywords
Microbiome; Machine learning; Microbiomics; Bioinformatics; Artificial intelligence; Health; Diagnosis; Bias
DOI
10.1016/j.mlwa.2024.100607
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML) is poised to drive innovations in clinical microbiomics, such as in disease diagnostics and prognostics. However, the successful implementation of ML in these domains necessitates the development of reproducible, interpretable models that meet the rigorous performance standards set by regulatory agencies. This study aims to identify key areas in need of improvement in current ML practices within microbiomics, with a focus on bridging the gap between existing methodologies and the requirements for clinical application. To do so, we analyze 100 peer-reviewed articles from 2021 to 2022. Within this corpus, datasets have a median size of 161.5 samples, with over one-third containing fewer than 100 samples, signaling a high potential for overfitting. Limited demographic data further raises concerns about generalizability and fairness, with 24% of studies omitting participants' country of residence, and attributes like race/ethnicity, education, and income rarely reported (11%, 2%, and 0%, respectively). Methodological issues are also common; for instance, for 86% of studies we could not confidently rule out test set omission and data leakage, suggesting a strong potential for inflated performance estimates across the literature. Reproducibility is a concern, with 78% of studies abstaining from sharing their ML code publicly. Based on this analysis, we provide guidance to avoid common pitfalls that can hinder model performance, generalizability, and trustworthiness. An interactive tutorial on applying ML to microbiomics data accompanies the discussion, to help establish and reinforce best practices within the community.
Pages: 13