MarkerML - Marker Feature Identification in Metagenomic Datasets Using Interpretable Machine Learning

Cited by: 7
Authors
Nagpal, Sunil [1, 2, 3]
Singh, Rohan [1]
Taneja, Bhupesh [2, 3]
Mande, Sharmila S. [1]
Affiliations
[1] Tata Consultancy Serv Ltd, TCS Res, Pune 411013, India
[2] CSIR, Inst Genom & Integrat Biol GIB, New Delhi 110025, India
[3] Acad Sci & Innovat Res AcSIR, Ghaziabad 201002, India
Keywords
metagenomic biomarkers; interpretable machine learning; SHAP; microbiome; marker features; DATABASE;
DOI
10.1016/j.jmb.2022.167589
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Identifying environment-specific marker features is a key objective of many metagenomic studies: the aim is to find features in microbiome datasets that can serve as markers of contrasting or comparable states. Hypothesis testing and black-box machine-learned models, the conventional tools for this task, are generally not exhaustive, chiefly because they do not provide any quantifiable relevance (context) of, or between, the identified features. We present the MarkerML web server, which leverages interpretable machine learning to facilitate the contextual discovery of metagenomic features of interest. It does so through a comprehensive and automated application of Shapley Additive Explanations (SHAP), alongside hypothesis testing that accounts for compositionality, to multivariate microbiome datasets. MarkerML not only helps identify marker features but also offers insight into the role and interdependence of those features in driving the decisions of the supervised machine-learned model. The platform generates high-quality, intuitive visualizations, including prediction effect plots, model performance reports, feature dependency plots, Shapley- and abundance-informed cladograms (Sungrams), and hypothesis-tested violin plots, along with provisions for excluding participant bias and ensuring reproducibility of results. These capabilities aim to make it a useful asset for scientists in the microbiome field and beyond. The MarkerML web server is freely available to the academic community at https://microbiome.igib.res.in/markerml/. (c) 2022 Elsevier Ltd. All rights reserved.
Pages: 12
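Illustrative note: the abstract pairs Shapley Additive Explanations with compositionality-aware analysis. The sketch below is not MarkerML's implementation; it is a minimal illustration, assuming a hypothetical three-taxon logistic model (`WEIGHTS` is invented), a centered log-ratio (CLR) transform with an assumed pseudocount of 1, and a single reference sample as the baseline. It computes exact Shapley values by enumerating feature orderings, which is only feasible for a handful of features; practical SHAP tooling relies on more efficient algorithms.

```python
from itertools import permutations
from math import exp, log


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumption here) avoids log(0) for absent taxa.
    """
    logs = [log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    A feature that is 'absent' from a coalition takes its baseline value.
    Each feature's value is its marginal contribution averaged over all
    orderings in which features can be switched from baseline to x.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]
            new = predict(current)
            phi[i] += new - prev
            prev = new
    return [p / len(orderings) for p in phi]


# Hypothetical toy classifier: logistic score over three CLR-transformed taxa.
# The third taxon has zero weight, so it should receive zero attribution.
WEIGHTS = [1.5, -2.0, 0.0]


def predict(features):
    z = sum(w * f for w, f in zip(WEIGHTS, features))
    return 1.0 / (1.0 + exp(-z))


sample = clr([120, 5, 30])    # invented counts for one sample
baseline = clr([40, 40, 40])  # invented reference: evenly abundant taxa
phi = shapley_values(predict, sample, baseline)

for taxon, value in zip(["taxon_A", "taxon_B", "taxon_C"], phi):
    print(f"{taxon}: {value:+.4f}")
```

The attributions sum to the difference between the model's prediction for the sample and for the baseline, which is the additivity property that lets per-feature contributions be read as a decomposition of a single prediction.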