Microbiome Preprocessing Machine Learning Pipeline

Cited by: 12
Authors
Jasner, Yoel Y. [1]
Belogolovski, Anna [1]
Ben-Itzhak, Meirav [1]
Koren, Omry [2]
Louzoun, Yoram [1]
Affiliations
[1] Bar Ilan Univ, Dept Math, Ramat Gan, Israel
[2] Bar Ilan Univ, Azrieli Fac Med, Ramat Gan, Israel
Source
FRONTIERS IN IMMUNOLOGY | 2021 / Vol. 12
Keywords
pipeline; machine learning; 16S; OTU; ASV; feature selection;
DOI
10.3389/fimmu.2021.677870
Chinese Library Classification number
R392 [Medical Immunology]; Q939.91 [Immunology];
Discipline classification code
100102;
Abstract
Background: 16S sequencing results are often used for Machine Learning (ML) tasks. 16S gene sequences are represented as feature counts, which are associated with a taxonomic representation. Raw feature counts may not be the optimal representation for ML.
Methods: We evaluated multiple preprocessing steps and tested for the optimal combination for 16S sequencing-based classification tasks. We computed the contribution of each step to the accuracy, as measured by the Area Under Curve (AUC) of the classification.
Results: We show that the log of the feature counts is much more informative than the relative counts. We further show that merging features associated with the same taxonomy at a given level, through a dimension reduction step for each group of bacteria, improves the AUC. Finally, we show that z-scoring has a very limited effect on the results.
Conclusions: The preprocessing of microbiome 16S data is crucial for optimal microbiome-based Machine Learning. These preprocessing steps are integrated into MIPMLP (Microbiome Preprocessing Machine Learning Pipeline), which is available as a stand-alone version at https://github.com/louzounlab/microbiome/tree/master/Preprocess or as a service at http://mip-mlp.math.biu.ac.il/Home. Both contain the code and standard test sets.
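The three steps named in the abstract (log transform, merging features that share a taxonomy at a given level, and z-scoring) can be illustrated with a minimal sketch. This is not the MIPMLP code or its API: the function names, the pseudocount `epsilon`, the semicolon-separated taxonomy strings, the toy table, and the order of the steps are all assumptions made for illustration. The abstract describes the merge as a dimension reduction step per group; the sketch swaps in a plain per-sample mean to stay dependency-free.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def log_transform(counts, epsilon=0.1):
    """Apply log10(count + epsilon) to every entry (epsilon is an assumed pseudocount)."""
    return {taxon: [math.log10(c + epsilon) for c in row]
            for taxon, row in counts.items()}


def merge_by_taxonomy(table, level):
    """Merge features sharing the first `level` taxonomy ranks.

    Assumption: a per-sample mean stands in for the per-group
    dimension reduction step mentioned in the abstract.
    """
    groups = defaultdict(list)
    for taxon, row in table.items():
        key = ";".join(taxon.split(";")[:level])
        groups[key].append(row)
    return {key: [mean(col) for col in zip(*rows)]
            for key, rows in groups.items()}


def z_score(table):
    """Standardize each feature across samples; constant features become zeros."""
    scaled = {}
    for taxon, row in table.items():
        mu, sigma = mean(row), pstdev(row)
        scaled[taxon] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in row]
    return scaled


def preprocess(counts, level=2, epsilon=0.1):
    """Chain the three steps: log transform, taxonomy merge, z-score."""
    return z_score(merge_by_taxonomy(log_transform(counts, epsilon), level))


# Hypothetical toy table: one row per feature, one column per sample.
toy_counts = {
    "Bacteria;Firmicutes;Clostridia": [10, 0, 100],
    "Bacteria;Firmicutes;Bacilli": [5, 50, 0],
    "Bacteria;Bacteroidetes;Bacteroidia": [0, 20, 200],
}
features = preprocess(toy_counts, level=2)
```

With `level=2` the two Firmicutes rows collapse into a single feature, so `features` holds one row per phylum, each with one value per sample. These rows could then be passed to any classifier and compared by AUC.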
Pages: 11