Learning Microbial Community Structures with Supervised and Unsupervised Non-negative Matrix Factorization

Cited by: 29
Authors
Cai, Yun [1]
Gu, Hong [1]
Kenney, Toby [1]
Affiliation
[1] Dalhousie University, Dept. of Mathematics & Statistics, Halifax, NS, Canada
Source
Microbiome | 2017 / Vol. 5
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Microbial communities; Subcommunities; Metagenomics; Non-negative matrix factorization; Gut microbiota; Convergence; Algorithms; Macrolide
DOI
10.1186/s40168-017-0323-1
Chinese Library Classification
Q93 [Microbiology]
Discipline classification codes
071005; 100705
Abstract
Background: Learning the structure of microbial communities is critical to understanding how community structures and microbial functions differ between individuals. We view microbial communities as consisting of many subcommunities, each formed by a group of microbes that are functionally dependent on one another. This paper focuses on methods for extracting these subcommunities from the data, in particular Non-negative Matrix Factorization (NMF). Our methods can be applied to both OTU data and functional metagenomic data. We apply the existing unsupervised NMF method and also develop a new supervised NMF method for extracting interpretable information from classification problems.

Results: The relevance of the subcommunities identified by NMF is demonstrated by their excellent performance in classification. Through three data examples, we demonstrate how to interpret the features identified by NMF to draw meaningful biological conclusions and discover previously unidentified patterns in the data. Comparing the whole metagenomes of various mammals (Muegge et al., Science 332: 970-974, 2011), we find the macrolide biosynthesis pathway in hindgut-fermenting herbivores but not in carnivores. This is consistent with findings in veterinary science that macrolides should not be given to non-ruminant herbivores. In time-series microbiome data from several body sites (Caporaso et al., Genome Biol 12: 50, 2011), we identify a shift in the microbial communities of one individual. The shift occurs at around the same time in the tongue and gut microbiomes, indicating that it is a genuine biological change rather than an artefact of the method. In whole-metagenome data from IBD patients and healthy controls (Qin et al., Nature 464: 59-65, 2010), we identify differences in a number of pathways, some already known and others new.

Conclusions: NMF is a powerful tool for identifying the key features of microbial communities. These features can not only be used to solve difficult classification problems with a high degree of accuracy, but are also highly interpretable and can lead to important biological insights into community structure. In addition, like PCA, NMF is a dimension-reduction method: it reduces extremely complex microbial data to a low-dimensional representation, making a number of analyses easier to perform, for example searching for temporal patterns in the microbiome. When we are interested in the differences between the structures of two groups of communities, supervised NMF provides a better way to identify them while retaining all the advantages of NMF, such as interpretability and simple biological intuition.
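As a rough illustration of the factorization the abstract describes, the sketch below factors a small nonnegative count table X (samples × taxa) as X ≈ WH, where each row of H can be read as a candidate subcommunity and each row of W gives one sample's weights on those subcommunities (the low-dimensional representation). It uses the Lee-Seung multiplicative updates for the generalized Kullback-Leibler divergence. The choice of loss, the function names `nmf_kl`, `matmul`, `kl_divergence`, and the toy data are all assumptions for illustration; this is a generic unsupervised NMF sketch, not the authors' implementation, and it does not cover their supervised variant.

```python
import math
import random


def matmul(A, B):
    """Plain-Python matrix product of A (n x k) and B (k x m)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def kl_divergence(X, Y, eps=1e-12):
    """Generalized KL divergence D(X || Y) = sum(x * log(x / y) - x + y)."""
    total = 0.0
    for x_row, y_row in zip(X, Y):
        for x, y in zip(x_row, y_row):
            y = max(y, eps)  # guard against log(0) and division by zero
            total += (x * math.log(x / y) if x > 0 else 0.0) - x + y
    return total


def nmf_kl(X, k, n_iter=200, seed=0, eps=1e-12):
    """Factor nonnegative X (samples x taxa) as X ~ W H with W, H >= 0.

    Assumption: Lee-Seung multiplicative updates for the generalized KL
    divergence; a generic sketch, not the paper's own method.
    Rows of H are candidate subcommunities; rows of W are per-sample weights.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive random start so the multiplicative updates can move.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H[a][j] *= sum_i W[i][a] * X[i][j] / (WH)[i][j]  /  sum_i W[i][a]
        WH = matmul(W, H)
        R = [[X[i][j] / max(WH[i][j], eps) for j in range(m)] for i in range(n)]
        w_sums = [sum(W[i][a] for i in range(n)) for a in range(k)]
        H = [[H[a][j] * sum(W[i][a] * R[i][j] for i in range(n)) / max(w_sums[a], eps)
              for j in range(m)] for a in range(k)]
        # W[i][a] *= sum_j H[a][j] * X[i][j] / (WH)[i][j]  /  sum_j H[a][j]
        WH = matmul(W, H)
        R = [[X[i][j] / max(WH[i][j], eps) for j in range(m)] for i in range(n)]
        h_sums = [sum(H[a]) for a in range(k)]
        W = [[W[i][a] * sum(H[a][j] * R[i][j] for j in range(m)) / max(h_sums[a], eps)
              for a in range(k)] for i in range(n)]
    return W, H


# Hypothetical toy count table: 4 samples x 4 taxa, built from two taxon groups.
X = [[5, 5, 0, 0], [10, 10, 0, 0], [0, 0, 4, 4], [5, 5, 4, 4]]
W, H = nmf_kl(X, k=2)
print("KL loss:", round(kl_divergence(X, matmul(W, H)), 4))
```

Multiplicative updates were chosen because they keep every entry of W and H nonnegative without any projection step, and for the KL objective they are known not to increase the loss from one iteration to the next. On real OTU or pathway tables, the rank `k`, the stopping rule, and the choice of loss would need far more care than this toy example suggests.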
Pages: 27