PathME: pathway based multi-modal sparse autoencoders for clustering of patient-level multi-omics data

Cited by: 39
Authors
Lemsara, Amina [1]
Ouadfel, Salima [1]
Froehlich, Holger [2, 3]
Affiliations
[1] Univ Constantine 2, Comp Sci Dept, Constantine 25016, Algeria
[2] Univ Bonn, Int Ctr IT, D-53115 Bonn, Germany
[3] Fraunhofer Inst for Algorithms & Sci Comp SCAI, D-53754 Sankt Augustin, Germany
Keywords
Deep learning; Patient clustering; Multi-omics; MOLECULAR PORTRAITS; PROGNOSTIC-FACTOR; CLASS DISCOVERY; BREAST-CANCER; LUNG-CANCER; EXPRESSION; METHYLATION; MUTATION; SUBTYPES; PROTEIN
DOI
10.1186/s12859-020-3465-2
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Recent years have witnessed increasing interest in multi-omics data, because these data allow for a better understanding of complex diseases such as cancer on a molecular system level. In addition, multi-omics data increase the chance of robustly identifying molecular patient subgroups and hence open the door to better personalized treatment of diseases. Several methods have been proposed for unsupervised clustering of multi-omics data. However, a number of challenges remain, such as the magnitude of features and the large difference in dimensionality across different omics data sources.

Results: We propose a multi-modal sparse denoising autoencoder framework coupled with sparse non-negative matrix factorization to robustly cluster patients based on multi-omics data. The proposed model specifically leverages pathway information to effectively reduce the dimensionality of omics data into a pathway- and patient-specific score profile. As a consequence, our method allows us to understand which pathway is a feature of which particular patient cluster. Moreover, recently proposed machine learning techniques allow us to disentangle the specific impact of each individual omics feature on a pathway score. We applied our method to cluster patients in several cancer datasets using gene expression, miRNA expression, DNA methylation and CNVs, demonstrating the possibility of obtaining biologically plausible disease subtypes characterized by specific molecular features. Comparison against several competing methods showed a competitive clustering performance. In addition, post-hoc analysis of somatic mutations and clinical data provided supporting evidence and interpretation of the identified clusters.

Conclusions: Our suggested multi-modal sparse denoising autoencoder approach allows for an effective and interpretable integration of multi-omics data on pathway level while addressing the high-dimensional character of omics data. Patient-specific pathway score profiles derived from our model allow for a robust identification of disease subgroups.
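The clustering step described in the abstract (sparse non-negative matrix factorization applied to a pathway-by-patient score matrix) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `sparse_nmf`, the L1 penalty weight `l1`, the multiplicative update rule, and the toy score matrix are all assumptions chosen for illustration, and the autoencoder that would produce the pathway scores is omitted entirely.

```python
import numpy as np

def sparse_nmf(V, k, l1=0.1, n_iter=500, seed=0):
    """Approximate non-negative V (pathways x patients) as W @ H.

    Hypothetical helper: multiplicative updates for the squared error,
    with an L1 penalty on H to encourage sparse patient loadings.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    eps = 1e-10
    for _ in range(n_iter):
        # The l1 term in the denominator shrinks entries of H toward zero.
        H *= (W.T @ V) / (W.T @ W @ H + l1 + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Keep columns of W at unit norm so the penalty on H stays meaningful;
        # H is rescaled so that the product W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

# Toy pathway-score matrix (assumed data): 6 pathways x 8 patients,
# where patients 0-3 are active on pathways 0-2 and patients 4-7 on pathways 3-5.
rng = np.random.default_rng(1)
V = np.zeros((6, 8))
V[:3, :4] = 1.0
V[3:, 4:] = 1.0
V += 0.01 * rng.random(V.shape)

W, H = sparse_nmf(V, k=2)
# Assign each patient to the factor with the largest coefficient.
labels = H.argmax(axis=0)
print(labels)
```

In this sketch the columns of `W` indicate which pathways characterize each factor, and the column-wise argmax of `H` gives a hard cluster assignment per patient, which mirrors the idea that a cluster can be interpreted through its dominant pathways.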
Pages: 20