PathME: pathway based multi-modal sparse autoencoders for clustering of patient-level multi-omics data

Cited by: 39
Authors
Lemsara, Amina [1]
Ouadfel, Salima [1]
Froehlich, Holger [2, 3]
Affiliations
[1] Univ Constantine 2, Comp Sci Dept, Constantine 25016, Algeria
[2] Univ Bonn, Int Ctr IT, D-53115 Bonn, Germany
[3] Fraunhofer Inst for Algorithms & Sci Comp SCAI, D-53754 Sankt Augustin, Germany
Keywords
Deep learning; Patient clustering; Multi-omics; MOLECULAR PORTRAITS; PROGNOSTIC-FACTOR; CLASS DISCOVERY; BREAST-CANCER; LUNG-CANCER; EXPRESSION; METHYLATION; MUTATION; SUBTYPES; PROTEIN;
DOI
10.1186/s12859-020-3465-2
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Recent years have witnessed increasing interest in multi-omics data, because these data allow for a better understanding of complex diseases such as cancer at the molecular systems level. In addition, multi-omics data increase the chance of robustly identifying molecular patient subgroups and hence open the door to better personalized treatment of diseases. Several methods have been proposed for unsupervised clustering of multi-omics data. However, a number of challenges remain, such as the large number of features and the large difference in dimensionality across different omics data sources.

Results: We propose a multi-modal sparse denoising autoencoder framework coupled with sparse non-negative matrix factorization to robustly cluster patients based on multi-omics data. The proposed model specifically leverages pathway information to effectively reduce the dimensionality of omics data into a pathway- and patient-specific score profile. As a consequence, our method allows us to understand which pathway is a feature of which particular patient cluster. Moreover, recently proposed machine learning techniques allow us to disentangle the specific impact of each individual omics feature on a pathway score. We applied our method to cluster patients in several cancer datasets using gene expression, miRNA expression, DNA methylation and CNVs, demonstrating the possibility of obtaining biologically plausible disease subtypes characterized by specific molecular features. Comparison against several competing methods showed competitive clustering performance. In addition, post-hoc analysis of somatic mutations and clinical data provided supporting evidence for, and interpretation of, the identified clusters.

Conclusions: Our suggested multi-modal sparse denoising autoencoder approach allows for an effective and interpretable integration of multi-omics data at the pathway level while addressing the high-dimensional character of omics data. Patient-specific pathway score profiles derived from our model allow for a robust identification of disease subgroups.
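The final clustering step described in the abstract, factorizing a patient-by-pathway score matrix and reading cluster membership off the factors, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain non-negative matrix factorization with Lee-Seung multiplicative updates rather than the sparse variant named in the abstract, it skips the autoencoder stage entirely, and the score matrix, the function names `nmf` and `cluster_patients`, and all parameter values are hypothetical.

```python
import numpy as np


def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF via multiplicative updates: X ~ W @ H with W, H >= 0.

    Assumption: no sparsity penalty is applied here, unlike the sparse
    NMF mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps  # patients x clusters
    H = rng.random((rank, m)) + eps  # clusters x pathways
    for _ in range(n_iter):
        # Alternate the two updates; eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_patients(scores, n_clusters, **kwargs):
    """Assign each patient to the factor with the largest loading in W."""
    W, H = nmf(scores, n_clusters, **kwargs)
    labels = W.argmax(axis=1)
    return labels, W, H


# Hypothetical toy data: 6 patients x 4 pathway scores (non-negative).
scores = np.array(
    [
        [0.9, 0.8, 0.1, 0.0],
        [0.8, 0.9, 0.0, 0.1],
        [1.0, 0.7, 0.1, 0.1],
        [0.1, 0.0, 0.9, 0.8],
        [0.0, 0.1, 0.8, 1.0],
        [0.1, 0.1, 0.7, 0.9],
    ],
    dtype=float,
)

labels, W, H = cluster_patients(scores, n_clusters=2)
print("cluster labels:", labels)
print("pathway loadings per cluster:\n", H.round(2))
```

In this sketch the rows of `H` indicate which pathways load on which cluster, which is one simple way to read off the kind of pathway-to-cluster association the abstract refers to.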
Pages: 20