A copula based topology preserving graph convolution network for clustering of single-cell RNA-seq data

Cited by: 5
Authors
Lall, Snehalika [1]
Ray, Sumanta [2, 3]
Bandyopadhyay, Sanghamitra [1]
Affiliations
[1] Indian Stat Inst, Machine Intelligence Unit, Kolkata, India
[2] Aliah Univ, Dept Comp Sci & Engn, Kolkata, India
[3] Hlth Analyt Network, Pittsburgh, PA USA
Keywords
VALIDATION;
DOI
10.1371/journal.pcbi.1009600
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Several issues in single-cell sequencing affect the homogeneous grouping (clustering) of cells, such as the small amount of starting RNA, limited per-cell sequenced reads, cell-to-cell variability due to cell cycle, cellular morphology, and variable reagent concentrations. Moreover, single-cell data is susceptible to technical noise, which affects the quality of the genes (or features) selected/extracted prior to clustering.
Here we introduce sc-CGconv (copula based graph convolution network for single-cell clustering), a stepwise, robust, unsupervised feature extraction and clustering approach that formulates and aggregates cell-cell relationships using copula correlation (Ccor), followed by a graph convolution network based clustering approach. sc-CGconv formulates a cell-cell graph using Ccor; this graph is then learned by a graph-based artificial intelligence model, the graph convolution network. The learned representation (low-dimensional embedding) is used for cell clustering. sc-CGconv offers the following advantages:
a. sc-CGconv works with substantially smaller sample sizes to identify homogeneous clusters.
b. sc-CGconv can model the expression co-variability of a large number of genes, thereby outperforming state-of-the-art gene selection/extraction methods for clustering.
c. sc-CGconv preserves the cell-to-cell variability within the selected gene set by constructing a cell-cell graph through the copula correlation measure.
d. sc-CGconv provides a topology-preserving embedding of cells in low-dimensional space.
Author summary
One of the important aspects of single-cell downstream analysis is to classify cells into subpopulations. This immediately leads to clustering cells into homogeneous groups, which faces many issues due to (i) the small amount of starting RNA, (ii) cell-to-cell variability, (iii) technical noise introduced by the single-cell sequencing technology, and (iv) the unavailability of discriminating selected/extracted genes (features) in the preprocessing step of downstream analysis. We propose sc-CGconv, a stepwise feature extraction and clustering framework, which leverages the advantages of copulas and graph convolution networks in the single-cell analysis domain. sc-CGconv outperforms state-of-the-art feature selection/extraction methods in the preprocessing steps, performs well on data with small sample sizes, preserves the cell-to-cell variability within the extracted features, and provides a topology-preserving embedding of cells in low-dimensional space. sc-CGconv therefore successfully addresses the above-mentioned key challenges.
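The pipeline described in the abstract (a copula-based cell-cell similarity, a cell-cell graph, then a graph convolution) can be illustrated with a small sketch. This is not the authors' implementation. The record does not define Ccor, so Spearman's rank correlation, a dependence measure that depends only on the copula, is used here as a stand-in. The function names, the k-nearest-neighbour graph construction, the toy data, and the single untrained layer with random weights are all assumptions made for illustration.

```python
import numpy as np

def spearman_cell_similarity(X):
    """Cell-cell similarity from rank-transformed expression (cells x genes).

    Stand-in for Ccor (assumption): Spearman's rho, i.e. the Pearson
    correlation of per-cell ranks. Ties get arbitrary ordinal ranks here.
    """
    ranks = np.argsort(np.argsort(X, axis=1), axis=1).astype(float)
    U = (ranks + 1.0) / (X.shape[1] + 1.0)  # pseudo-observations in (0, 1)
    return np.corrcoef(U)                   # rows (cells) are the variables

def knn_adjacency(S, k):
    """Symmetric 0/1 adjacency linking each cell to its k most similar cells."""
    n = S.shape[0]
    S_no_self = S.copy()
    np.fill_diagonal(S_no_self, -np.inf)    # exclude self-matches
    nbrs = np.argsort(-S_no_self, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)               # symmetrize

def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

# Toy data (hypothetical): two groups of 10 cells over 50 genes.
rng = np.random.default_rng(0)
pattern_a = rng.normal(size=50)
pattern_b = rng.normal(size=50)
X = np.vstack([
    pattern_a + 0.3 * rng.normal(size=(10, 50)),
    pattern_b + 0.3 * rng.normal(size=(10, 50)),
])

S = spearman_cell_similarity(X)             # 20 x 20 similarity
A = knn_adjacency(S, k=3)                   # cell-cell graph
W = rng.normal(scale=0.1, size=(50, 8))     # random, untrained weights
Z = gcn_layer(A, X, W)                      # 20 x 8 embedding
print(Z.shape)
```

In a real workflow the weights would be trained, and the resulting embedding `Z` would then be passed to a clustering algorithm; both steps are omitted here to keep the sketch short.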
Pages: 16
Related papers
50 in total
  • [1] scASGC: An adaptive simplified graph convolution model for clustering single-cell RNA-seq data
    Wang, Shudong
    Zhang, Yu
    Zhang, Yulin
    Wu, Wenhao
    Ye, Lan
    Li, Yunyin
    Su, Jionglong
    Pang, Shanchen
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 163
  • [2] scGAC: a graph attentional architecture for clustering single-cell RNA-seq data
    Cheng, Yi
    Ma, Xiuli
    BIOINFORMATICS, 2022, 38 (08) : 2187 - 2193
  • [3] A topology-preserving dimensionality reduction method for single-cell RNA-seq data using graph autoencoder
    Zixiang Luo
    Chenyu Xu
    Zhen Zhang
    Wenfei Jin
    Scientific Reports, 11
  • [4] A topology-preserving dimensionality reduction method for single-cell RNA-seq data using graph autoencoder
    Luo, Zixiang
    Xu, Chenyu
    Zhang, Zhen
    Jin, Wenfei
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [5] Single-cell RNA-seq data analysis based on directed graph neural network
    Feng, Xiang
    Zhang, Hongqi
    Lin, Hao
    Long, Haixia
    METHODS, 2023, 211 : 48 - 60
  • [6] scGNN 2.0: a graph neural network tool for imputation and clustering of single-cell RNA-Seq data
    Gu, Haocheng
    Cheng, Hao
    Ma, Anjun
    Li, Yang
    Wang, Juexin
    Xu, Dong
    Ma, Qin
    BIOINFORMATICS, 2022, 38 (23) : 5322 - 5325
  • [7] Consensus clustering of single-cell RNA-seq data by enhancing network affinity
    Cui, Yaxuan
    Zhang, Shaoqiang
    Liang, Ying
    Wang, Xiangyun
    Ferraro, Thomas N.
    Chen, Yong
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (06)
  • [8] Deep single-cell RNA-seq data clustering with graph prototypical contrastive learning
    Lee, Junseok
    Kim, Sungwon
    Hyun, Dongmin
    Lee, Namkyeong
    Kim, Yejin
    Park, Chanyoung
    BIOINFORMATICS, 2023, 39 (06)
  • [9] GSE: Graph similarity enhancement algorithm for single-cell RNA-seq data clustering
    Bu, Shugui
    Guo, Lilu
    Li, Rongyuan
    Lu, Jianbo
    Zhu, Xiaoshu
    2019 4TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION PROCESSING (ICIIP 2019), 2019, : 406 - 410
  • [10] ECBN: Ensemble Clustering based on Bayesian Network inference for Single-cell RNA-seq Data
    Zhang, Dexin
    Zhu, Yuan
    PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 5884 - 5888