An active learning approach for clustering single-cell RNA-seq data

Cited by: 6
Authors
Lin, Xiang [1]
Liu, Haoran [1]
Wei, Zhi [1]
Roy, Senjuti Basu [1]
Gao, Nan [2]
Affiliations
[1] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
[2] Rutgers State Univ, Dept Biol Sci, Newark, NJ USA
DOI
10.1038/s41374-021-00639-w
Chinese Library Classification
R-3 [Medical research methods]; R3 [Basic medicine];
Discipline classification code
1001;
Abstract
Single-cell RNA sequencing (scRNA-seq) data have been widely used to profile cellular heterogeneity at high resolution. Clustering is a crucial step in scRNA-seq data analysis because it offers a way to identify previously undiscovered cell types. Most methods for clustering scRNA-seq data use an unsupervised learning strategy. Because the clustering step is separated from the cell annotation and labeling step, it is not uncommon for such methods to produce an entirely unfamiliar clustering with poor biological interpretability, a result generally undesired by biologists. To solve this problem, we proposed an active learning (AL) framework for clustering scRNA-seq data. The AL model employs a learning algorithm that can actively query biologists for labels, and this manual labeling is expected to be applied to only a subset of cells. To develop an optimal active learning approach, we explored several key parameters of the AL model in experiments with four real scRNA-seq datasets. We demonstrate that the proposed AL model outperformed state-of-the-art unsupervised clustering methods with fewer than 1000 labeled cells. We therefore conclude that the AL model is a promising tool for clustering scRNA-seq data that achieves superior performance effectively and efficiently.

The active learning (AL) model is a framework designed for clustering single-cell RNA sequencing (scRNA-seq) data. It requires researchers to label a small number of cells chosen by a sample selection algorithm. The labeled cells are then used to supervise the clustering, which significantly boosts clustering performance on scRNA-seq data.
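The query-and-label loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the margin-based selection rule, the farthest-point fallback, the toy 2-D data, and every function name below (`fit_centroids`, `select_query`, `active_learning`, `make_toy_cells`) are assumptions chosen only to keep the example self-contained. The `oracle` callable stands in for the biologist who answers each label query.

```python
import math
import random


def fit_centroids(points, labels):
    """Average the labeled points of each class into one centroid per class."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: [sum(axis) / len(members) for axis in zip(*members)]
        for label, members in groups.items()
    }


def predict(point, centroids):
    """Assign a point to the class with the nearest centroid."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


def select_query(pool, cells, centroids):
    """Pick the next cell to send to the annotator (hypothetical rule)."""
    if len(centroids) < 2:
        # Only one class seen so far: explore the cell farthest from it.
        only = next(iter(centroids.values()))
        return max(pool, key=lambda i: math.dist(cells[i], only))

    def margin(i):
        # Gap between the two nearest centroids; a small gap means uncertain.
        nearest = sorted(math.dist(cells[i], c) for c in centroids.values())
        return nearest[1] - nearest[0]

    return min(pool, key=margin)


def active_learning(cells, oracle, n_initial=4, n_queries=10, seed=0):
    """Label n_initial random cells, then query n_queries uncertain ones."""
    rng = random.Random(seed)
    labels = {i: oracle(i) for i in rng.sample(range(len(cells)), n_initial)}
    for _ in range(n_queries):
        pool = [i for i in range(len(cells)) if i not in labels]
        if not pool:
            break
        centroids = fit_centroids([cells[i] for i in labels], labels.values())
        query = select_query(pool, cells, centroids)
        labels[query] = oracle(query)  # the "biologist" answers the query
    centroids = fit_centroids([cells[i] for i in labels], labels.values())
    return [predict(cell, centroids) for cell in cells], labels


def make_toy_cells(n_per_type=40, seed=1):
    """Two well-separated 2-D blobs standing in for two cell types."""
    rng = random.Random(seed)
    cells, truth = [], []
    for cell_type, center in enumerate([(0.0, 0.0), (10.0, 10.0)]):
        for _ in range(n_per_type):
            cells.append([rng.gauss(c, 0.5) for c in center])
            truth.append(cell_type)
    return cells, truth


cells, truth = make_toy_cells()
predicted, queried = active_learning(cells, oracle=lambda i: truth[i])
accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
print(f"labeled {len(queried)} of {len(cells)} cells, accuracy {accuracy:.2f}")
```

Only the cells in `queried` are ever shown to the oracle; the remaining cells are assigned from the supervised model. Real scRNA-seq data would need normalization and dimensionality reduction first, and the paper's own selection strategy and classifier may differ from the ones assumed here.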
Pages: 227-235
Number of pages: 9