An active learning approach for clustering single-cell RNA-seq data

Cited by: 6
Authors
Lin, Xiang [1]
Liu, Haoran [1]
Wei, Zhi [1]
Roy, Senjuti Basu [1]
Gao, Nan [2]
Affiliations
[1] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
[2] Rutgers State Univ, Dept Biol Sci, Newark, NJ USA
DOI
10.1038/s41374-021-00639-w
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
Single-cell RNA sequencing (scRNA-seq) data have been widely used to profile cellular heterogeneity at high resolution. Clustering is a crucial step in scRNA-seq data analysis because it makes it possible to identify previously undiscovered cell types. Most methods for clustering scRNA-seq data use an unsupervised learning strategy. Because the clustering step is separated from the cell annotation and labeling step, it is not uncommon for the resulting clustering to be entirely unfamiliar and to have poor biological interpretability, a result generally undesired by biologists. To solve this problem, we propose an active learning (AL) framework for clustering scRNA-seq data. The AL model employs a learning algorithm that can actively query biologists for labels, and this manual labeling is expected to be applied to only a subset of cells. To develop an optimal active learning approach, we explored several key parameters of the AL model in experiments with four real scRNA-seq datasets. We demonstrate that the proposed AL model outperforms state-of-the-art unsupervised clustering methods with fewer than 1000 labeled cells. We therefore conclude that the AL model is a promising tool for clustering scRNA-seq data, achieving superior performance effectively and efficiently.

The active learning (AL) model is a framework designed for single-cell RNA sequencing (scRNA-seq) clustering. It requires researchers to label a small number of cells chosen by a sample selection algorithm. The labeled cells are then used to supervise the clustering, which significantly boosts clustering performance on scRNA-seq data.
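The query loop described in the abstract can be sketched roughly as follows. This is a generic pool-based active learning illustration, not the authors' method: the abstract does not name the sample selection algorithm or the classifier, so the margin-based uncertainty sampling, the 1-nearest-neighbor classifier, the synthetic 2-D "cells", and every function name below are assumptions made for the example. The `oracle` callable stands in for the biologist who supplies labels.

```python
import math
import random


def make_cells(n_per_type=100, seed=0):
    """Synthetic stand-in for expression profiles: three 'cell types' in 2-D."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.5)]
    cells, truth = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_type):
            cells.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            truth.append(label)
    return cells, truth


def predict(cell, cells, labeled):
    """1-nearest-neighbor prediction from the labeled cells.

    Returns (label, margin). The margin is the gap between the distance to
    the nearest labeled cell of the predicted type and the nearest labeled
    cell of any other type; a small margin means an uncertain prediction.
    """
    best = {}  # label -> distance to the nearest labeled cell of that type
    for i, lab in labeled.items():
        d = math.dist(cell, cells[i])
        if d < best.get(lab, math.inf):
            best[lab] = d
    ranked = sorted((d, lab) for lab, d in best.items())
    # With only one known type there is no runner-up, so treat as uncertain.
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin


def active_learning(cells, oracle, n_initial=6, n_queries=30, seed=0):
    """Label a random seed set, then repeatedly query the least certain cell."""
    rng = random.Random(seed)
    labeled = {i: oracle(i) for i in rng.sample(range(len(cells)), n_initial)}
    for _ in range(n_queries):
        pool = [i for i in range(len(cells)) if i not in labeled]
        # Sample selection step: pick the unlabeled cell with the smallest margin.
        query = min(pool, key=lambda i: predict(cells[i], cells, labeled)[1])
        labeled[query] = oracle(query)  # ask the "biologist" for this label
    # Final assignment of every cell, supervised by the labeled subset.
    assignments = [predict(c, cells, labeled)[0] for c in cells]
    return assignments, labeled


cells, truth = make_cells()
assignments, labeled = active_learning(cells, oracle=lambda i: truth[i])
accuracy = sum(a == t for a, t in zip(assignments, truth)) / len(truth)
print(f"labeled {len(labeled)} of {len(cells)} cells, accuracy {accuracy:.2f}")
```

Only 36 of the 300 synthetic cells are ever shown to the oracle here; the labeling budget (`n_initial`, `n_queries`) is the knob that corresponds to the "subset of cells" mentioned in the abstract, and its values in this sketch are arbitrary.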
Pages: 227-235
Number of pages: 9