An active learning approach for clustering single-cell RNA-seq data

Times cited: 6
Authors
Lin, Xiang [1]
Liu, Haoran [1]
Wei, Zhi [1]
Roy, Senjuti Basu [1]
Gao, Nan [2]
Affiliations
[1] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
[2] Rutgers State Univ, Dept Biol Sci, Newark, NJ USA
DOI
10.1038/s41374-021-00639-w
Chinese Library Classification
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline code
1001
Abstract
Single-cell RNA sequencing (scRNA-seq) data have been widely used to profile cellular heterogeneity at high resolution. Clustering is a crucial step in scRNA-seq data analysis because it provides a chance to identify previously undiscovered cell types. Most methods for clustering scRNA-seq data use an unsupervised learning strategy. Because the clustering step is separated from the cell annotation and labeling step, it is not uncommon for an entirely unexpected clustering with poor biological interpretability to be generated, a result biologists generally do not want. To solve this problem, we propose an active learning (AL) framework for clustering scRNA-seq data. The AL model employs a learning algorithm that actively queries biologists for labels, so that manual labeling needs to be applied to only a subset of cells. To develop an optimal active learning approach, we explored several key parameters of the AL model in experiments with four real scRNA-seq datasets. We demonstrate that the proposed AL model outperformed state-of-the-art unsupervised clustering methods with fewer than 1000 labeled cells. We therefore conclude that the AL model is a promising tool for clustering scRNA-seq data that achieves superior performance effectively and efficiently.

The active learning (AL) model is a framework designed for single-cell RNA sequencing (scRNA-seq) clustering. It requires researchers to label a small number of cells selected by a sample selection algorithm. The labeled cells are then used to supervise the clustering, significantly boosting clustering performance on scRNA-seq data.
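The abstract does not specify the sample selection algorithm or the learner. To illustrate only the general pool-based active learning loop it describes (select informative cells, ask an expert for their labels, then use the labels to supervise the clustering), here is a minimal Python sketch. Everything in it is a hypothetical assumption, not the authors' model: the nearest-centroid learner, the selection rule that alternates between exploration (querying the cell farthest from every known type) and margin-based uncertainty sampling, the function names, the toy two-dimensional data standing in for expression profiles, and the `oracle` callback that simulates a biologist.

```python
import math
import random


def centroids(points, labels):
    """Mean profile of each labelled cell type (a nearest-centroid learner)."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def sorted_dists(x, cents):
    """Sorted distances from cell x to every known cell-type centroid."""
    return sorted(math.dist(x, c) for c in cents.values())


def select_query(cells, unlabeled, cents, explore):
    """Hypothetical selection rule: alternate exploration and uncertainty sampling."""
    if explore or len(cents) < 2:
        # Exploration: the cell farthest from all known types may be a new type.
        return max(unlabeled, key=lambda i: sorted_dists(cells[i], cents)[0])

    def margin(i):
        # Small gap between the two closest types = an uncertain cell.
        d = sorted_dists(cells[i], cents)
        return d[1] - d[0]

    return min(unlabeled, key=margin)


def active_learning(cells, oracle, budget, seed_size=4, seed=0):
    """Query at most `budget` labels (seed_size >= 1), then cluster every cell."""
    rng = random.Random(seed)
    order = list(range(len(cells)))
    rng.shuffle(order)
    labeled = {i: oracle(i) for i in order[:seed_size]}  # random seed labels
    unlabeled = order[seed_size:]
    step = 0
    while len(labeled) < budget and unlabeled:
        cents = centroids([cells[i] for i in labeled], list(labeled.values()))
        pick = select_query(cells, unlabeled, cents, explore=(step % 2 == 0))
        unlabeled.remove(pick)
        labeled[pick] = oracle(pick)  # ask the "biologist" for this cell's type
        step += 1
    cents = centroids([cells[i] for i in labeled], list(labeled.values()))
    # Supervised clustering: assign each cell to its nearest labelled centroid.
    clusters = [min(cents, key=lambda y: math.dist(x, cents[y])) for x in cells]
    return clusters, labeled


def make_cells(n_per_type=50, seed=1):
    """Toy 2-D data with three well-separated synthetic cell types."""
    rng = random.Random(seed)
    centers = {"T cell": (0.0, 0.0), "B cell": (5.0, 0.0), "NK cell": (0.0, 5.0)}
    cells, truth = [], []
    for name, (cx, cy) in centers.items():
        for _ in range(n_per_type):
            cells.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            truth.append(name)
    return cells, truth


if __name__ == "__main__":
    cells, truth = make_cells()
    clusters, labeled = active_learning(cells, oracle=lambda i: truth[i], budget=15)
    agreement = sum(c == t for c, t in zip(clusters, truth)) / len(truth)
    print(f"queried {len(labeled)} of {len(cells)} cells; agreement {agreement:.2f}")
```

The exploration step is included because pure margin sampling only queries cells near boundaries between types that are already known, so it can fail to discover a cell type that has no labeled cell yet. On real scRNA-seq data one would normally work in a reduced space (for example, principal components) rather than on two toy features.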
Pages: 227-235
Page count: 9