CASSL: A cell-type annotation method for single cell transcriptomics data using semi-supervised learning

Cited by: 7
Authors
Seal, Dibyendu Bikash [1 ]
Das, Vivek [2 ]
De, Rajat K. [3 ]
Affiliations
[1] Univ Calcutta, AK Choudhury Sch Informat Technol, JD 2,Sect 3, Kolkata 700106, India
[2] Novo Nordisk AS, Novo Nordisk Pk 1, DK-2760 Malov, Denmark
[3] Indian Stat Inst, Machine Intelligence Unit, 203 Barrackpore Trunk Rd, Kolkata 700108, India
Keywords
scRNA-seq; Semi-supervised learning; NMF; k-means; RNA-SEQ DATA; DIMENSIONALITY REDUCTION; IDENTIFICATION; CLASSIFICATION; IMPUTATION; DYNAMICS;
DOI
10.1007/s10489-022-03440-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Single-cell RNA sequencing (scRNA-seq) allows global transcriptomic profiling at cellular resolution, making it possible to identify the underlying cell types and their corresponding lineages. Such cell-type identification and annotation rely heavily on models trained on a large number of individual cells with accurate, annotated labels. At present, cell-type annotation is done by inspecting marker genes from each statistically significant group of cells, which is both challenging and time-consuming. In this article, we propose a semi-supervised cell-type annotation method, called CASSL, based on non-negative matrix factorization (NMF) coupled with a recursive k-means algorithm. A semi-supervised model can learn labels for a large amount of unlabelled data with the help of a limited amount of labelled data. The effectiveness of CASSL has been demonstrated on eight publicly available human and mouse scRNA-seq datasets across varied organs and protocols. It correctly annotates the majority of the unlabelled cells with high accuracy. It has also been evaluated for the correctness of its clustering solution, robustness across varying percentages of missing labels, and execution time. When compared with state-of-the-art unsupervised and semi-supervised cell-type annotation methods, CASSL consistently outperforms the others across all metrics on most of the datasets. It also shows competitive results against state-of-the-art supervised methods.
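The abstract names two building blocks, NMF and a recursive k-means, without giving the algorithm. The following is only a rough sketch of how such a pipeline could look, not the authors' CASSL implementation. Everything in it is an assumption made for illustration: the function names (`nmf`, `two_means`, `fill_with_majority`, `propagate_labels`, `annotate`), the use of Lee-Seung multiplicative updates, the choice of two clusters per split, the `-1` marker for unlabelled cells, and the majority-vote fallback.

```python
import numpy as np

def nmf(X, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative X (cells x genes) as W @ H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def two_means(Z, n_iter=50, seed=0):
    """Plain Lloyd's k-means with k=2; returns a 0/1 cluster index per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=2, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        for k in range(2):
            if (assign == k).any():
                centers[k] = Z[assign == k].mean(axis=0)
    return assign

def fill_with_majority(labels, idx):
    """Give unlabelled cells in idx the most common known label in idx (in place)."""
    known = labels[idx][labels[idx] >= 0]
    if known.size:
        values, counts = np.unique(known, return_counts=True)
        labels[idx[labels[idx] < 0]] = values[counts.argmax()]

def propagate_labels(Z, labels, idx=None, depth=0, max_depth=8):
    """Recursively split cells until each group holds at most one known label.

    labels is an integer array with -1 for unlabelled cells (an assumed convention).
    """
    if idx is None:  # top-level call: work on a copy, start from all cells
        labels, idx = labels.copy(), np.arange(len(Z))
    known = labels[idx][labels[idx] >= 0]
    if np.unique(known).size > 1 and depth < max_depth:
        assign = two_means(Z[idx], seed=depth)
        if 0 < assign.sum() < len(idx):  # skip degenerate one-sided splits
            for k in (0, 1):
                propagate_labels(Z, labels, idx[assign == k], depth + 1, max_depth)
    fill_with_majority(labels, idx)  # pure group, or fallback for leftovers
    return labels

def annotate(X, labels, rank=2):
    """Reduce X with NMF, then propagate the known labels in the reduced space."""
    W, _ = nmf(X, rank)
    return propagate_labels(W, labels)
```

Calling `annotate(X, labels)` with a non-negative expression matrix and a partially filled label vector returns a new label vector in which the originally labelled cells are unchanged. Splitting into two clusters at each level keeps the sketch short; the published method may choose the number of clusters and the stopping rule differently.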
Pages: 1287-1305
Number of pages: 19
Related papers
50 in total
  • [11] Semi-Supervised Deep Learning for Cell Type Identification From Single-Cell Transcriptomic Data
    Dong, Xishuang
    Chowdhury, Shanta
    Victor, Uboho
    Li, Xiangfang
    Qian, Lijun
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (02) : 1492 - 1505
  • [12] scDeepSort: a pre-trained cell-type annotation method for single-cell transcriptomics using deep learning with a weighted graph neural network
    Shao, Xin
    Yang, Haihong
    Zhuang, Xiang
    Liao, Jie
    Yang, Penghui
    Cheng, Junyun
    Lu, Xiaoyan
    Chen, Huajun
    Fan, Xiaohui
    NUCLEIC ACIDS RESEARCH, 2021, 49 (21) : E122
  • [13] SSMD: a semi-supervised approach for a robust cell type identification and deconvolution of mouse transcriptomics data
    Lu, Xiaoyu
    Tu, Szu-Wei
    Chang, Wennan
    Wan, Changlin
    Wang, Jiashi
    Zang, Yong
    Ramdas, Baskar
    Kapur, Reuben
    Lu, Xiongbin
    Cao, Sha
    Zhang, Chi
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (04)
  • [14] scPLAN: a hierarchical computational framework for single transcriptomics data annotation, integration and cell-type label refinement
    Guo, Qirui
    Yuan, Musu
    Zhang, Lei
    Deng, Minghua
    BRIEFINGS IN BIOINFORMATICS, 2024, 25 (04)
  • [15] Evaluation of machine learning approaches for cell-type identification from single-cell transcriptomics data
    Huang, Yixuan
    Zhang, Peng
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (05)
  • [16] A Semi-supervised Deep Learning Method for Cervical Cell Classification
    Zhao, Siqi
    He, Yongjun
    Qin, Jian
    Wang, Zixuan
    ANALYTICAL CELLULAR PATHOLOGY, 2022, 2022
  • [17] NLSDeconv: an efficient cell-type deconvolution method for spatial transcriptomics data
    Chen, Yunlu
    Ruan, Feng
    Wang, Ji-Ping
    BIOINFORMATICS, 2025, 41 (01)
  • [18] Scalable analysis of cell-type composition from single-cell transcriptomics using deep recurrent learning
    Deng, Yue
    Bao, Feng
    Dai, Qionghai
    Wu, Lani F.
    Altschuler, Steven J.
    NATURE METHODS, 2019, 16 (04) : 311 - +
  • [19] Scalable analysis of cell-type composition from single-cell transcriptomics using deep recurrent learning
    Yue Deng
    Feng Bao
    Qionghai Dai
    Lani F. Wu
    Steven J. Altschuler
    Nature Methods, 2019, 16 : 311 - 314
  • [20] Evaluation of machine learning approaches for cell-type identification from single-cell transcriptomics data (vol 2021)
    Huang, Yixuan
    Zhang, Peng
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (06)