CASSL: A cell-type annotation method for single cell transcriptomics data using semi-supervised learning

Cited by: 7
Authors
Seal, Dibyendu Bikash [1]
Das, Vivek [2]
De, Rajat K. [3]
Affiliations
[1] Univ Calcutta, AK Choudhury Sch Informat Technol, JD 2,Sect 3, Kolkata 700106, India
[2] Novo Nordisk AS, Novo Nordisk Pk 1, DK-2760 Malov, Denmark
[3] Indian Stat Inst, Machine Intelligence Unit, 203 Barrackpore Trunk Rd, Kolkata 700108, India
Keywords
scRNA-seq; Semi-supervised learning; NMF; k-means; RNA-seq data; Dimensionality reduction; Identification; Classification; Imputation; Dynamics
DOI
10.1007/s10489-022-03440-4
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Single-cell RNA sequencing (scRNA-seq) allows global transcriptomic profiling at cellular resolution. This makes it possible to identify the underlying cell types and their lineages. Such cell-type identification and annotation rely heavily on models trained on large numbers of individual cells that carry accurate annotated labels. At present, cell-type annotation is done by inspecting the marker genes of each statistically significant group of cells, which is both challenging and time-consuming. In this article, we propose CASSL, a semi-supervised cell-type annotation method based on non-negative matrix factorization (NMF) coupled with a recursive k-means algorithm. A semi-supervised model can learn labels for a large amount of unlabelled data with the help of a limited amount of labelled data. We demonstrate the effectiveness of CASSL on eight publicly available human and mouse scRNA-seq datasets spanning varied organs and protocols. On these datasets, it correctly annotates the majority of the unlabelled cells with high accuracy. CASSL is also evaluated on three further criteria: the correctness of its clustering solution, its robustness to varying percentages of missing labels, and its execution time. Compared with state-of-the-art unsupervised and semi-supervised cell-type annotation methods, CASSL consistently outperformed the others across all metrics on most of the datasets. It also showed competitive results against state-of-the-art supervised methods.
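The abstract names the building blocks: NMF for dimensionality reduction, k-means clustering, and a small set of labelled cells that guides labels for the rest. It does not spell out the algorithm itself. The following Python sketch illustrates that general idea under stated assumptions and is not CASSL. It makes these assumptions:

* It factorizes a non-negative cells × genes matrix with Lee-Seung multiplicative updates.
* It clusters the per-cell factor loadings with plain, non-recursive k-means using farthest-point initialization.
* It gives each unlabelled cell the majority label of the labelled cells in its cluster.

The helper names (`nmf`, `kmeans`, `annotate`), the majority-vote rule and the toy matrix are all hypothetical. The paper's recursive k-means and its actual label-assignment procedure are not reproduced.

```python
# Illustrative sketch only: generic NMF + k-means + majority-vote label transfer.
# NOT the CASSL algorithm; all function names here are hypothetical.
from collections import Counter

import numpy as np


def nmf(X, k, n_iter=200, seed=0, eps=1e-10):
    """Approximate non-negative X (cells x genes) as W @ H via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps  # cells x factors
    H = rng.random((k, m)) + eps  # factors x genes
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (no recursion) with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(k - 1):
        # Pick the row farthest from every centre chosen so far.
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        new = np.array([Z[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign


def annotate(X, labels, n_factors, n_clusters, seed=0):
    """labels: a cell-type string for labelled cells, None for unlabelled ones."""
    W, _ = nmf(X, n_factors, seed=seed)
    clusters = kmeans(W, n_clusters, seed=seed)
    predicted = list(labels)
    for c in np.unique(clusters):
        members = np.flatnonzero(clusters == c)
        known = [labels[i] for i in members if labels[i] is not None]
        if not known:
            continue  # no labelled cell in this cluster: leave its cells as None
        majority = Counter(known).most_common(1)[0][0]
        for i in members:
            if predicted[i] is None:
                predicted[i] = majority
    return predicted


if __name__ == "__main__":
    # Hypothetical toy data: two cell groups expressing disjoint gene sets.
    X = np.array([[5, 5, 5, 0, 0, 0], [6, 5, 4, 0, 0, 1], [5, 6, 5, 1, 0, 0],
                  [0, 0, 1, 5, 5, 5], [0, 1, 0, 6, 5, 4], [1, 0, 0, 5, 6, 5]], dtype=float)
    print(annotate(X, ["T", None, None, "B", None, None], n_factors=2, n_clusters=2))
```

In this sketch, cells in a cluster with no labelled member remain `None`. Handling such clusters, and choosing the number of factors and clusters, is deliberately left out.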
Pages: 1287-1305
Page count: 19