CASSL: A cell-type annotation method for single cell transcriptomics data using semi-supervised learning

Cited by: 7
Authors
Seal, Dibyendu Bikash [1]
Das, Vivek [2]
De, Rajat K. [3]
Affiliations
[1] Univ Calcutta, AK Choudhury Sch Informat Technol, JD 2,Sect 3, Kolkata 700106, India
[2] Novo Nordisk AS, Novo Nordisk Pk 1, DK-2760 Malov, Denmark
[3] Indian Stat Inst, Machine Intelligence Unit, 203 Barrackpore Trunk Rd, Kolkata 700108, India
Keywords
scRNA-seq; Semi-supervised learning; NMF; k-means; RNA-seq data; Dimensionality reduction; Identification; Classification; Imputation; Dynamics
DOI
10.1007/s10489-022-03440-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Single-cell RNA sequencing (scRNA-seq) allows global transcriptomic profiling at cellular resolution, enabling identification of the underlying cell types and their lineages. Such cell-type identification and annotation rely heavily on models trained on large numbers of individual cells with accurate annotated labels. At present, cell-type annotation is done by inspecting marker genes in each statistically significant group of cells, which is both challenging and time-consuming. In this article, we propose a semi-supervised cell-type annotation method, called CASSL, based on non-negative matrix factorization (NMF) coupled with a recursive k-means algorithm. A semi-supervised model can learn labels for a large amount of unlabelled data with the help of a limited amount of labelled data. The effectiveness of CASSL has been demonstrated on eight publicly available human and mouse scRNA-seq datasets spanning various organs and protocols, where it correctly annotated the majority of the unlabelled cells with high accuracy. CASSL has also been evaluated for the correctness of its clustering solution, its robustness to varying percentages of missing labels, and its execution time. Compared with state-of-the-art unsupervised and semi-supervised cell-type annotation methods, CASSL consistently outperformed the others across all metrics on most datasets. It also showed competitive results against state-of-the-art supervised methods.
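The abstract describes the pipeline only at a high level. As a rough illustration of the general idea (NMF-based dimensionality reduction, clustering of the reduced cells, then transfer of the few known labels to unlabelled cells), here is a minimal sketch assuming NumPy and a nonnegative cells-by-genes expression matrix. It is not CASSL's implementation: the paper's recursive k-means step is replaced by a single plain k-means pass, and the functions `nmf`, `kmeans`, `annotate` and the majority-vote labelling rule are hypothetical choices made for illustration.

```python
import numpy as np


def nmf(X, k, n_iter=300, seed=0, eps=1e-10):
    """Factor a nonnegative cells x genes matrix X ~= W @ H using
    Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    W = rng.random((n_cells, k)) + eps
    H = rng.random((k, n_genes)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H  # rows of W are low-dimensional cell embeddings


def kmeans(Z, k, n_iter=100):
    """Plain (non-recursive) Lloyd's k-means with deterministic
    farthest-point initialisation. Returns a cluster index per row."""
    idx = [0]
    for _ in range(1, k):
        # squared distance from every point to its nearest chosen center
        d = ((Z[:, None, :] - Z[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    centers = Z[idx].copy()
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        new_centers = np.array([
            Z[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


def annotate(X, labels, n_factors, n_clusters, seed=0):
    """labels[i] is a cell-type string for labelled cells and None otherwise.
    Hypothetical rule: unlabelled cells inherit the majority label of the
    labelled cells in their cluster; clusters with no labelled member stay None."""
    W, _ = nmf(X, n_factors, seed=seed)
    assign = kmeans(W, n_clusters)
    out = list(labels)
    for j in range(n_clusters):
        members = np.flatnonzero(assign == j)
        known = [labels[i] for i in members if labels[i] is not None]
        if not known:
            continue
        values, counts = np.unique(known, return_counts=True)
        majority = str(values[counts.argmax()])
        for i in members:
            if out[i] is None:
                out[i] = majority
    return out
```

On a toy matrix with two blocks of marker genes and only a couple of labelled cells per type, `annotate(X, labels, n_factors=2, n_clusters=2)` should fill each `None` with the label of its cluster. A majority vote is simply the easiest way to propagate sparse labels; the paper's own recursive scheme should be consulted for how CASSL actually resolves clusters.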
Pages: 1287-1305
Page count: 19