scMRA: a robust deep learning method to annotate scRNA-seq data with multiple reference datasets

Cited by: 23
Authors
Yuan, Musu [1 ,2 ]
Chen, Liang [1 ]
Deng, Minghua [1 ,2 ,3 ]
Affiliations
[1] Peking Univ, Sch Math Sci, Beijing 100871, Peoples R China
[2] Peking Univ, Ctr Quantitat Biol, Beijing 100871, Peoples R China
[3] Peking Univ, Ctr Stat Sci, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
CELL RNA-SEQ;
DOI
10.1093/bioinformatics/btab700
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Single-cell RNA-seq (scRNA-seq) has been widely used to resolve cellular heterogeneity. After collecting scRNA-seq data, the natural next step is to integrate the accumulated data to achieve a common ontology of cell types and states. Thus, an effective and efficient cell-type identification method is urgently needed. Meanwhile, high-quality reference data remain a necessity for precise annotation. However, such tailored reference data are always lacking in practice. To address this, we aggregated multiple datasets into a meta-dataset on which annotation is conducted. Existing supervised or semi-supervised annotation methods suffer from batch effects caused by different sequencing platforms, and these effects become more severe when multiple reference datasets are used.
Results: Herein, a robust deep learning-based single-cell Multiple Reference Annotator (scMRA) is introduced. In scMRA, a knowledge graph is constructed to represent the characteristics of cell types in different datasets, and a graph convolutional network serves as a discriminator based on this graph. scMRA preserves intra-cell-type closeness and the relative position of cell types across datasets. scMRA is remarkably powerful at transferring knowledge from multiple reference datasets to the unlabeled target domain, thereby gaining an advantage over other state-of-the-art annotation methods in multi-reference data experiments. Furthermore, scMRA can remove batch effects. To the best of our knowledge, this is the first attempt to use multiple insufficient reference datasets to annotate target data, and it is, comparatively, the best annotation method for multiple scRNA-seq datasets.
Pages: 738-745
Page count: 8