A joint deep learning model enables simultaneous batch effect correction, denoising, and clustering in single-cell transcriptomics

Cited by: 39
Authors
Lakkis, Justin [1]
Wang, David [2]
Zhang, Yuanchao [1]
Hu, Gang [3]
Wang, Kui [4,5]
Pan, Huize [6]
Ungar, Lyle [7]
Reilly, Muredach P. [6]
Li, Xiangjie [3]
Li, Mingyao [1]
Affiliations
[1] Univ Penn, Perelman Sch Med, Dept Biostat Epidemiol & Informat, Philadelphia, PA 19104 USA
[2] Univ Penn, Perelman Sch Med, Grad Grp Genom & Computat Biol, Philadelphia, PA 19104 USA
[3] Nankai Univ, Sch Stat & Data Sci, Key Lab Med Data Anal & Stat Res Tianjin, Tianjin 300071, Peoples R China
[4] Nankai Univ, Sch Math Sci, Dept Informat Theory & Data Sci, Tianjin 300071, Peoples R China
[5] Nankai Univ, LPMC, Tianjin 300071, Peoples R China
[6] Columbia Univ, Irving Med Ctr, Dept Med, Div Cardiol, New York, NY 10032 USA
[7] Univ Penn, Sch Engn & Appl Sci, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
DOI
10.1101/gr.271874.120
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Recent developments of single-cell RNA-seq (scRNA-seq) technologies have led to enormous biological discoveries. As the scale of scRNA-seq studies increases, a major challenge in analysis is batch effects, which are inevitable in studies involving human tissues. Most existing methods remove batch effects in a low-dimensional embedding space. Although useful for clustering, batch effects are still present in the gene expression space, leaving downstream gene-level analysis susceptible to batch effects. Recent studies have shown that batch effect correction in the gene expression space is much harder than in the embedding space. Methods such as Seurat 3.0 rely on the mutual nearest neighbor (MNN) approach to remove batch effects in gene expression, but MNN can only analyze two batches at a time, and it becomes computationally infeasible when the number of batches is large. Here, we present CarDEC, a joint deep learning model that simultaneously clusters and denoises scRNA-seq data while correcting batch effects both in the embedding and the gene expression space. Comprehensive evaluations spanning different species and tissues showed that CarDEC outperforms Scanorama, DCA + Combat, scVI, and MNN. With CarDEC denoising, non-highly variable genes offer as much signal for clustering as the highly variable genes (HVGs), suggesting that CarDEC substantially boosted information content in scRNA-seq. We also showed that trajectory analysis using CarDEC's denoised and batch-corrected expression as input revealed marker genes and transcription factors that are otherwise obscured in the presence of batch effects. CarDEC is computationally fast, making it a desirable tool for large-scale scRNA-seq studies.
Pages: 1753-1766
Number of pages: 14