Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets

Cited by: 6
Authors
Shainer, Inbal [1]
Stemmer, Manuel [1]
Affiliations
[1] Max Planck Inst Neurobiol, Klopferspitz 18, D-82152 Martinsried, Germany
Keywords
Single-cell RNA sequencing; Cell Ranger; Kallisto; Zebrafish; Pineal gland; Alignment; Opsin; 10X Genomics; CELL RNA-SEQ; QUANTIFICATION; READS; STAR
DOI
10.1186/s12864-021-07930-6
Chinese Library Classification (CLC)
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Single-cell RNA sequencing (scRNA-seq) has quickly become one of the dominant techniques in modern transcriptome assessment. In particular, 10X Genomics' Chromium system, with its high-throughput, turnkey approach and thorough user guide, has made this cutting-edge technique accessible to many laboratories using diverse animal models. However, standard pre-processing, including the alignment and cell filtering pipelines, might not be ideal for every organism or tissue. Here we applied an alternative strategy, based on the pseudoaligner kallisto, to twenty-two publicly available single-cell sequencing datasets from a wide range of tissues of eight organisms and compared the results with those of the standard 10X Genomics Cell Ranger pipeline.
Results: In most of the tested samples, kallisto produced higher sequencing read alignment rates and higher total gene detection rates than Cell Ranger. Although datasets processed with Cell Ranger had higher cell counts, outside of the human and mouse datasets these additional cells were routinely of low quality, with low gene detection rates. Thorough downstream analysis of one kallisto-processed dataset, obtained from the zebrafish pineal gland, revealed clearer clustering, allowing the identification of an additional photoreceptor cell type that had previously gone undetected. The finding of the new cluster suggests that the photoreceptive pineal gland is essentially a bi-chromatic tissue containing both green and red cone-like photoreceptors, and implies that the alignment and pre-processing pipeline can affect the discovery of biologically relevant cell types.
Conclusion: While Cell Ranger favors higher cell numbers, kallisto yields datasets with higher median gene detection per cell. We demonstrate that cell type identification was not hampered by the lower cell count but was in fact improved by the high gene detection rate and the more stringent filtering. Depending on the acquired dataset, it can be beneficial to favor high-quality cells and accept a lower cell count, leading to an improved classification of cell types.
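The trade-off described in the conclusion, fewer cells in exchange for higher median gene detection per cell, can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy count matrix, the function names, and the `min_genes` threshold are all hypothetical and chosen only for illustration.

```python
import numpy as np

def genes_detected_per_cell(counts):
    """Number of genes with at least one count in each cell (rows = cells)."""
    return (np.asarray(counts) > 0).sum(axis=1)

def filter_cells(counts, min_genes):
    """Keep only cells with at least `min_genes` detected genes.

    `min_genes` is a hypothetical threshold, not a value from the paper.
    """
    counts = np.asarray(counts)
    keep = genes_detected_per_cell(counts) >= min_genes
    return counts[keep]

# Toy cells x genes count matrix (invented numbers).
counts = np.array([
    [5, 0, 2, 1],   # 3 genes detected
    [0, 0, 1, 0],   # 1 gene detected (low quality)
    [3, 4, 0, 2],   # 3 genes detected
    [0, 1, 0, 0],   # 1 gene detected (low quality)
])

filtered = filter_cells(counts, min_genes=2)

median_before = float(np.median(genes_detected_per_cell(counts)))
median_after = float(np.median(genes_detected_per_cell(filtered)))

print(f"cells: {counts.shape[0]} -> {filtered.shape[0]}")
print(f"median genes per cell: {median_before} -> {median_after}")
```

In this toy case, stricter filtering halves the cell count while raising the median number of detected genes per cell, which is the kind of trade-off the abstract describes.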
Pages: 13