Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets

Cited by: 6
Authors
Shainer, Inbal [1]
Stemmer, Manuel [1]
Affiliations
[1] Max Planck Inst Neurobiol, Klopferspitz 18, D-82152 Martinsried, Germany
Keywords
Single-cell RNA sequencing; Cell Ranger; Kallisto; Zebrafish; Pineal gland; Alignment; Opsin; 10X Genomics; CELL RNA-SEQ; QUANTIFICATION; READS; STAR
DOI
10.1186/s12864-021-07930-6
Chinese Library Classification
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: Single-cell RNA sequencing (scRNA-seq) has quickly become one of the most dominant techniques in modern transcriptome assessment. In particular, 10X Genomics' Chromium system, with its high-throughput approach, turnkey workflow, and thorough user guide, has made this cutting-edge technique accessible to many laboratories using diverse animal models. However, standard pre-processing, including the alignment and cell filtering pipelines, might not be ideal for every organism or tissue. Here we applied an alternative strategy, based on the pseudoaligner kallisto, to twenty-two publicly available single-cell sequencing datasets from a wide range of tissues of eight organisms and compared the results with the standard 10X Genomics Cell Ranger pipeline.
Results: In most of the tested samples, kallisto produced higher sequencing read alignment rates and total gene detection rates than Cell Ranger. Although datasets processed with Cell Ranger had higher cell counts, outside of human and mouse datasets these additional cells were routinely of low quality, with low gene detection rates. Thorough downstream analysis of one kallisto-processed dataset, obtained from the zebrafish pineal gland, revealed clearer clustering, allowing the identification of an additional photoreceptor cell type that previously went undetected. The finding of the new cluster suggests that the photoreceptive pineal gland is essentially a bi-chromatic tissue containing both green and red cone-like photoreceptors, and implies that the alignment and pre-processing pipeline can affect the discovery of biologically relevant cell types.
Conclusion: While Cell Ranger favors higher cell numbers, using kallisto results in datasets with higher median gene detection per cell. We could demonstrate that cell type identification was not hampered by the lower cell count, but in fact improved as a result of the high gene detection rate and the more stringent filtering. Depending on the acquired dataset, it can be beneficial to favor high-quality cells and accept a lower cell count, leading to an improved classification of cell types.
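The trade-off described in the conclusion, between cell count and median gene detection per cell, can be illustrated with a small sketch. This is not the authors' pipeline and uses neither kallisto nor Cell Ranger: the count matrix, the helper names (genes_detected_per_cell, filter_cells, summarize), and the min_genes threshold are all hypothetical, chosen only to show how dropping cells with few detected genes lowers the cell count while raising the median.

```python
from statistics import median


def genes_detected_per_cell(counts):
    """Return, for each cell (row), the number of genes with a non-zero count."""
    return [sum(1 for c in cell if c > 0) for cell in counts]


def filter_cells(counts, min_genes):
    """Keep only cells with at least `min_genes` detected genes.

    `min_genes` is an illustrative threshold, not a value from the paper.
    """
    detected = genes_detected_per_cell(counts)
    return [cell for cell, n in zip(counts, detected) if n >= min_genes]


def summarize(counts):
    """Report the cell count and the median number of genes detected per cell."""
    detected = genes_detected_per_cell(counts)
    return {
        "n_cells": len(counts),
        "median_genes": median(detected) if detected else 0,
    }


# Hypothetical toy matrix: 6 cells (rows) x 6 genes (columns).
toy_counts = [
    [5, 0, 2, 1, 0, 3],  # 4 genes detected
    [0, 0, 1, 0, 0, 0],  # 1 gene detected (low quality)
    [2, 2, 0, 4, 1, 0],  # 4 genes detected
    [0, 1, 0, 0, 0, 0],  # 1 gene detected (low quality)
    [1, 3, 2, 0, 2, 1],  # 5 genes detected
    [0, 0, 0, 2, 0, 0],  # 1 gene detected (low quality)
]

before = summarize(toy_counts)
after = summarize(filter_cells(toy_counts, min_genes=3))

print("before filtering:", before)
print("after filtering: ", after)
```

In this toy case, stricter filtering halves the number of cells but raises the median genes detected per cell, which mirrors the direction of the effect the abstract reports; real thresholds would need to be chosen per dataset.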
Pages: 13