Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets

Cited by: 6
Authors
Shainer, Inbal [1]
Stemmer, Manuel [1]
Affiliation
[1] Max Planck Inst Neurobiol, Klopferspitz 18, D-82152 Martinsried, Germany
Keywords
Single-cell RNA sequencing; Cell Ranger; Kallisto; Zebrafish; Pineal gland; Alignment; Opsin; 10X Genomics; Cell RNA-seq; Quantification; Reads; STAR
DOI
10.1186/s12864-021-07930-6
Chinese Library Classification (CLC) codes
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: Single-cell RNA sequencing (scRNA-seq) has quickly become one of the dominant techniques in modern transcriptome assessment. In particular, 10X Genomics' Chromium system, with its high-throughput approach, turnkey design and thorough user guide, has made this cutting-edge technique accessible to many laboratories working with diverse animal models. However, standard pre-processing, including the alignment and cell-filtering pipelines, might not be ideal for every organism or tissue. Here we applied an alternative strategy, based on the pseudoaligner kallisto, to twenty-two publicly available single-cell sequencing datasets from a wide range of tissues of eight organisms, and compared the results with those of the standard 10X Genomics Cell Ranger pipeline.

Results: In most of the tested samples, kallisto produced higher sequencing-read alignment rates and higher total gene detection rates than Cell Ranger. Although datasets processed with Cell Ranger had higher cell counts, outside of the human and mouse datasets these additional cells were routinely of low quality, with low gene detection rates. A thorough downstream analysis of one kallisto-processed dataset, obtained from the zebrafish pineal gland, revealed clearer clustering and allowed the identification of an additional photoreceptor cell type that had previously gone undetected. The discovery of this new cluster suggests that the photoreceptive pineal gland is essentially a bichromatic tissue containing both green and red cone-like photoreceptors, and implies that the choice of alignment and pre-processing pipeline can affect the discovery of biologically relevant cell types.

Conclusion: While Cell Ranger favors higher cell numbers, kallisto yields datasets with a higher median gene detection per cell. We demonstrate that cell-type identification was not hampered by the lower cell count but was in fact improved, owing to the higher gene detection rate and the more stringent filtering. Depending on the acquired dataset, it can be beneficial to favor high-quality cells and accept a lower cell count, leading to an improved classification of cell types.
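The trade-off described above (fewer cells, but a higher median number of detected genes per cell after more stringent filtering) can be illustrated with a minimal NumPy sketch. It assumes a cells × genes matrix of UMI counts, such as either pipeline's output after loading, and a hypothetical `min_genes` threshold. The function names, the threshold and the toy matrix are illustrative assumptions only; they are not the authors' filtering criteria and not part of the Cell Ranger or kallisto tools.

```python
import numpy as np


def genes_per_cell(counts: np.ndarray) -> np.ndarray:
    """Count the genes with at least one UMI in each cell (rows = cells, columns = genes)."""
    return (counts > 0).sum(axis=1)


def filter_cells(counts: np.ndarray, min_genes: int) -> np.ndarray:
    """Keep only the cells (rows) that detect at least `min_genes` genes (hypothetical cutoff)."""
    keep = genes_per_cell(counts) >= min_genes
    return counts[keep]


def summarize(counts: np.ndarray) -> dict:
    """Report the cell count and the median number of detected genes per cell."""
    return {
        "n_cells": int(counts.shape[0]),
        "median_genes": float(np.median(genes_per_cell(counts))),
    }


# Made-up toy matrix: 4 cells x 5 genes of UMI counts (not real data).
toy = np.array([
    [0, 3, 1, 0, 2],
    [1, 0, 0, 0, 0],
    [2, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
])

print(summarize(toy))                             # {'n_cells': 4, 'median_genes': 2.0}
print(summarize(filter_cells(toy, min_genes=2)))  # {'n_cells': 2, 'median_genes': 3.5}
```

Raising `min_genes` trades cell number for per-cell gene detection; a suitable cutoff depends on the organism, tissue and chemistry, and in practice the matrix would be loaded from the pipeline's output rather than typed in.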
Pages: 13
Journal: BMC Genomics, vol. 22