Choice of pre-processing pipeline influences clustering quality of scRNA-seq datasets

Cited by: 6
Authors
Shainer, Inbal [1]
Stemmer, Manuel [1]
Affiliations
[1] Max Planck Inst Neurobiol, Klopferspitz 18, D-82152 Martinsried, Germany
Keywords
Single-cell RNA sequencing; Cell Ranger; Kallisto; Zebrafish; Pineal gland; Alignment; Opsin; 10X genomics; CELL RNA-SEQ; QUANTIFICATION; READS; STAR
DOI
10.1186/s12864-021-07930-6
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: Single-cell RNA sequencing (scRNA-seq) has quickly become one of the dominant techniques in modern transcriptome assessment. In particular, 10X Genomics' Chromium system, with its high-throughput approach, turnkey setup, and thorough user guide, has made this cutting-edge technique accessible to many laboratories using diverse animal models. However, standard pre-processing, including the alignment and cell-filtering pipelines, might not be ideal for every organism or tissue. Here we applied an alternative strategy, based on the pseudoaligner kallisto, to twenty-two publicly available single-cell sequencing datasets from a wide range of tissues of eight organisms and compared the results with the standard 10X Genomics Cell Ranger pipeline.
Results: In most of the tested samples, kallisto produced higher sequencing read alignment rates and total gene detection rates than Cell Ranger. Although datasets processed with Cell Ranger had higher cell counts, outside of human and mouse datasets these additional cells were routinely of low quality, with low gene detection rates. Thorough downstream analysis of one kallisto-processed dataset, obtained from the zebrafish pineal gland, revealed clearer clustering, allowing the identification of an additional photoreceptor cell type that had previously gone undetected. The finding of the new cluster suggests that the photoreceptive pineal gland is essentially a bi-chromatic tissue containing both green and red cone-like photoreceptors, and implies that the alignment and pre-processing pipeline can affect the discovery of biologically relevant cell types.
Conclusion: While Cell Ranger favors higher cell numbers, using kallisto results in datasets with higher median gene detection per cell. We demonstrate that cell type identification was not hampered by the lower cell count but in fact improved as a result of the high gene detection rate and the more stringent filtering. Depending on the acquired dataset, it can be beneficial to favor high-quality cells and accept a lower cell count, leading to an improved classification of cell types.
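The trade-off described in the abstract, fewer cells but higher median gene detection per cell after stricter filtering, can be illustrated with a minimal sketch. This is not the authors' pipeline and uses neither kallisto nor Cell Ranger. The function names, the toy count matrix, and the `min_genes` threshold are all hypothetical and chosen only for illustration.

```python
from statistics import median

def genes_per_cell(counts):
    """Count genes with at least one read/UMI in each cell (one row per cell)."""
    return [sum(1 for c in cell if c > 0) for cell in counts]

def filter_cells(counts, min_genes):
    """Keep cells that detect at least `min_genes` genes (hypothetical threshold)."""
    keep = [n >= min_genes for n in genes_per_cell(counts)]
    kept = [cell for cell, k in zip(counts, keep) if k]
    return kept, keep

def summarize(counts):
    """Report cell count and median genes detected per cell (non-empty input)."""
    detected = genes_per_cell(counts)
    return {"n_cells": len(detected), "median_genes": float(median(detected))}

# Toy matrix, invented for illustration: 6 cells x 5 genes.
toy = [
    [5, 0, 2, 1, 3],  # 4 genes detected
    [0, 0, 1, 0, 0],  # 1 gene detected (low-quality cell)
    [2, 3, 0, 4, 1],  # 4 genes detected
    [0, 1, 0, 0, 0],  # 1 gene detected (low-quality cell)
    [1, 1, 1, 0, 2],  # 4 genes detected
    [0, 0, 0, 2, 0],  # 1 gene detected (low-quality cell)
]

before = summarize(toy)
filtered, keep = filter_cells(toy, min_genes=3)
after = summarize(filtered)
print(before, after)
```

On this toy input, filtering halves the cell count while raising the median number of genes detected per cell, which is the direction of the effect the abstract reports for the kallisto-based workflow.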
Pages: 13