SCHNEL: scalable clustering of high dimensional single-cell data

Cited by: 3
Authors
Abdelaal, Tamim [1, 2]
de Raadt, Paul [2]
Lelieveldt, Boudewijn P. F. [1, 2]
Reinders, Marcel J. T. [1, 2, 3]
Mahfouz, Ahmed [1, 2, 3]
Affiliations
[1] Delft Univ Technol, Delft Bioinformat Lab, NL-2628 XE Delft, Netherlands
[2] Leiden Univ, Med Ctr, Leiden Computat Biol Ctr, NL-2333 ZC Leiden, Netherlands
[3] Leiden Univ, Med Ctr, Dept Human Genet, NL-2333 ZC Leiden, Netherlands
Funding
EU Horizon 2020;
Keywords
FLOW-CYTOMETRY; MASS CYTOMETRY; POPULATIONS;
DOI
10.1093/bioinformatics/btaa816
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: Single cell data measures multiple cellular markers at the single-cell level for thousands to millions of cells. Identification of distinct cell populations is a key step for further biological understanding, usually performed by clustering this data. Dimensionality reduction based clustering tools are either not scalable to large datasets containing millions of cells, or not fully automated requiring an initial manual estimation of the number of clusters. Graph clustering tools provide automated and reliable clustering for single cell data, but suffer heavily from scalability to large datasets.

Results: We developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data to a hierarchy of datasets containing subsets of data points following the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. Using seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data, and was able to produce meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable time frames. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as a popular machine learning benchmark dataset MNIST.
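
The general idea in the abstract (cluster a small subset of points with a graph method, then carry the labels back to the full dataset) can be illustrated with a minimal sketch. This is not SCHNEL's implementation: random subsampling stands in for the manifold-following hierarchy, connected components of a k-nearest-neighbour graph stand in for the graph clustering step, and every function name and parameter below is hypothetical.

```python
import numpy as np


def knn_indices(points, k):
    """Return the indices of the k nearest neighbours of every point (brute force)."""
    d = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def connected_components(n, edges):
    """Label the n nodes of an undirected graph by connected component (union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels


def cluster_via_landmarks(data, n_landmarks=200, k=5, seed=0):
    """Cluster a random landmark subset on a kNN graph, then label all points.

    Assumption: random landmarks and connected components are simple
    stand-ins chosen for illustration, not the method of the paper.
    """
    rng = np.random.default_rng(seed)
    size = min(n_landmarks, len(data))
    landmarks = data[rng.choice(len(data), size=size, replace=False)]

    # Graph clustering on the (small) landmark set only.
    nbrs = knn_indices(landmarks, k)
    edges = [(i, int(j)) for i in range(size) for j in nbrs[i]]
    landmark_labels = connected_components(size, edges)

    # Propagate: every data point inherits the label of its nearest landmark.
    d = ((data[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return landmark_labels[d.argmin(axis=1)]
```

The expensive graph step runs only on the landmarks, so its cost depends on `n_landmarks` rather than on the total number of cells; the propagation step is a single nearest-landmark lookup per point. The brute-force distance matrices used here are for readability only and would need a spatial index or batching for data at the scale the abstract describes.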
Pages: I849-I856
Page count: 8
Related papers
50 in total
  • [41] Single-Cell RNA Sequencing Data Interpretation by Evolutionary Multiobjective Clustering
    Li, Xiangtao
    Wong, Ka-Chun
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2020, 17 (05) : 1773 - 1784
  • [42] Synchronization-based scalable subspace clustering of high-dimensional data
    Junming Shao
    Xinzuo Wang
    Qinli Yang
    Claudia Plant
    Christian Böhm
    Knowledge and Information Systems, 2017, 52 : 83 - 111
  • [43] Significance analysis for clustering with single-cell RNA-sequencing data
    Isabella N. Grabski
    Kelly Street
    Rafael A. Irizarry
    Nature Methods, 2023, 20 : 1196 - 1202
  • [44] VPAC: Variational projection for accurate clustering of single-cell transcriptomic data
    Chen, Shengquan
    Hua, Kui
    Cui, Hongfei
    Jiang, Rui
    BMC BIOINFORMATICS, 2019, 20 (Suppl 7)
  • [45] Clustering Single-Cell Expression Data Using Random Forest Graphs
    Pouyan, Maziyar Baran
    Nourani, Mehrdad
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2017, 21 (04) : 1172 - 1181
  • [46] Ultrafast clustering of single-cell flow cytometry data using FlowGrid
    Ye, Xiaoxin
    Ho, Joshua W. K.
    BMC SYSTEMS BIOLOGY, 2019, 13
  • [47] A novel algorithm for fast and scalable subspace clustering of high-dimensional data
    Kaur A.
    Datta A.
    Journal of Big Data, 2015, 2 (01)
  • [48] Significance analysis for clustering with single-cell RNA-sequencing data
    Grabski, Isabella N.
    Street, Kelly
    Irizarry, Rafael A.
    NATURE METHODS, 2023, 20 (08) : 1196 - +
  • [49] bmVAE: a variational autoencoder method for clustering single-cell mutation data
    Yan, Jiaqian
    Ma, Ming
    Yu, Zhenhua
    BIOINFORMATICS, 2023, 39 (01)
  • [50] OmniClust: A versatile clustering toolkit for single-cell and spatial transcriptomics data
    Cui, Yaxuan
    Cui, Yang
    Ding, Yi
    Nakai, Kenta
    Wei, Leyi
    Le, Yuyin
    Ye, Xiucai
    Sakurai, Tetsuya
    METHODS, 2025, 238 : 84 - 94