SCHNEL: scalable clustering of high dimensional single-cell data

Cited by: 3
Authors
Abdelaal, Tamim [1, 2]
de Raadt, Paul [2]
Lelieveldt, Boudewijn P. F. [1, 2]
Reinders, Marcel J. T. [1, 2, 3]
Mahfouz, Ahmed [1, 2, 3]
Affiliations
[1] Delft Univ Technol, Delft Bioinformat Lab, NL-2628 XE Delft, Netherlands
[2] Leiden Univ, Med Ctr, Leiden Computat Biol Ctr, NL-2333 ZC Leiden, Netherlands
[3] Leiden Univ, Med Ctr, Dept Human Genet, NL-2333 ZC Leiden, Netherlands
Funding
EU Horizon 2020;
Keywords
FLOW-CYTOMETRY; MASS CYTOMETRY; POPULATIONS;
DOI
10.1093/bioinformatics/btaa816
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
Motivation: Single cell data measures multiple cellular markers at the single-cell level for thousands to millions of cells. Identification of distinct cell populations is a key step for further biological understanding, usually performed by clustering this data. Dimensionality reduction based clustering tools are either not scalable to large datasets containing millions of cells, or not fully automated requiring an initial manual estimation of the number of clusters. Graph clustering tools provide automated and reliable clustering for single cell data, but suffer heavily from scalability to large datasets.
Results: We developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data to a hierarchy of datasets containing subsets of data points following the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. Using seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data, and was able to produce meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable time frames. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as a popular machine learning benchmark dataset MNIST.
Pages: I849 / I856
Page count: 8
Related papers
50 in total
  • [31] Scalable multi-sample single-cell data analysis by Partition-Assisted Clustering and Multiple Alignments of Networks
    Li, Ye Henry
    Li, Dangna
    Samusik, Nikolay
    Wang, Xiaowei
    Guan, Leying
    Nolan, Garry P.
    Wong, Wing Hung
    PLOS COMPUTATIONAL BIOLOGY, 2017, 13 (12)
  • [32] Generalizable and Scalable Visualization of Single-Cell Data Using Neural Networks
    Cho, Hyunghoon
    Berger, Bonnie
    Peng, Jian
    CELL SYSTEMS, 2018, 7 (02) : 185 - +
  • [33] HAL-X: Scalable hierarchical clustering for rapid and tunable single-cell analysis
    Anibal, James
    Day, Alexandre G.
    Bahadiroglu, Erol
    O'Neil, Liam
    Long Phan
    Peltekian, Alec
    Erez, Amir
    Kaplan, Mariana
    Altan-Bonnet, Gregoire
    Mehta, Pankaj
    PLOS COMPUTATIONAL BIOLOGY, 2022, 18 (10)
  • [34] p-clustval: a novel p-adic approach for enhanced clustering of high-dimensional single-cell RNASeq data
    Sharma, Parichit
    Mishra, Sarthak
    Kurban, Hasan
    Dalkilic, Mehmet
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2025,
  • [35] Topological Methods for Visualization and Analysis of High Dimensional Single-Cell RNA Sequencing Data
    Wang, Tongxin
    Johnson, Travis
    Zhang, Jie
    Huang, Kun
    PACIFIC SYMPOSIUM ON BIOCOMPUTING 2019, 2019, : 350 - 361
  • [36] High-Dimensional Single-Cell Cancer Biology
    Irish, Jonathan M.
    Doxie, Deon B.
    HIGH-DIMENSIONAL SINGLE CELL ANALYSIS: MASS CYTOMETRY, MULTI-PARAMETRIC FLOW CYTOMETRY AND BIOINFORMATIC TECHNIQUES, 2014, 377 : 1 - 21
  • [37] Synchronization-based scalable subspace clustering of high-dimensional data
    Shao, Junming
    Wang, Xinzuo
    Yang, Qinli
    Plant, Claudia
    Boehm, Christian
    KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 52 (01) : 83 - 111
  • [38] O-cluster: Scalable clustering of large high dimensional data sets
    Milenova, BL
    Campos, MM
    2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2002, : 290 - 297
  • [39] VPAC: Variational projection for accurate clustering of single-cell transcriptomic data
    Shengquan Chen
    Kui Hua
    Hongfei Cui
    Rui Jiang
    BMC Bioinformatics, 20
  • [40] Simultaneous deep generative modelling and clustering of single-cell genomic data
    Qiao Liu
    Shengquan Chen
    Rui Jiang
    Wing Hung Wong
    Nature Machine Intelligence, 2021, 3 : 536 - 544