SCHNEL: scalable clustering of high dimensional single-cell data

Cited by: 3
Authors
Abdelaal, Tamim [1 ,2 ]
de Raadt, Paul [2 ]
Lelieveldt, Boudewijn P. F. [1 ,2 ]
Reinders, Marcel J. T. [1 ,2 ,3 ]
Mahfouz, Ahmed [1 ,2 ,3 ]
Affiliations
[1] Delft Univ Technol, Delft Bioinformat Lab, NL-2628 XE Delft, Netherlands
[2] Leiden Univ, Med Ctr, Leiden Computat Biol Ctr, NL-2333 ZC Leiden, Netherlands
[3] Leiden Univ, Med Ctr, Dept Human Genet, NL-2333 ZC Leiden, Netherlands
Funding
EU Horizon 2020;
Keywords
FLOW-CYTOMETRY; MASS CYTOMETRY; POPULATIONS;
DOI
10.1093/bioinformatics/btaa816
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Single-cell data measure multiple cellular markers at the single-cell level for thousands to millions of cells. Identifying distinct cell populations is a key step toward further biological understanding and is usually performed by clustering these data. Clustering tools based on dimensionality reduction either do not scale to large datasets containing millions of cells or are not fully automated, requiring an initial manual estimate of the number of clusters. Graph clustering tools provide automated and reliable clustering for single-cell data, but scale poorly to large datasets.
Results: We developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data into a hierarchy of datasets, each containing a subset of the data points that follows the original data manifold. SCHNEL's novel approach combines this hierarchical representation with graph clustering, making graph clustering scalable to millions of cells. On seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data and produced meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable time frames. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data as well as to MNIST, a popular machine learning benchmark dataset.
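The general idea in the abstract (cluster a small representative subset with a graph method, then carry the labels back to every cell) can be illustrated with a minimal sketch. This is not SCHNEL's implementation. It rests on several simplifying assumptions: a single level of randomly chosen landmarks stands in for the manifold-following hierarchy, connected components of a k-nearest-neighbour graph stand in for a real graph clustering algorithm, and all function names and parameters (`cluster_via_landmarks`, `n_landmarks`, `k`) are hypothetical.

```python
import numpy as np


def knn_indices(points, queries, k):
    """Brute-force k nearest neighbours of each query among `points`."""
    d = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d, axis=1)[:, :k]


def connected_components(n, edges):
    """Label the nodes of an undirected graph by component (union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in edges:
        ra, rb = find(int(a)), find(int(b))
        if ra != rb:
            parent[ra] = rb
    roots = [find(i) for i in range(n)]
    # Map arbitrary root ids to consecutive labels 0..C-1.
    _, labels = np.unique(roots, return_inverse=True)
    return labels


def cluster_via_landmarks(data, n_landmarks=200, k=5, seed=0):
    """Hypothetical sketch: cluster landmarks on a kNN graph, then propagate."""
    rng = np.random.default_rng(seed)
    # Assumption: random landmarks replace the data-driven hierarchy.
    idx = rng.choice(len(data), size=n_landmarks, replace=False)
    landmarks = data[idx]
    # kNN graph over landmarks only; column 0 is each landmark itself.
    nn = knn_indices(landmarks, landmarks, k + 1)[:, 1:]
    edges = [(i, j) for i in range(n_landmarks) for j in nn[i]]
    # Assumption: connected components replace a proper graph clustering step.
    landmark_labels = connected_components(n_landmarks, edges)
    # Every data point inherits the label of its nearest landmark.
    nearest = knn_indices(landmarks, data, 1)[:, 0]
    return landmark_labels[nearest]
```

The expensive graph step runs only on `n_landmarks` points, so its cost no longer grows with the full dataset; only the final nearest-landmark assignment touches every cell.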
Pages: I849 / I856
Page count: 8
Related papers
50 in total
  • [1] SCALABLE VISUALIZATION FOR HIGH-DIMENSIONAL SINGLE-CELL DATA
    Kim, Juho
    Russell, Nate
    Peng, Jian
    PACIFIC SYMPOSIUM ON BIOCOMPUTING 2017, 2017, : 623 - 634
  • [2] CosTaL: an accurate and scalable graph-based clustering algorithm for high-dimensional single-cell data analysis
    Li, Yijia
    Nguyen, Jonathan
    Anastasiu, David C.
    Arriaga, Edgar A.
    BRIEFINGS IN BIOINFORMATICS, 2023, 24 (03)
  • [3] Scalable clustering of high dimensional data
    Littau, D
    Boley, D
    BETWEEN DATA SCIENCE AND APPLIED DATA ANALYSIS, 2003, : 57 - 64
  • [4] Comparison of Clustering Methods for High-Dimensional Single-Cell Flow and Mass Cytometry Data
    Weber, Lukas M.
    Robinson, Mark D.
    CYTOMETRY PART A, 2016, 89A (12) : 1084 - 1096
  • [5] Single-cell regulatory network inference and clustering from high-dimensional sequencing data
    Vrahatis, Aristidis G.
    Dimitrakopoulos, Georgios N.
    Tasoulis, Sotiris K.
    Georgakopoulos, Spiros V.
    Plagianakos, Vassilis P.
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 2782 - 2789
  • [6] Semisoft clustering of single-cell data
    Zhu, Lingxue
    Lei, Jing
    Klei, Lambertus
    Devlin, Bernie
    Roeder, Kathryn
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2019, 116 (02) : 466 - 471
  • [8] Secuer: Ultrafast, scalable and accurate clustering of single-cell RNA-seq data
    Wei, Nana
    Nie, Yating
    Liu, Lin
    Zheng, Xiaoqi
    Wu, Hua-Jun
    PLOS COMPUTATIONAL BIOLOGY, 2022, 18 (12)
  • [9] DUBStepR is a scalable correlation-based feature selection method for accurately clustering single-cell data
    Bobby Ranjan
    Wenjie Sun
    Jinyu Park
    Kunal Mishra
    Florian Schmidt
    Ronald Xie
    Fatemeh Alipour
    Vipul Singhal
    Ignasius Joanito
    Mohammad Amin Honardoost
    Jacy Mei Yun Yong
    Ee Tzun Koh
    Khai Pang Leong
    Nirmala Arul Rayan
    Michelle Gek Liang Lim
    Shyam Prabhakar
    Nature Communications, 12
  • [10] DUBStepR is a scalable correlation-based feature selection method for accurately clustering single-cell data
    Ranjan, Bobby
    Sun, Wenjie
    Park, Jinyu
    Mishra, Kunal
    Schmidt, Florian
    Xie, Ronald
    Alipour, Fatemeh
    Singhal, Vipul
    Joanito, Ignasius
    Honardoost, Mohammad Amin
    Yong, Jacy Mei Yun
    Koh, Ee Tzun
    Leong, Khai Pang
    Rayan, Nirmala Arul
    Lim, Michelle Gek Liang
    Prabhakar, Shyam
    NATURE COMMUNICATIONS, 2021, 12 (01)