Statistical method scDEED for detecting dubious 2D single-cell embeddings and optimizing t-SNE and UMAP hyperparameters

Times cited: 12
Authors
Xia, Lucy [1]
Lee, Christy [2]
Li, Jingyi Jessica [2,3,4,5,6]
Affiliations
[1] Hong Kong Univ Sci & Technol, Sch Business & Management, Dept ISOM, Clear Water Bay, Hong Kong, Peoples R China
[2] Univ Calif Los Angeles, Dept Stat & Data Sci, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Biostat, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Dept Computat Med, Los Angeles, CA 90095 USA
[5] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
[6] Harvard Univ, Radcliffe Inst Adv Study, Cambridge, MA 02138 USA
Funding
U.S. National Science Foundation
DOI
10.1038/s41467-024-45891-y
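Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract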
Two-dimensional (2D) embedding methods are crucial for single-cell data visualization. Popular methods such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are commonly used for visualizing cell clusters; however, it is well known that the 2D embeddings produced by t-SNE and UMAP may not reliably reflect the similarities among cell clusters. Motivated by this challenge, we present a statistical method, scDEED, for detecting dubious cell embeddings output by a 2D-embedding method. By calculating a reliability score for every cell embedding based on the similarity between the cell's 2D-embedding neighbors and its pre-embedding neighbors, scDEED identifies cell embeddings with low reliability scores as dubious and those with high reliability scores as trustworthy. Moreover, by minimizing the number of dubious cell embeddings, scDEED provides intuitive guidance for optimizing the hyperparameters of an embedding method. We show the effectiveness of scDEED on multiple datasets for detecting dubious cell embeddings and optimizing the hyperparameters of t-SNE and UMAP.

2D visualisation of single-cell data is highly impacted by the hyperparameter settings of 2D embedding methods such as t-SNE and UMAP. Here, the authors develop a statistical method, scDEED, to detect dubious cell embeddings and optimise the hyperparameter settings for trustworthy visualisation.
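The neighbor-based reliability idea described in the abstract can be illustrated with a small sketch. This is not scDEED's actual reliability score: as a simplifying assumption, it scores each cell by the Jaccard overlap between its k nearest neighbors in the pre-embedding space and in the 2D embedding, flags cells below a fixed, hypothetical threshold as dubious, and then picks the hyperparameter setting whose embedding yields the fewest dubious cells. The function names, `k`, `threshold`, the setting labels, and the toy data are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set

Point = Sequence[float]


def knn_indices(points: List[Point], i: int, k: int) -> Set[int]:
    """Indices of the k nearest neighbors of points[i] (Euclidean, excluding i)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def reliability_scores(pre: List[Point], emb: List[Point], k: int) -> List[float]:
    """Per-cell Jaccard overlap of pre-embedding vs. 2D-embedding neighbor sets.

    Simplified stand-in for a neighbor-similarity reliability score
    (assumption; not the formula used in the paper).
    """
    scores = []
    for i in range(len(pre)):
        pre_nn = knn_indices(pre, i, k)
        emb_nn = knn_indices(emb, i, k)
        scores.append(len(pre_nn & emb_nn) / len(pre_nn | emb_nn))
    return scores


def count_dubious(pre: List[Point], emb: List[Point], k: int, threshold: float) -> int:
    """Number of cells whose score falls below a hypothetical fixed threshold."""
    return sum(score < threshold for score in reliability_scores(pre, emb, k))


def pick_setting(pre: List[Point], embeddings: Dict[str, List[Point]],
                 k: int, threshold: float) -> str:
    """Hyperparameter setting whose 2D embedding has the fewest dubious cells."""
    return min(embeddings,
               key=lambda name: count_dubious(pre, embeddings[name], k, threshold))


# Toy data (hypothetical): two well-separated 3D clusters of three cells each.
pre = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (10, 10, 10), (10, 11, 10), (11, 10, 10)]
# One 2D embedding preserves both clusters; the other swaps cells 1 and 4.
faithful = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
distorted = [(0, 0), (10, 11), (1, 0), (10, 10), (0, 1), (11, 10)]

print(count_dubious(pre, faithful, k=2, threshold=0.5))   # 0
print(count_dubious(pre, distorted, k=2, threshold=0.5))  # 6
print(pick_setting(pre, {"setting_A": distorted, "setting_B": faithful},
                   k=2, threshold=0.5))                   # setting_B
```

Jaccard overlap is used here only because it is easy to verify by hand; the paper's own score and the way it separates dubious from trustworthy embeddings should be taken from the original article rather than from this sketch.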
Pages: 21
Source: Nature Communications, 15