Diffusion on PCA-UMAP Manifold: The Impact of Data Structure Preservation to Denoise High-Dimensional Single-Cell RNA Sequencing Data

Times cited: 3
Authors
Cristian, Padron-Manrique [1 ,2 ]
Aaron, Vazquez-Jimenez [1 ]
Armando, Esquivel-Hernandez Diego [1 ]
Estrella, Martinez-Lopez Yoscelina [1 ,3 ]
Daniel, Neri-Rosario [1 ,4 ]
David, Giron-Villalobos [1 ,4 ]
Edgar, Mixcoha [1 ,5 ]
Paul, Sanchez-Castaneda Jean [1 ,4 ]
Osbaldo, Resendis-Antonio [1 ,6 ,7 ]
Affiliations
[1] Inst Nacl Med Genomica INMEGEN, Human Syst Biol Lab, Arenal Tepepan, Perifer 4809, Mexico City 14610, Mexico
[2] Univ Nacl Autonoma Mexico, Programa Doctorado Ciencias Biomed, Coyoacan Unidad Posgrad, Edificio primer Piso B,Ciudad Univ, Mexico City 04510, Mexico
[3] Univ Nacl Autonoma Mexico, Unidad Posgrad, Programa Doctorado Ciencias Med, Ciudad Univ,Edificio A,1er Piso, Mexico City 04510, Mexico
[4] Univ Nacl Autonoma Mexico, Unidad Posgrad, Programa Maestria Ciencias Bioquim, Ciudad Univ,Edificio B,1er Piso, Mexico City 04510, Mexico
[5] CONAHCYT INMEGEN, Periferico Sur 4809, Mexico City 14610, Mexico
[6] Inst Nacl Ciencias Med & Nutr Salvador Zubiran, Coordinac Invest Cient Red Apoyo Invest, Belisario Dominguez Seccion 16, Mexico City 14080, Mexico
[7] Univ Nacl Autonoma Mexico UNAM, Ctr Ciencias Complejidad, Circuito Ctr Cultural, Mexico City 04510, Mexico
Source
BIOLOGY-BASEL | 2024 / Vol. 13 / No. 07
Keywords
manifold learning; UMAP; diffusion maps; scRNA-seq; imputation; denoising; high-dimensional data; ADHESION; HYPOXIA; DEATH;
DOI
10.3390/biology13070512
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Simple Summary
In scRNA-seq analysis, diffusion-based approaches help identify the connections between cells, allowing us to follow individual cells as they change phenotypes within a mathematical space known as a manifold. Recently, these approaches have been used as the basis for imputation, a technique that addresses missing data, a common challenge in scRNA-seq analysis. For example, MAGIC is a popular diffusion-based imputation method that has successfully uncovered gene-gene interactions related to phenotypic transitions that would not be detectable without imputation. However, previous evaluations have not adequately examined how different parameter settings affect MAGIC, particularly with respect to over-smoothing. To address this, we developed sc-PHENIX, which uses a diffusion approach similar to MAGIC's but adds a PCA-UMAP initialization step, whereas MAGIC uses PCA alone. We compared sc-PHENIX and MAGIC in terms of imputation accuracy, visualization, biological insights, and preservation of data structure. Our findings show that sc-PHENIX outperforms MAGIC across a range of common parameter settings, such as the "diffusion time" (t), the number of nearest neighbors (knn), and the number of PCA dimensions. It effectively captures and preserves the global, local, and continuous structure of the data, leading to more reliable imputation and potentially uncovering new biological insights in diverse datasets.
Abstract
Single-cell transcriptomics (scRNA-seq) is revolutionizing biological research, yet it faces challenges such as inefficient transcript capture and noise. To address these challenges, methods such as neighbor averaging or graph diffusion are used. These methods often rely on k-nearest neighbor graphs built from low-dimensional manifolds. However, scRNA-seq data suffer from the "curse of dimensionality", which can cause imputation methods to over-smooth the data.
To overcome this, sc-PHENIX employs a PCA-UMAP diffusion method, which better preserves data structure and allows finer control over the number of PCA dimensions and the diffusion parameters (e.g., the number of k-nearest neighbors and the exponent of the Markov matrix) to minimize the introduction of noise. This approach enables a more accurate construction of the exponentiated Markov matrix (the cell neighborhood graph), surpassing methods such as MAGIC. sc-PHENIX significantly mitigates over-smoothing, as validated on various scRNA-seq datasets, and yields an improved representation of cell phenotypes. Applied to a multicellular tumor spheroid dataset, sc-PHENIX identified known extreme phenotype states, showcasing its effectiveness. sc-PHENIX is open-source and available for use and modification.
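The abstract describes imputation by graph diffusion. A k-nearest-neighbor graph is built in a low-dimensional space and turned into a row-stochastic Markov matrix M. That matrix is raised to a diffusion time t and applied to the expression matrix X, so the imputed data are M^t X. The minimal NumPy sketch below illustrates this generic idea under stated assumptions. It is not the sc-PHENIX or MAGIC implementation. The helper names (`pca`, `markov_matrix`, `diffuse`) are hypothetical, and so are the Gaussian kernel whose bandwidth is the distance to the knn-th neighbor, the symmetrization step, and the toy Poisson "counts".

```python
import numpy as np


def pca(X, n_components):
    """Project cells (rows) onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def markov_matrix(embedding, knn):
    """Row-stochastic kNN transition matrix (illustrative kernel, an assumption).

    Bandwidth of each cell = distance to its knn-th neighbour, loosely in the
    spirit of an adaptive kernel; this is not MAGIC's or sc-PHENIX's exact kernel.
    """
    diff = embedding[:, None, :] - embedding[None, :, :]
    d = np.linalg.norm(diff, axis=-1)               # pairwise cell distances
    n = d.shape[0]
    sigma = np.sort(d, axis=1)[:, knn]              # column 0 is the cell itself
    K = np.exp(-(d / sigma[:, None]) ** 2)
    # Keep only each cell's knn nearest neighbours (plus itself).
    order = np.argsort(d, axis=1)
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], order[:, : knn + 1]] = True
    K = np.where(mask, K, 0.0)
    K = K + K.T                                     # symmetrise the affinity graph
    return K / K.sum(axis=1, keepdims=True)         # normalise rows to sum to 1


def diffuse(X, embedding, knn=5, t=3):
    """Smooth expression over t steps of the Markov chain: returns M^t X."""
    M = markov_matrix(embedding, knn)
    return np.linalg.matrix_power(M, t) @ X


# Toy usage: 60 cells x 20 genes of synthetic counts (not real data).
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(60, 20)).astype(float)
pcs = pca(X, n_components=5)
X_imputed = diffuse(X, pcs, knn=5, t=3)
print(X_imputed.shape)  # (60, 20)
```

Passing the PCA coordinates as the embedding mimics a PCA-only neighborhood graph, as the abstract attributes to MAGIC. To approximate the PCA-UMAP initialization described for sc-PHENIX, one could instead pass a UMAP embedding of the principal components, for example from `umap.UMAP(n_components=...).fit_transform(pcs)` in the umap-learn package. Any such settings would be assumptions here, not sc-PHENIX defaults. Because every row of M^t is a probability vector, each imputed cell is a weighted average of observed cells. Larger `t` or `knn` values therefore average over more cells, which is the over-smoothing trade-off the abstract discusses.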
Pages: 43