Diffusion on PCA-UMAP Manifold: The Impact of Data Structure Preservation to Denoise High-Dimensional Single-Cell RNA Sequencing Data

Cited by: 3
Authors
Padron-Manrique, Cristian [1,2]
Vazquez-Jimenez, Aaron [1]
Esquivel-Hernandez, Diego Armando [1]
Martinez-Lopez, Yoscelina Estrella [1,3]
Neri-Rosario, Daniel [1,4]
Giron-Villalobos, David [1,4]
Mixcoha, Edgar [1,5]
Sanchez-Castaneda, Jean Paul [1,4]
Resendis-Antonio, Osbaldo [1,6,7]
Affiliations
[1] Inst Nacl Med Genomica INMEGEN, Human Syst Biol Lab, Arenal Tepepan, Perifer 4809, Mexico City 14610, Mexico
[2] Univ Nacl Autonoma Mexico, Programa Doctorado Ciencias Biomed, Coyoacan Unidad Posgrad, Edificio primer Piso B,Ciudad Univ, Mexico City 04510, Mexico
[3] Univ Nacl Autonoma Mexico, Unidad Posgrad, Programa Doctorado Ciencias Med, Ciudad Univ,Edificio A,1er Piso, Mexico City 04510, Mexico
[4] Univ Nacl Autonoma Mexico, Unidad Posgrad, Programa Maestria Ciencias Bioquim, Ciudad Univ,Edificio B,1er Piso, Mexico City 04510, Mexico
[5] CONAHCYT INMEGEN, Periferico Sur 4809, Mexico City 14610, Mexico
[6] Inst Nacl Ciencias Med & Nutr Salvador Zubiran, Coordinac Invest Cient Red Apoyo Invest, Belisario Dominguez Seccion 16, Mexico City 14080, Mexico
[7] Univ Nacl Autonoma Mexico UNAM, Ctr Ciencias Complejidad, Circuito Ctr Cultural, Mexico City 04510, Mexico
Source
BIOLOGY-BASEL | 2024 / Vol. 13 / Issue 07
Keywords
manifold learning; UMAP; diffusion maps; scRNA-seq; imputation; denoising; high-dimensional data; ADHESION; HYPOXIA; DEATH;
DOI
10.3390/biology13070512
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Simple Summary: In scRNA-seq analysis, diffusion-based approaches help identify the connections between cells, allowing us to observe the progression of individual cells as they change phenotypes within a mathematical space known as a manifold. Recently, these approaches have been used as a reference for imputation, a technique that addresses missing data, a common challenge in scRNA-seq analysis. For example, MAGIC is a popular diffusion-based imputation method that has succeeded in uncovering gene-gene interactions related to phenotypic transitions that could not be detected without imputation. However, previous evaluations have not adequately compared the impact of different parameter settings on MAGIC, particularly with respect to over-smoothing. To address this, we developed sc-PHENIX, which uses a diffusion approach similar to MAGIC's but incorporates a PCA-UMAP initialization step, whereas MAGIC uses only PCA. We compared sc-PHENIX and MAGIC in terms of imputation accuracy, visualization, biological insights, and preservation of data structure. Our findings show that sc-PHENIX outperforms MAGIC across various common parameters such as "diffusion time" (t), the number of nearest neighbors (knn), and PCA dimensions. It effectively captures and preserves the global, local, and continuous data structures, leading to more reliable imputation and potentially uncovering new biological insights in diverse datasets.

Abstract: Single-cell transcriptomics (scRNA-seq) is revolutionizing biological research, yet it faces challenges such as inefficient transcript capture and noise. To address these challenges, methods like neighbor averaging or graph diffusion are used. These methods often rely on k-nearest neighbor graphs built from low-dimensional manifolds. However, scRNA-seq data suffer from the 'curse of dimensionality', which leads to over-smoothing of the data when imputation methods are applied. To overcome this, sc-PHENIX employs a PCA-UMAP diffusion method, which enhances the preservation of data structures and allows for a refined use of PCA dimensions and diffusion parameters (e.g., k-nearest neighbors, exponentiation of the Markov matrix) to minimize noise introduction. This approach enables a more accurate construction of the exponentiated Markov matrix (cell neighborhood graph), surpassing methods like MAGIC. sc-PHENIX significantly mitigates over-smoothing, as validated on various scRNA-seq datasets, and demonstrates improved cell phenotype representation. Applied to a multicellular tumor spheroid dataset, sc-PHENIX identified known extreme phenotype states, showcasing its effectiveness. sc-PHENIX is open-source and available for use and modification.
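The diffusion step named in the abstract (a k-nearest neighbor graph, a Markov matrix, and its exponentiation by the diffusion time t) can be illustrated with a minimal NumPy sketch. This is not the sc-PHENIX or MAGIC implementation: the function names `pca_embed`, `markov_matrix`, and `diffuse`, the Gaussian kernel with a per-cell bandwidth, and the symmetrization step are all assumptions chosen for illustration. The optional `embed` argument is a hypothetical hook marking where a second embedding (UMAP, in the PCA-UMAP approach) would be applied before the graph is built; leaving it as `None` corresponds to a PCA-only initialization.

```python
import numpy as np

def pca_embed(X, n_components):
    """Project cells (rows of X) onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def markov_matrix(Z, knn):
    """Row-stochastic transition matrix from a kNN graph on embedding Z.

    Assumption: Gaussian kernel whose bandwidth for each cell is the squared
    distance to its knn-th nearest neighbour.
    """
    n = Z.shape[0]
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    # knn + 1 smallest distances per row (normally the cell itself plus knn neighbours)
    idx = np.argsort(d2, axis=1)[:, :knn + 1]
    sigma2 = d2[np.arange(n), idx[:, -1]] + 1e-12
    rows = np.repeat(np.arange(n), idx.shape[1])
    cols = idx.ravel()
    K = np.zeros((n, n))
    K[rows, cols] = np.exp(-d2[rows, cols] / sigma2[rows])
    K = (K + K.T) / 2.0                          # symmetrise the affinities
    return K / K.sum(axis=1, keepdims=True)      # each row sums to 1

def diffuse(X, n_pcs=10, knn=5, t=3, embed=None):
    """Denoise X (cells x genes) by multiplying with the t-th power of P."""
    Z = pca_embed(X, n_pcs)
    if embed is not None:
        # Hypothetical hook: a second embedding (e.g. UMAP) would go here.
        Z = embed(Z)
    P = markov_matrix(Z, knn)
    return np.linalg.matrix_power(P, t) @ X

# Usage on synthetic count-like data (not a real scRNA-seq dataset)
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(60, 20)).astype(float)
X_denoised = diffuse(X, n_pcs=5, knn=5, t=3)
```

Because every row of the exponentiated matrix is a set of non-negative weights summing to one, each denoised value is a weighted average over cells. Larger `t` or `knn` therefore pulls cells toward a common mean, which is the over-smoothing effect the abstract refers to.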
Pages: 43
Related papers
50 in total
  • [21] Visualizing High-Dimensional Single-Cell RNA-seq Data via Random Projections and Geodesic Distances
    Vrahatis, Aristidis G.
    Tasoulis, Sotiris K.
    Dimitrakopoulos, Georgios N.
    Plagianakos, Vassilis P.
    2019 16TH IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY - CIBCB 2019, 2019, : 115 - 120
  • [22] scDA: Single cell discriminant analysis for single-cell RNA sequencing data
    Shi, Qianqian
    Li, Xinxing
    Peng, Qirui
    Zhang, Chuanchao
    Chen, Luonan
    Computational and Structural Biotechnology Journal, 2021, 19 : 3234 - 3244
  • [24] An Introduction to the Analysis of Single-Cell RNA-Sequencing Data
    AlJanahi, Aisha A.
    Danielsen, Mark
    Dunbar, Cynthia E.
    MOLECULAR THERAPY-METHODS & CLINICAL DEVELOPMENT, 2018, 10 : 189 - 196
  • [25] MISC: missing imputation for single-cell RNA sequencing data
    Yang, Mary Qu
    Weissman, Sherman M.
    Yang, William
    Zhang, Jialing
    Canaann, Allon
    Guan, Renchu
    BMC SYSTEMS BIOLOGY, 2018, 12
  • [26] SNV identification from single-cell RNA sequencing data
    Schnepp, Patricia M.
    Chen, Mengjie
    Keller, Evan T.
    Zhou, Xiang
    HUMAN MOLECULAR GENETICS, 2019, 28 (21) : 3569 - 3583
  • [27] Normalizing single-cell RNA sequencing data: Challenges and opportunities
    Vallejos C.A.
    Risso D.
    Scialdone A.
    Dudoit S.
    Marioni J.C.
    Nature Methods, 2017, 14 (6) : 565 - 571
  • [28] Analysis of single-cell RNA sequencing data based on autoencoders
    Andrea Tangherloni
    Federico Ricciuti
    Daniela Besozzi
    Pietro Liò
    Ana Cvejic
    BMC Bioinformatics, 22
  • [29] SCRIP: an accurate simulator for single-cell RNA sequencing data
    Qin, Fei
    Luo, Xizhi
    Xiao, Feifei
    Cai, Guoshuai
    BIOINFORMATICS, 2022, 38 (05) : 1304 - 1311
  • [30] The shaky foundations of simulating single-cell RNA sequencing data
    Crowell, Helena L.
    Leonardo, Sarah X. Morillo X.
    Soneson, Charlotte
    Robinson, Mark D.
    GENOME BIOLOGY, 2023, 24 (01)