High-Dimensional Overdispersed Generalized Factor Model With Application to Single-Cell Sequencing Data Analysis

Cited by: 0
Authors
Nie, Jinyu [1, 2]
Qin, Zhilong [3]
Liu, Wei [4]
Affiliations
[1] Southwestern Univ Finance & Econ, Ctr Stat Res, Chengdu, Peoples R China
[2] Southwestern Univ Finance & Econ, Sch Stat, Chengdu, Peoples R China
[3] Southwestern Univ Finance & Econ, Inst Western China Econ Res, Chengdu, Peoples R China
[4] Sichuan Univ, Sch Math, Chengdu, Peoples R China
Keywords
generalized factor model; high dimension; mixed-type data; overdispersion; variational EM; MAXIMUM-LIKELIHOOD; INFERENCE; NUMBER;
DOI
10.1002/sim.10213
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Existing high-dimensional linear factor models fail to account for different variable types, while high-dimensional nonlinear factor models often overlook the overdispersion present in mixed-type data. Yet overdispersion is prevalent in practice, particularly in biomedical and genomics studies. To address this need, we propose an overdispersed generalized factor model (OverGFM) for high-dimensional nonlinear factor analysis of overdispersed mixed-type data. Our approach adds an error term to capture the overdispersion that the factors alone cannot account for. This addition introduces significant computational challenges, because the nonlinear model then involves two high-dimensional latent random matrices. To overcome them, we propose a novel variational EM algorithm that integrates Laplace and Taylor approximations. The algorithm provides explicit iterative solutions for the complex variational parameters and is proven to possess excellent convergence properties. We also develop a criterion based on the singular value ratio to determine the optimal number of factors, and numerical results demonstrate its effectiveness. Through comprehensive simulation studies, we show that OverGFM outperforms state-of-the-art methods in estimation accuracy and computational efficiency. Furthermore, we demonstrate the practical merit of our method by applying it to two genomics datasets. To facilitate its use, we have integrated the implementation of OverGFM into the R package GFM.
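The abstract mentions a singular-value-ratio criterion for choosing the number of factors but does not state its exact form. The Python sketch below illustrates only the general idea, under assumptions that are not taken from the paper: it works on a plain Gaussian data matrix rather than on OverGFM estimates, and the function name `singular_value_ratio`, the cap `q_max`, and the simulated dimensions are all hypothetical. It picks the index at which the ratio of consecutive singular values is largest.

```python
import numpy as np


def singular_value_ratio(matrix, q_max):
    """Pick q in 1..q_max that maximizes s_q / s_{q+1}.

    Illustrative only: a generic ratio rule, not the paper's exact criterion.
    """
    s = np.linalg.svd(matrix, compute_uv=False)  # singular values, descending
    if q_max + 1 > s.size:
        raise ValueError("q_max must be smaller than the number of singular values")
    ratios = s[:q_max] / s[1:q_max + 1]  # ratios[k] = s_{k+1} / s_{k+2} (1-based)
    return int(np.argmax(ratios)) + 1, ratios


# Simulated example (assumed setup): 3 latent factors plus Gaussian noise.
rng = np.random.default_rng(0)
n, p, q_true = 200, 50, 3
factors = rng.normal(size=(n, q_true))    # n x q latent factor matrix
loadings = rng.normal(size=(p, q_true))   # p x q loading matrix
noise = 0.5 * rng.normal(size=(n, p))     # idiosyncratic error
X = factors @ loadings.T + noise

q_hat, ratios = singular_value_ratio(X, q_max=10)
print("estimated number of factors:", q_hat)
```

In this toy setting the low-rank signal is much stronger than the noise, so the largest gap between consecutive singular values falls at the true number of factors. For the actual OverGFM criterion, consult the paper and the GFM package documentation.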
Pages: 4836 - 4849
Page count: 14
Related papers
50 in total
  • [21] Diffusion on PCA-UMAP Manifold: The Impact of Data Structure Preservation to Denoise High-Dimensional Single-Cell RNA Sequencing Data
    Cristian, Padron-Manrique
    Aaron, Vazquez-Jimenez
    Armando, Esquivel-Hernandez Diego
    Estrella, Martinez-Lopez Yoscelina
    Daniel, Neri-Rosario
    David, Giron-Villalobos
    Edgar, Mixcoha
    Paul, Sanchez-Castaneda Jean
    Osbaldo, Resendis-Antonio
    BIOLOGY-BASEL, 2024, 13 (07):
  • [22] High-dimensional generalized semiparametric model for longitudinal data
    Taavoni, M.
    Arashi, M.
    STATISTICS, 2021, 55 (04) : 831 - 850
  • [23] Data-driven comparison of multiple high-dimensional single-cell expression profiles
    Okada, Daigo
    Cheng, Jian Hao
    Zheng, Cheng
    Yamada, Ryo
    JOURNAL OF HUMAN GENETICS, 2022, 67 (04) : 215 - 221
  • [24] Comparison of Clustering Methods for High-Dimensional Single-Cell Flow and Mass Cytometry Data
    Weber, Lukas M.
    Robinson, Mark D.
    CYTOMETRY PART A, 2016, 89A (12) : 1084 - 1096
  • [25] Data-driven comparison of multiple high-dimensional single-cell expression profiles
    Daigo Okada
    Jian Hao Cheng
    Cheng Zheng
    Ryo Yamada
    Journal of Human Genetics, 2022, 67 : 215 - 221
  • [26] Reconstructing gene regulatory dynamics from high-dimensional single-cell snapshot data
    Ocone, Andrea
    Haghverdi, Laleh
    Mueller, Nikola S.
    Theis, Fabian J.
    BIOINFORMATICS, 2015, 31 (12) : 89 - 96
  • [27] Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review
    Brendel, Matthew
    Su, Chang
    Bai, Zilong
    Zhang, Hao
    Elemento, Olivier
    Wang, Fei
    GENOMICS PROTEOMICS & BIOINFORMATICS, 2022, 20 (05) : 814 - 835
  • [28] Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review
    Matthew Brendel
    Chang Su
    Zilong Bai
    Hao Zhang
    Olivier Elemento
    Fei Wang
    Genomics, Proteomics & Bioinformatics, 2022, 20 (05) : 814 - 835
  • [29] High-Dimensional Single-Cell Transcriptomics in Melanoma and Cancer Immunotherapy
    Quek, Camelia
    Bai, Xinyu
    Long, Georgina, V
    Scolyer, Richard A.
    Wilmott, James S.
    GENES, 2021, 12 (10)
  • [30] Dissecting CLL through high-dimensional single-cell technologies
    Gohil, Satyen H.
    Wu, Catherine J.
    BLOOD, 2019, 133 (13) : 1446 - 1456