High-Dimensional Overdispersed Generalized Factor Model With Application to Single-Cell Sequencing Data Analysis

Cited by: 0
Authors
Nie, Jinyu [1, 2]
Qin, Zhilong [3 ]
Liu, Wei [4 ]
Affiliations
[1] Southwestern Univ Finance & Econ, Ctr Stat Res, Chengdu, Peoples R China
[2] Southwestern Univ Finance & Econ, Sch Stat, Chengdu, Peoples R China
[3] Southwestern Univ Finance & Econ, Inst Western China Econ Res, Chengdu, Peoples R China
[4] Sichuan Univ, Sch Math, Chengdu, Peoples R China
Keywords
generalized factor model; high dimension; mixed-type data; overdispersion; variational EM; MAXIMUM-LIKELIHOOD; INFERENCE; NUMBER;
DOI
10.1002/sim.10213
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
The current high-dimensional linear factor models fail to account for the different types of variables, while high-dimensional nonlinear factor models often overlook the overdispersion present in mixed-type data. However, overdispersion is prevalent in practical applications, particularly in fields like biomedical and genomics studies. To address this practical demand, we propose an overdispersed generalized factor model (OverGFM) for performing high-dimensional nonlinear factor analysis on overdispersed mixed-type data. Our approach incorporates an additional error term to capture the overdispersion that cannot be accounted for by factors alone. However, this introduces significant computational challenges due to the involvement of two high-dimensional latent random matrices in the nonlinear model. To overcome these challenges, we propose a novel variational EM algorithm that integrates Laplace and Taylor approximations. This algorithm provides iterative explicit solutions for the complex variational parameters and is proven to possess excellent convergence properties. We also develop a criterion based on the singular value ratio to determine the optimal number of factors. Numerical results demonstrate the effectiveness of this criterion. Through comprehensive simulation studies, we show that OverGFM outperforms state-of-the-art methods in terms of estimation accuracy and computational efficiency. Furthermore, we demonstrate the practical merit of our method through its application to two datasets from genomics. To facilitate its usage, we have integrated the implementation of OverGFM into the R package GFM.
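The abstract mentions a singular value ratio criterion for choosing the number of factors but does not state its exact form. As a rough illustration only, the sketch below applies a generic ratio rule (pick the k that maximizes the ratio of the k-th to the (k+1)-th singular value) to a simulated low-rank-plus-noise matrix. The function name `singular_value_ratio_rank`, the simulated data, and the choice to apply the rule directly to the data matrix are all assumptions made for this example; they are not taken from the paper or from the GFM package.

```python
import numpy as np


def singular_value_ratio_rank(X, q_max):
    """Pick k in 1..q_max maximizing s_k / s_{k+1} (hypothetical helper).

    X     : (n, p) matrix whose approximate rank is of interest.
    q_max : largest number of factors to consider.
    Returns the selected k and the vector of ratios.
    """
    # Singular values in descending order.
    s = np.linalg.svd(X, compute_uv=False)
    # Make sure s[k] exists for every k we compare against.
    q_max = min(q_max, len(s) - 1)
    # ratios[k-1] = s_k / s_{k+1} for k = 1..q_max.
    ratios = s[:q_max] / s[1:q_max + 1]
    return int(np.argmax(ratios)) + 1, ratios


# Simulated example (assumption): 3 latent factors plus small Gaussian noise.
rng = np.random.default_rng(0)
n, p, q_true = 200, 50, 3
H = rng.normal(size=(n, q_true))   # latent factor matrix
B = rng.normal(size=(p, q_true))   # loading matrix
X = H @ B.T + 0.1 * rng.normal(size=(n, p))

q_hat, ratios = singular_value_ratio_rank(X, q_max=10)
print("selected number of factors:", q_hat)
```

With a signal this strong, the largest gap between consecutive singular values falls at the true number of factors. Overdispersed mixed-type data would need the model-based estimates described in the abstract rather than a raw SVD of the observations.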
Pages: 4836-4849
Number of pages: 14
Related papers
50 in total
  • [1] Tools for the analysis of high-dimensional single-cell RNA sequencing data
    Wu, Yan
    Zhang, Kun
    NATURE REVIEWS NEPHROLOGY, 2020, 16 (07) : 408 - 421
  • [4] Meeting the Challenges of High-Dimensional Single-Cell Data Analysis in Immunology
    Palit, Subarna
    Heuser, Christoph
    de Almeida, Gustavo P.
    Theis, Fabian J.
    Zielinski, Christina E.
    FRONTIERS IN IMMUNOLOGY, 2019, 10
  • [5] Diffusion maps for high-dimensional single-cell analysis of differentiation data
    Haghverdi, Laleh
    Buettner, Florian
    Theis, Fabian J.
    BIOINFORMATICS, 2015, 31 (18) : 2989 - 2998
  • [6] Single-cell regulatory network inference and clustering from high-dimensional sequencing data
    Vrahatis, Aristidis G.
    Dimitrakopoulos, Georgios N.
    Tasoulis, Sotiris K.
    Georgakopoulos, Spiros V.
    Plagianakos, Vassilis P.
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 2782 - 2789
  • [7] SCALABLE VISUALIZATION FOR HIGH-DIMENSIONAL SINGLE-CELL DATA
    Kim, Juho
    Russell, Nate
    Peng, Jian
    PACIFIC SYMPOSIUM ON BIOCOMPUTING 2017, 2017, : 623 - 634
  • [8] High-dimensional covariate-augmented overdispersed Poisson factor model
    Liu, Wei
    Zhong, Qingzhi
    BIOMETRICS, 2024, 80 (02)
  • [9] Visualizing High-dimensional single-cell RNA-sequencing data through multiple Random Projections
    Tasoulis, Sotiris K.
    Vrahatis, Aristidis G.
    Georgakopoulos, Spiros V.
    Plagianakos, Vassilis P.
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 5448 - 5450
  • [10] Integration, exploration, and analysis of high-dimensional single-cell cytometry data using Spectre
    Ashhurst, Thomas Myles
    Marsh-Wakefield, Felix
    Putri, Givanna Haryono
    Spiteri, Alanna Gabrielle
    Shinko, Diana
    Read, Mark Norman
    Smith, Adrian Lloyd
    King, Nicholas Jonathan Cole
    CYTOMETRY PART A, 2022, 101 (03) : 237 - 253