High-Dimensional Overdispersed Generalized Factor Model With Application to Single-Cell Sequencing Data Analysis

Cited by: 0
Authors
Nie, Jinyu [1, 2]
Qin, Zhilong [3]
Liu, Wei [4]
Affiliations
[1] Southwestern Univ Finance & Econ, Ctr Stat Res, Chengdu, Peoples R China
[2] Southwestern Univ Finance & Econ, Sch Stat, Chengdu, Peoples R China
[3] Southwestern Univ Finance & Econ, Inst Western China Econ Res, Chengdu, Peoples R China
[4] Sichuan Univ, Sch Math, Chengdu, Peoples R China
Keywords
generalized factor model; high dimension; mixed-type data; overdispersion; variational EM; maximum likelihood; inference; number
DOI
10.1002/sim.10213
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Current high-dimensional linear factor models fail to account for different variable types, while high-dimensional nonlinear factor models often overlook the overdispersion present in mixed-type data. Yet overdispersion is prevalent in practice, particularly in biomedical and genomic studies. To meet this practical need, we propose an overdispersed generalized factor model (OverGFM) for high-dimensional nonlinear factor analysis of overdispersed mixed-type data. Our approach incorporates an additional error term to capture the overdispersion that the factors alone cannot explain. However, this introduces substantial computational challenges, because the nonlinear model involves two high-dimensional latent random matrices. To overcome them, we propose a novel variational EM algorithm that integrates Laplace and Taylor approximations. The algorithm yields explicit iterative updates for the complex variational parameters and is proven to have excellent convergence properties. We also develop a criterion based on the singular value ratio to determine the optimal number of factors, and numerical results demonstrate its effectiveness. Comprehensive simulation studies show that OverGFM outperforms state-of-the-art methods in estimation accuracy and computational efficiency. Finally, we demonstrate the practical merit of the method by applying it to two genomics datasets. To facilitate its use, we have integrated the implementation of OverGFM into the R package GFM.
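The abstract mentions a singular-value-ratio criterion for choosing the number of factors but does not state its formula. The sketch below assumes the common form q_hat = argmax over 1 <= k < q_max of s_k / s_{k+1}, where s_1 >= s_2 >= ... are the singular values of some estimated matrix (for example, a loading matrix fitted with q_max factors). The helper name `svr_num_factors`, the cap `q_max`, and the synthetic data are hypothetical illustrations; this is not the GFM package's API and not necessarily the paper's exact criterion.

```python
import numpy as np


def svr_num_factors(mat: np.ndarray, q_max: int) -> int:
    """Hypothetical helper: choose the number of factors by the largest
    ratio of consecutive singular values, s_k / s_{k+1}, for k < q_max."""
    s = np.linalg.svd(mat, compute_uv=False)  # singular values, descending
    if q_max >= len(s):
        raise ValueError("q_max must be smaller than min(mat.shape)")
    ratios = s[: q_max - 1] / s[1:q_max]  # s_1/s_2, ..., s_{q_max-1}/s_{q_max}
    return int(np.argmax(ratios)) + 1  # convert 0-based index to k


# Synthetic check (assumed setup): a rank-3 signal plus small Gaussian noise.
rng = np.random.default_rng(0)
n, p, q_true = 200, 50, 3
H = rng.standard_normal((n, q_true))  # latent factors
B = rng.standard_normal((p, q_true))  # loadings
X = H @ B.T + 0.1 * rng.standard_normal((n, p))
print(svr_num_factors(X, q_max=10))  # expected: 3
```

Capping the search at q_max keeps the ratio away from the trailing, near-zero singular values, whose ratios can be large purely by chance.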
Pages: 4836-4849
Number of pages: 14