Principal Variables Analysis for Non-Gaussian Data

Times cited: 0
Authors
Clark-Boucher, Dylan [1]
Miller, Jeffrey W. [1]
Affiliation
[1] Harvard Univ, Dept Biostat, 655 Huntington Ave, Boston, MA 02115 USA
Keywords
Non-normality; Ordinal data; Variable selection; X-linked dystonia parkinsonism; Component analysis; Discarding variables; Algorithms; Selection
DOI
10.1080/10618600.2024.2367098
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
Principal variables analysis (PVA) is a technique for selecting a subset of variables that capture as much of the information in a dataset as possible. Existing approaches for PVA are based on the Pearson correlation matrix, which is not well-suited to describing the relationships between non-Gaussian variables. We propose a generalized approach to PVA enabling the use of different types of correlation, and we explore using Spearman, Gaussian copula, and polychoric correlations as alternatives to Pearson correlation. We compare performance in simulation studies varying the form of the true multivariate distribution over a range of possibilities. Our results show that on continuous non-Gaussian data, using generalized PVA with Gaussian copula or Spearman correlations provides a major improvement in performance compared to Pearson. On ordinal data, generalized PVA with polychoric correlations outperforms the rest by a wide margin. We apply generalized PVA to a dataset of 102 clinical variables measured on individuals with X-linked dystonia parkinsonism (XDP), a neurodegenerative disorder involving symptoms of both dystonia and parkinsonism. We find that using different types of correlation yields substantively different sets of principal variables; for example, parkinsonism-related metrics appear more explanatory than dystonia-related metrics on the observed data. Supplementary materials for this article are available online.
Pages: 374-383
Number of pages: 10