Principal Variables Analysis for Non-Gaussian Data

Cited by: 0
Authors
Clark-Boucher, Dylan [1]
Miller, Jeffrey W. [1]
Affiliations
[1] Harvard Univ, Dept Biostat, 655 Huntington Ave, Boston, MA 02115 USA
Keywords
Non-normality; Ordinal data; Variable selection; X-linked dystonia parkinsonism; COMPONENT ANALYSIS; DISCARDING VARIABLES; ALGORITHMS; SELECTION;
DOI
10.1080/10618600.2024.2367098
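Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code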
020208 ; 070103 ; 0714 ;
Abstract
Principal variables analysis (PVA) is a technique for selecting a subset of variables that capture as much of the information in a dataset as possible. Existing approaches for PVA are based on the Pearson correlation matrix, which is not well-suited to describing the relationships between non-Gaussian variables. We propose a generalized approach to PVA enabling the use of different types of correlation, and we explore using Spearman, Gaussian copula, and polychoric correlations as alternatives to Pearson correlation. We compare performance in simulation studies varying the form of the true multivariate distribution over a range of possibilities. Our results show that on continuous non-Gaussian data, using generalized PVA with Gaussian copula or Spearman correlations provides a major improvement in performance compared to Pearson. On ordinal data, generalized PVA with polychoric correlations outperforms the rest by a wide margin. We apply generalized PVA to a dataset of 102 clinical variables measured on individuals with X-linked dystonia parkinsonism (XDP), a neurodegenerative disorder involving symptoms of both dystonia and parkinsonism. We find that using different types of correlation yields substantively different sets of principal variables; for example, parkinsonism-related metrics appear more explanatory than dystonia-related metrics on the observed data. Supplementary materials for this article are available online.
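As a rough illustration of the idea in the abstract, the sketch below swaps the correlation matrix that feeds a variable-selection step. It is not the authors' implementation. It assumes a common greedy heuristic: at each step, pick the variable that most reduces the trace of the residual correlation matrix, then partial that variable out. The function names (`average_ranks`, `correlation_matrix`, `greedy_principal_variables`) and the toy data are hypothetical. Only Pearson and Spearman correlations are shown; Gaussian copula and polychoric correlations would need their own estimators.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(columns, kind="spearman"):
    """Correlation matrix for a list of columns (one list per variable)."""
    if kind == "spearman":
        # Spearman correlation is Pearson correlation computed on ranks.
        columns = [average_ranks(c) for c in columns]
    elif kind != "pearson":
        raise ValueError("kind must be 'pearson' or 'spearman'")
    p = len(columns)
    return [[1.0 if i == j else pearson(columns[i], columns[j])
             for j in range(p)] for i in range(p)]


def greedy_principal_variables(R, k, tol=1e-10):
    """Greedily select up to k variable indices from correlation matrix R.

    Assumed heuristic: choose the variable giving the largest drop in the
    trace of the residual matrix, then condition on it and repeat.
    """
    p = len(R)
    resid = [row[:] for row in R]
    selected = []
    for _ in range(k):
        best, best_score = None, -1.0
        for j in range(p):
            if j in selected or resid[j][j] <= tol:
                continue  # already chosen, or fully explained by chosen ones
            score = sum(resid[i][j] ** 2 for i in range(p)) / resid[j][j]
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break
        selected.append(best)
        col = [resid[i][best] for i in range(p)]
        pivot = resid[best][best]
        # Partial out the chosen variable (rank-one downdate).
        resid = [[resid[a][b] - col[a] * col[b] / pivot for b in range(p)]
                 for a in range(p)]
    return selected


# Toy data: y is a monotone but nonlinear transform of x; z is nearly unrelated.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.718281828 ** v for v in x]
z = [3.0, 6.0, 1.0, 5.0, 2.0, 4.0]

R_spearman = correlation_matrix([x, y, z], kind="spearman")
R_pearson = correlation_matrix([x, y, z], kind="pearson")
print(greedy_principal_variables(R_spearman, 2))  # [0, 2]
```

On this toy data, the Spearman matrix treats `x` and `y` as perfectly dependent, so once `x` is selected `y` adds nothing and `z` is picked next. The Pearson matrix understates the `x`-`y` relationship because it is nonlinear. The example is too small to say anything about the simulation results reported in the abstract.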
Pages: 374-383
Page count: 10
Related papers
50 in total
  • [41] On the Comparisons of Decorrelation Approaches for Non-Gaussian Neutral Vector Variables
    Ma, Zhanyu
    Lu, Xiaoou
    Xie, Jiyang
    Yang, Zhen
    Xue, Jing-Hao
    Tan, Zheng-Hua
    Xiao, Bo
    Guo, Jun
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (04) : 1823 - 1837
  • [42] singR: An R Package for Simultaneous Non-Gaussian Component Analysis for Data
    Wang, Liangkang
    Gaynanova, Irina
    Risk, Benjamin
    R JOURNAL, 2023, 15 (04) : 69 - 83
  • [43] Processing Non-Gaussian Data Residuals in Geomagnetism
    Khokhlov, Andrey
    APPLIED SCIENCES-BASEL, 2022, 12 (04):
  • [44] Classification With a Non-Gaussian Model for PolSAR Data
    Doulgeris, Anthony P.
    Anfinsen, Stian Normann
    Eltoft, Torbjorn
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2008, 46 (10) : 2999 - 3009
  • [45] Ensemble Learning in Non-Gaussian Data Assimilation
    Seybold, Hansjoerg
    Ravela, Sai
    Tagade, Piyush
    DYNAMIC DATA-DRIVEN ENVIRONMENTAL SYSTEMS SCIENCE, DYDESS 2014, 2015, 8964 : 227 - 238
  • [46] Non-Gaussian likelihood function and COBE data
    Amendola, L
    MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 1996, 283 (03) : 983 - 989
  • [47] Non-Gaussian infinite dimensional analysis
    Albeverio, S
    Daletsky, YL
    Kondratiev, YG
    Streit, L
    JOURNAL OF FUNCTIONAL ANALYSIS, 1996, 138 (02) : 311 - 350
  • [48] Sparse Non-Gaussian Component Analysis
    Diederichs, Elmar
    Juditsky, Anatoli
    Spokoiny, Vladimir
    Schuette, Christof
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2010, 56 (06) : 3033 - 3047
  • [49] Nonparanormal Structural VAR for Non-Gaussian Data
    Aramayis Dallakyan
    Computational Economics, 2021, 57 : 1093 - 1113
  • [50] Non-Gaussian data expansion in the Earth Sciences
    Journel, A. G.
    Alabert, F.
    TERRA NOVA, 1989, 1 (02) : 123 - 134