Exploring dimension learning via a penalized probabilistic principal component analysis

Cited by: 3
Authors
Deng, Wei Q. [1,2]
Craiu, Radu V. [3]
Affiliations
[1] McMaster Univ, Dept Psychiat & Behav Neurosci, Hamilton, ON, Canada
[2] St Josephs Healthcare Hamilton, Peter Boris Ctr Addict Res, Hamilton, ON, Canada
[3] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Dimension estimation; model selection; penalization; principal component analysis; probabilistic principal component analysis; profile likelihood; SELECTION; COVARIANCE; NUMBER; EIGENVALUES; SHRINKAGE; TESTS;
DOI
10.1080/00949655.2022.2100890
Chinese Library Classification
TP39 [Computer Applications];
Discipline codes
081203; 0835;
Abstract
Establishing a low-dimensional representation of the data leads to efficient data learning strategies. In many cases, the reduced dimension needs to be explicitly stated and estimated from the data. We explore the estimation of dimension in finite samples as a constrained optimization problem, where the estimated dimension is a maximizer of a penalized profile likelihood criterion within the framework of probabilistic principal component analysis. Unlike other penalized maximization problems that require an 'optimal' penalty tuning parameter, we propose a data-averaging procedure whereby the estimated dimension emerges as the most favourable choice over a range of plausible penalty parameters. The proposed heuristic is compared with a large number of alternative criteria in simulations and in an application to gene expression data. Extensive simulation studies reveal that none of the methods uniformly dominates the others, and they highlight the importance of subject-specific knowledge in choosing statistical methods for dimension learning. Our application results also suggest that gene expression data have a higher intrinsic dimension than previously thought. Overall, our proposed heuristic strikes a good balance and is the method of choice when model assumptions deviate moderately.
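The abstract describes the approach only at a high level. As an illustration (not the authors' exact criterion, which is developed in the paper itself), the sketch below selects a PPCA dimension by maximizing the Tipping-Bishop profile log-likelihood minus a penalty, then takes the modal choice across a grid of penalty weights in the spirit of the data-averaging heuristic. The penalty form `gamma * q * log(n)` and the grid of `gamma` values are illustrative assumptions.

```python
# Hedged sketch of penalized profile-likelihood dimension selection for PPCA.
# The penalty form and grid below are assumptions for illustration only.
import numpy as np

def ppca_profile_loglik(eigvals, q, n):
    """Maximized PPCA log-likelihood at dimension q (Tipping & Bishop form).

    eigvals: sample-covariance eigenvalues, sorted in decreasing order.
    """
    p = len(eigvals)
    noise = eigvals[q:].mean()  # ML estimate of the residual (noise) variance
    return -(n / 2.0) * (
        p * np.log(2 * np.pi)
        + np.sum(np.log(eigvals[:q]))
        + (p - q) * np.log(noise)
        + p
    )

def estimate_dimension(X, penalties=np.linspace(0.5, 5.0, 10)):
    """Return the modal argmax of the penalized criterion over penalty weights."""
    n, p = X.shape
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # decreasing
    eigvals = np.clip(eigvals, 1e-12, None)  # guard against log(0)
    choices = []
    for gamma in penalties:
        # Assumed BIC-style penalty: gamma * q * log(n).
        scores = [
            ppca_profile_loglik(eigvals, q, n) - gamma * q * np.log(n)
            for q in range(1, p)
        ]
        choices.append(1 + int(np.argmax(scores)))
    vals, counts = np.unique(choices, return_counts=True)
    return int(vals[np.argmax(counts)])  # most favoured dimension overall

# Toy check: 5 strong latent components embedded in 20 noisy features.
rng = np.random.default_rng(0)
signal = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 20)) * 3
X = signal + rng.normal(size=(500, 20))
print(estimate_dimension(X))
```

Averaging the selection over several penalty weights, rather than tuning a single 'optimal' one, is the key idea the abstract emphasizes; any single weight trades off under- and over-estimation differently.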
Pages: 266-297 (32 pages)
Related Papers
Showing 31-40 of 50 records
  • [31] Dimension selection for feature selection and dimension reduction with principal and independent component analysis
    Koch, Inge
    Naito, Kanta
    NEURAL COMPUTATION, 2007, 19 (02) : 513 - 545
  • [32] Dimension Reduction of Machine Learning-Based Forecasting Models Employing Principal Component Analysis
    Meng, Yinghui
    Qasem, Sultan Noman
    Shokri, Manouchehr
    Shahab, S.
    MATHEMATICS, 2020, 8 (08)
  • [33] Dimension Reduction in Time Series via Partially Quantified Principal Component
    Park, J. A.
    Hwang, S. Y.
    KOREAN JOURNAL OF APPLIED STATISTICS, 2010, 23 (05) : 813 - 822
  • [34] Web document ranking via active learning and kernel principal component analysis
    Cai, Fei
    Chen, Honghui
    Shu, Zhen
    INTERNATIONAL JOURNAL OF MODERN PHYSICS C, 2015, 26 (04):
  • [35] Functional principal components analysis via penalized rank one approximation
    Huang, Jianhua Z.
    Shen, Haipeng
    Buja, Andreas
    ELECTRONIC JOURNAL OF STATISTICS, 2008, 2 : 678 - 695
  • [36] Penalized principal logistic regression for sparse sufficient dimension reduction
    Shin, Seung Jun
    Artemiou, Andreas
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2017, 111 : 48 - 58
  • [37] Two Dimension Locally Principal Component Analysis for Face Recognition
    Lin, Yu-sheng
    Wang, Jian-guo
    Yang, Jing-yu
    PROCEEDINGS OF THE 2008 CHINESE CONFERENCE ON PATTERN RECOGNITION (CCPR 2008), 2008, : 232 - 234
  • [38] Reduction of the multivariate input dimension using principal component analysis
    Xi, Jianhui
    Han, Min
    PRICAI 2006: TRENDS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4099 : 1047 - 1051
  • [39] A Geometric Algorithm for Contrastive Principal Component Analysis in High Dimension
    Lu, Rung-Sheng
    Wang, Shao-Hsuan
    Huang, Su-Yun
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2024, 33 (03) : 909 - 916
  • [40] Speaker Recognition using Supervised Probabilistic Principal Component Analysis
    Lei, Yun
    Hansen, John H. L.
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 382 - 385