Sparse PCA for High-Dimensional Data With Outliers

Cited by: 47
Authors
Hubert, Mia [1]
Reynkens, Tom [1]
Schmitt, Eric [1]
Verdonck, Tim [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Math, Leuven, Belgium
Keywords
Dimension reduction; Outlier detection; Robustness; PROJECTION-PURSUIT APPROACH; PRINCIPAL COMPONENTS; ROBUST PCA;
DOI
10.1080/00401706.2015.1093962
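Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract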
A new sparse PCA algorithm is presented, which is robust against outliers. The approach is based on the ROBPCA algorithm that generates robust but nonsparse loadings. The construction of the new ROSPCA method is detailed, as well as a selection criterion for the sparsity parameter. An extensive simulation study and a real data example are performed, showing that it is capable of accurately finding the sparse structure of datasets, even when challenging outliers are present. In comparison with a projection pursuit-based algorithm, ROSPCA demonstrates superior robustness properties and comparable sparsity estimation capability, as well as significantly faster computation time.
Pages: 424-434
Page count: 11
Related papers
50 in total
  • [41] ON THE PERFORMANCE OF KERNEL ESTIMATORS FOR HIGH-DIMENSIONAL, SPARSE BINARY DATA
    GRUND, B
    HALL, P
    JOURNAL OF MULTIVARIATE ANALYSIS, 1993, 44 (02) : 321 - 344
  • [42] Fused Feature Representation Discovery for High-Dimensional and Sparse Data
    Suzuki, Jun
    Nagata, Masaaki
    PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014, : 1593 - 1599
  • [43] Sparse boosting for high-dimensional survival data with varying coefficients
    Yue, Mu
    Li, Jialiang
    Ma, Shuangge
    STATISTICS IN MEDICINE, 2018, 37 (05) : 789 - 800
  • [44] Market segmentation using high-dimensional sparse consumers data
    Zhou, Jian
    Zhai, Linli
    Pantelous, Athanasios A.
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 145 (145)
  • [45] A sparse factor model for clustering high-dimensional longitudinal data
    Lu, Zihang
    Chandra, Noirrit Kiran
    STATISTICS IN MEDICINE, 2024, 43 (19) : 3633 - 3648
  • [46] Sparse representation approaches for the classification of high-dimensional biological data
    Li, Yifeng
    Ngom, Alioune
    BMC SYSTEMS BIOLOGY, 2013, 7
  • [47] CLASSIFICATION OF HIGH-DIMENSIONAL DATA USING THE SPARSE MATRIX TRANSFORM
    Bachega, Leonardo R.
    Bouman, Charles A.
    2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010, : 265 - 268
  • [48] SS/OSF for high-dimensional sparse data object clustering
    Wu, Ping
    Song, Han-Tao
    Niu, Zhen-Dong
    Zhang, Li-Ping
    Zhang, Ju-Li
    Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology, 2006, 26 (03) : 216 - 220
  • [49] On the challenges of learning with inference networks on sparse, high-dimensional data
    Krishnan, Rahul G.
    Liang, Dawen
    Hoffman, Matthew D.
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [50] A Compressed PCA Subspace Method for Anomaly Detection in High-Dimensional Data
    Ding, Qi
    Kolaczyk, Eric D.
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2013, 59 (11) : 7419 - 7433