On Robustness of Principal Component Regression

Cited by: 23
Authors
Agarwal, Anish [1]
Shah, Devavrat [1]
Shen, Dennis [1]
Song, Dogyoon [1]
Affiliation
[1] MIT, EECS, 32 Vassar St, Cambridge, MA 02139 USA
Keywords
Error-in-variables regression; Hard singular value thresholding; Matrix estimation; Principal component regression; Synthetic controls; Panel
DOI
10.1080/01621459.2021.1928513
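Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714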
Abstract
Principal component regression (PCR) is a simple, but powerful and ubiquitously utilized method. Its effectiveness is well established when the covariates exhibit low-rank structure. However, its ability to handle settings with noisy, missing, and mixed-valued (that is, discrete and continuous) covariates is not understood and remains an important open challenge. As the main contribution of this work, we establish the robustness of PCR, without any change, in this respect and provide meaningful finite-sample analysis. To do so, we establish that PCR is equivalent to performing linear regression after preprocessing the covariate matrix via hard singular value thresholding (HSVT). As a result, in the context of counterfactual analysis using observational data, we show PCR is equivalent to the recently proposed robust variant of the synthetic control method, known as robust synthetic control (RSC). As an immediate consequence, we obtain finite-sample analysis of the RSC estimator that was previously absent. As an important contribution to the synthetic controls literature, we establish that an (approximate) linear synthetic control exists in the setting of a generalized factor model, or latent variable model; traditionally in the literature, the existence of a synthetic control is assumed as an axiom. We further discuss a surprising implication of the robustness property of PCR with respect to noise, that is, PCR can learn a good predictive model even if the covariates are tactfully transformed to preserve differential privacy. Finally, this work advances the state-of-the-art analysis for HSVT by establishing stronger guarantees with respect to the ℓ2,∞-norm rather than the Frobenius norm as is commonly done in the matrix estimation literature, which may be of interest in its own right.
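The equivalence stated in the abstract, PCR versus linear regression on an HSVT-preprocessed covariate matrix, can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the function names, the synthetic data, the rank k = 3, and the choice to skip centering are all assumptions, and "linear regression" on the rank-deficient thresholded matrix is taken to mean the minimum-norm least-squares solution.

```python
import numpy as np

def hsvt(X, k):
    """Hard singular value thresholding: keep only the top-k singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def pcr_coefficients(X, y, k):
    """PCR: regress y on the projection of X onto its top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k, :].T                      # (p, k) principal directions
    Z = X @ Vk                            # (n, k) principal component scores
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Vk @ gamma                     # map back to the original p coordinates

def hsvt_ols_coefficients(X, y, k):
    """Minimum-norm least squares on the HSVT-preprocessed covariates."""
    # rcond is set explicitly so the numerically zero trailing singular
    # values of the rank-k matrix are discarded.
    beta, *_ = np.linalg.lstsq(hsvt(X, k), y, rcond=1e-10)
    return beta

# Hypothetical synthetic data: rank-3 covariates observed with additive noise.
rng = np.random.default_rng(0)
n, p, k = 200, 20, 3
A = rng.normal(size=(n, k)) @ rng.normal(size=(k, p))   # low-rank signal
X = A + 0.1 * rng.normal(size=(n, p))                   # noisy covariates
y = A @ rng.normal(size=p) + 0.1 * rng.normal(size=n)   # noisy responses

beta_pcr = pcr_coefficients(X, y, k)
beta_hsvt = hsvt_ols_coefficients(X, y, k)
max_gap = float(np.max(np.abs(beta_pcr - beta_hsvt)))
```

Under these assumptions the two coefficient vectors agree up to floating-point error, since both reduce to the same expression in the top-k singular triplets of X.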
Pages: 1731-1745
Page count: 15