The SKIM-FA Kernel: High-Dimensional Variable Selection and Nonlinear Interaction Discovery in Linear Time

Cited by: 0
Authors
Agrawal, Raj [1]
Broderick, Tamara [1]
Affiliations
[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
Keywords
functional ANOVA; interaction discovery; kernel ridge regression; nonlinear; variable selection; sparse high-dimensional regression; REGRESSION; PRIORS
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Many scientific problems require identifying a small set of covariates that are associated with a target response and estimating their effects. Often, these effects are nonlinear and include interactions, so linear and additive methods can lead to poor estimation and variable selection. Unfortunately, methods that simultaneously express sparsity, nonlinearity, and interactions are computationally intractable, with runtime at least quadratic in the number of covariates, and often worse. In the present work, we solve this computational bottleneck. We show that suitable interaction models have a kernel representation; that is, there exists a "kernel trick" to perform variable selection and estimation in O(# covariates) time. Our resulting fit corresponds to a sparse orthogonal decomposition of the regression function in a Hilbert space (i.e., a functional ANOVA decomposition), where interaction effects represent all variation that cannot be explained by lower-order effects. On a variety of synthetic and real data sets, our approach outperforms existing methods used for large, high-dimensional data sets while remaining competitive (or being orders of magnitude faster) in runtime.
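The linear-time claim in the abstract can be illustrated with a standard algebraic identity. The sketch below is not the authors' SKIM-FA kernel: it leaves out the sparsity and orthogonality ingredients the abstract mentions, and the per-covariate RBF kernel, the lengthscale, and all function names are assumptions chosen for illustration. It only shows how a sum over all pairwise products of per-covariate kernel values, which looks quadratic in the number of covariates p, can be evaluated with two passes of length p.

```python
import numpy as np


def per_covariate_kernel(x, y, lengthscale=1.0):
    """Hypothetical 1-D RBF kernel applied to each covariate separately.

    Returns a length-p vector k with k[i] = exp(-0.5 * ((x[i] - y[i]) / lengthscale)**2).
    """
    return np.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def pairwise_kernel_naive(x, y):
    """Sum of k[i] * k[j] over all pairs i < j, computed in O(p^2) time."""
    k = per_covariate_kernel(x, y)
    p = k.shape[0]
    total = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            total += k[i] * k[j]
    return total


def pairwise_kernel_linear(x, y):
    """Same quantity in O(p) time.

    Uses the identity: sum_{i<j} k_i k_j = ((sum_i k_i)^2 - sum_i k_i^2) / 2.
    """
    k = per_covariate_kernel(x, y)
    s1 = k.sum()          # first power sum
    s2 = (k ** 2).sum()   # second power sum
    return 0.5 * (s1 ** 2 - s2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.normal(size=200), rng.normal(size=200)
    print(np.isclose(pairwise_kernel_naive(x, y), pairwise_kernel_linear(x, y)))
```

For p covariates the naive loop visits p(p-1)/2 pairs, while the identity needs only two sums over p terms. The two functions agree up to floating-point rounding.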
Pages: 1 / 60
Number of pages: 60