The SKIM-FA Kernel: High-Dimensional Variable Selection and Nonlinear Interaction Discovery in Linear Time

Cited by: 0
Authors
Agrawal, Raj [1]
Broderick, Tamara [1]
Affiliations
[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
Keywords
functional ANOVA; interaction discovery; kernel ridge regression; nonlinear; variable selection; sparse high-dimensional regression; REGRESSION; PRIORS
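DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract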
Many scientific problems require identifying a small set of covariates that are associated with a target response and estimating their effects. Often, these effects are nonlinear and include interactions, so linear and additive methods can lead to poor estimation and variable selection. Unfortunately, methods that simultaneously express sparsity, nonlinearity, and interactions are computationally intractable, with runtime at least quadratic in the number of covariates, and often worse. In the present work, we solve this computational bottleneck. We show that suitable interaction models have a kernel representation; namely, there exists a "kernel trick" to perform variable selection and estimation in O(# covariates) time. Our resulting fit corresponds to a sparse orthogonal decomposition of the regression function in a Hilbert space (i.e., a functional ANOVA decomposition), where interaction effects represent all variation that cannot be explained by lower-order effects. On a variety of synthetic and real data sets, our approach outperforms existing methods used for large, high-dimensional data sets while remaining competitive (or being orders of magnitude faster) in runtime.
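To give a rough sense of how a kernel over all pairwise interactions can be evaluated in time linear in the number of covariates, the sketch below uses a standard algebraic identity: the sum over pairs i < j of s_i s_j equals ((sum of s_i)^2 - sum of s_i^2) / 2. This is only an illustration and is not taken from the paper; the function names, the per-covariate weights `kappa`, and the choice of a linear one-dimensional base kernel are all assumptions made for the example, and the actual SKIM-FA kernel may differ.

```python
import numpy as np

def pairwise_interaction_kernel(x, y, kappa):
    """Evaluate sum_{i<j} s_i * s_j in O(p), where s_i = kappa_i * x_i * y_i.

    Hypothetical illustration: s_i plays the role of a one-dimensional
    kernel on covariate i (here a weighted linear kernel, an assumption).
    """
    s = kappa * x * y
    # Identity: sum_{i<j} s_i s_j = ((sum_i s_i)^2 - sum_i s_i^2) / 2
    return 0.5 * (s.sum() ** 2 - (s ** 2).sum())

def pairwise_interaction_kernel_naive(x, y, kappa):
    """Reference O(p^2) evaluation that enumerates every pair explicitly."""
    s = kappa * x * y
    p = len(s)
    return sum(s[i] * s[j] for i in range(p) for j in range(i + 1, p))

if __name__ == "__main__":
    x = np.array([1.0, 2.0, 3.0])
    y = np.ones(3)
    kappa = np.ones(3)
    print(pairwise_interaction_kernel(x, y, kappa))        # 11.0
    print(pairwise_interaction_kernel_naive(x, y, kappa))  # 11.0
```

Setting an entry of `kappa` to zero removes that covariate from every interaction term, which is one simple way a kernel of this shape can express sparsity while keeping the evaluation cost linear in the number of covariates.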
Pages: 1 / 60
Page count: 60