Bayesian feature selection in high-dimensional regression in presence of correlated noise

Cited by: 1
Authors
Feldman, Guy [1]
Bhadra, Anindya [1]
Kirshner, Sergey [1]
Affiliations
[1] Purdue Univ, Dept Stat, 250 N Univ St, W Lafayette, IN 47907 USA
Source
STAT | 2014, Vol. 3, No. 1
Keywords
Bayesian methods; genomics; graphical models; high-dimensional data; variable selection;
DOI
10.1002/sta4.60
CLC Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
We consider the problem of feature selection in a high-dimensional regression setting with multiple predictors and multiple responses. Assuming that the regression errors are i.i.d. when they are in fact dependent leads to inconsistent and inefficient feature estimates. We relax the i.i.d. assumption by allowing the errors to exhibit a tree-structured dependence. This allows a Bayesian problem formulation in which the error dependence structure is treated as an auxiliary variable that can be integrated out analytically with the help of the matrix-tree theorem. Mixing over trees results in a flexible technique for modelling the graphical structure of the regression errors. Furthermore, the analytic integration yields a collapsed Gibbs sampler for feature selection that is computationally efficient. Our approach offers significant performance gains over competing methods in simulations, especially when the features themselves are correlated. In addition to comprehensive simulation studies, we apply our method to a high-dimensional breast cancer data set to identify markers significantly associated with the disease. Copyright (C) 2014 John Wiley & Sons, Ltd.
Pages: 258-272
Page count: 15
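
The abstract above invokes the (weighted) matrix-tree theorem to integrate out the tree-structured error dependence analytically. As a hedged illustration of that identity only, and not the authors' implementation, the NumPy sketch below checks on a small complete graph with made-up edge weights that the sum over all spanning trees of the product of edge weights equals a cofactor of the weighted graph Laplacian.

```python
# Minimal sketch of the weighted matrix-tree theorem (illustrative only;
# the graph size and edge weights are arbitrary, not taken from the paper).
import itertools
import numpy as np

n = 4                                   # small complete graph K4
rng = np.random.default_rng(0)
W = rng.uniform(0.5, 2.0, size=(n, n))  # random positive edge weights
W = np.triu(W, 1)
W = W + W.T                             # symmetric, zero diagonal

# Weighted Laplacian: L[i, i] = sum_j W[i, j], L[i, j] = -W[i, j] for i != j
L = np.diag(W.sum(axis=1)) - W

# Matrix-tree theorem: delete any one row and column, take the determinant
cofactor = np.linalg.det(L[1:, 1:])

# Brute force: enumerate all (n-1)-edge subsets and keep the spanning trees
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]

def is_spanning_tree(subset):
    """n-1 edges span n vertices iff they contain no cycle (union-find check)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in subset:
        ri, rj = find(i), find(j)
        if ri == rj:
            return False                # cycle found
        parent[ri] = rj
    return True

brute = sum(np.prod([W[i, j] for i, j in t])
            for t in itertools.combinations(edges, n - 1)
            if is_spanning_tree(t))

print(cofactor, brute)                  # the two values agree up to float error
```

The determinant is what makes a mixture over spanning trees tractable: it replaces explicit enumeration of the n^(n-2) labelled trees on n vertices (Cayley's formula) with a single (n-1) x (n-1) determinant.
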
Related Papers
50 records in total
  • [21] Improving Generalisation of Genetic Programming for High-Dimensional Symbolic Regression with Feature Selection
    Chen, Qi
    Xue, Bing
    Niu, Ben
    Zhang, Mengjie
    2016 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2016 : 3793 - 3800
  • [22] Feature Selection to Improve Generalization of Genetic Programming for High-Dimensional Symbolic Regression
    Chen, Qi
    Zhang, Mengjie
    Xue, Bing
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2017, 21 (05) : 792 - 806
  • [23] Genetic Programming for Feature Selection Based on Feature Removal Impact in High-Dimensional Symbolic Regression
    Al-Helali, Baligh
    Chen, Qi
    Xue, Bing
    Zhang, Mengjie
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (03) : 2269 - 2282
  • [24] Benign overfitting of non-sparse high-dimensional linear regression with correlated noise
    Tsuda, Toshiki
    Imaizumi, Masaaki
    ELECTRONIC JOURNAL OF STATISTICS, 2024, 18 (02) : 4119 - 4197
  • [25] Bayesian hierarchical models for high-dimensional mediation analysis with coordinated selection of correlated mediators
    Song, Yanyi
    Zhou, Xiang
    Kang, Jian
    Aung, Max T.
    Zhang, Min
    Zhao, Wei
    Needham, Belinda L.
    Kardia, Sharon L. R.
    Liu, Yongmei
    Meeker, John D.
    Smith, Jennifer A.
    Mukherjee, Bhramar
    STATISTICS IN MEDICINE, 2021, 40 (27) : 6038 - 6056
  • [26] Fuzzy Forests: Extending Random Forest Feature Selection for Correlated, High-Dimensional Data
    Conn, Daniel
    Ngun, Tuck
    Li, Gang
    Ramirez, Christina M.
    JOURNAL OF STATISTICAL SOFTWARE, 2019, 91 (09)
  • [27] Bayesian Model Selection in High-Dimensional Settings
    Johnson, Valen E.
    Rossell, David
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2012, 107 (498) : 649 - 660
  • [28] High-dimensional predictive regression in the presence of cointegration
    Koo, Bonsoo
    Anderson, Heather M.
    Seo, Myung Hwan
    Yao, Wenying
    JOURNAL OF ECONOMETRICS, 2020, 219 (02) : 456 - 477
  • [29] SPATIAL BAYESIAN VARIABLE SELECTION AND GROUPING FOR HIGH-DIMENSIONAL SCALAR-ON-IMAGE REGRESSION
    Li, Fan
    Zhang, Tingting
    Wang, Quanli
    Gonzalez, Marlen Z.
    Maresh, Erin L.
    Coan, James A.
    ANNALS OF APPLIED STATISTICS, 2015, 9 (02) : 687 - 713
  • [30] FEATURE SELECTION FOR HIGH-DIMENSIONAL DATA ANALYSIS
    Verleysen, Michel
    NCTA 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON NEURAL COMPUTATION THEORY AND APPLICATIONS, 2011 : IS23 - IS25