Bayesian feature selection in high-dimensional regression in presence of correlated noise

Cited by: 1
Authors
Feldman, Guy [1]
Bhadra, Anindya [1]
Kirshner, Sergey [1]
Affiliations
[1] Purdue Univ, Dept Stat, 250 N Univ St, W Lafayette, IN 47907 USA
Source
STAT | 2014 / Vol. 3 / Issue 01
Keywords
Bayesian methods; genomics; graphical models; high-dimensional data; variable selection;
DOI
10.1002/sta4.60
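Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract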
We consider the problem of feature selection in a high-dimensional multiple predictors, multiple responses regression setting. Assuming that regression errors are i.i.d. when they are in fact dependent leads to inconsistent and inefficient feature estimates. We relax the i.i.d. assumption by allowing the errors to exhibit a tree-structured dependence. This allows a Bayesian problem formulation with the error dependence structure treated as an auxiliary variable that can be integrated out analytically with the help of the matrix-tree theorem. Mixing over trees results in a flexible technique for modelling the graphical structure for the regression errors. Furthermore, the analytic integration results in a collapsed Gibbs sampler for feature selection that is computationally efficient. Our approach offers significant performance gains over the competing methods in simulations, especially when the features themselves are correlated. In addition to comprehensive simulation studies, we apply our method to a high-dimensional breast cancer data set to identify markers significantly associated with the disease. Copyright (C) 2014 John Wiley & Sons, Ltd.
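Illustrative note (not part of the published abstract): the matrix-tree theorem mentioned above states that the sum, over all spanning trees of a weighted graph, of the product of edge weights equals any cofactor of the weighted graph Laplacian. The sketch below checks that identity numerically on a small graph. It is a minimal illustration under stated assumptions, not the authors' sampler: the function names `tree_weight_sum` and `brute_force_tree_weight_sum` and the random weight matrix are hypothetical, and how the paper maps its error model onto edge weights is not shown here.

```python
import itertools

import numpy as np


def tree_weight_sum(W):
    """Sum over spanning trees of the product of edge weights,
    computed as a cofactor of the weighted Laplacian (matrix-tree theorem)."""
    W = np.asarray(W, dtype=float)
    laplacian = np.diag(W.sum(axis=1)) - W
    # Delete the first row and column; any matching row/column pair works.
    return float(np.linalg.det(laplacian[1:, 1:]))


def brute_force_tree_weight_sum(W):
    """Same quantity by explicit enumeration; feasible only for tiny graphs."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    edges = list(itertools.combinations(range(n), 2))
    total = 0.0
    for subset in itertools.combinations(edges, n - 1):
        parent = list(range(n))  # union-find forest for cycle detection

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        is_tree = True
        for i, j in subset:
            ri, rj = find(i), find(j)
            if ri == rj:  # this edge would close a cycle
                is_tree = False
                break
            parent[ri] = rj
        # n - 1 edges without a cycle form a spanning tree.
        if is_tree:
            total += float(np.prod([W[i, j] for i, j in subset]))
    return total


# Hypothetical symmetric edge weights on 5 nodes (zero diagonal).
rng = np.random.default_rng(0)
A = rng.uniform(0.5, 2.0, size=(5, 5))
W = np.triu(A, 1)
W = W + W.T

fast = tree_weight_sum(W)
slow = brute_force_tree_weight_sum(W)
```

The determinant costs O(n^3) operations, whereas the enumeration grows super-exponentially with the number of nodes; this gap is why an analytic sum over trees can make a collapsed sampler practical.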
Pages: 258 - 272
Page count: 15