Grouped variable screening for ultra-high dimensional data for linear model

Cited by: 17
Authors
Qiu, Debin [1]
Ahn, Jeongyoun [1]
Affiliations
[1] Univ Georgia, Dept Stat, Athens, GA 30602 USA
Keywords
Grouped variable screening; HOLP; Multicollinearity; SIS; Sparse regression; Sure screening property; REGRESSION; SELECTION; LASSO
DOI
10.1016/j.csda.2019.106894
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Ultra-high dimensional data sets often need a screening step that removes irrelevant variables prior to the main analysis. In high-dimensional linear regression, screening relevant predictors before model estimation often yields better prediction accuracy and much faster computation. However, most existing screening approaches target individual predictors and thus cannot accommodate structured predictors, such as dummy variables and grouped variables. New screening methods for naturally grouped predictors in high-dimensional linear regression are presented. Two popular variable screening methods are generalized to the grouped-predictor case, and a novel screening procedure is also proposed. Asymptotic sure screening properties are established for all three methods. The empirical benefits of the screening approaches are demonstrated through simulation and a real data analysis. Specifically, a two-step analysis that performs screening followed by sparse estimation improves both prediction accuracy and computing time compared to one-stage sparse regression. (C) 2019 Elsevier B.V. All rights reserved.
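As a rough illustration of the group-level screening step the abstract describes, the sketch below ranks predictor groups by a marginal utility and keeps the top few. The utility used here (the mean squared marginal covariance between the standardized columns of a group and the centered response) is an assumption chosen for illustration as a simple group analogue of marginal correlation screening; it is not taken from the paper. The function name `grouped_marginal_screen` and its parameters are likewise hypothetical.

```python
import numpy as np


def grouped_marginal_screen(X, y, groups, n_keep):
    """Rank predictor groups by a marginal utility and keep the top n_keep.

    Assumed utility (illustrative, not the paper's definition):
    mean over columns j in group g of (x_j^T y / n)^2, computed on
    column-standardized X and centered y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    n = X.shape[0]

    # Standardize columns; constant columns are left at zero after centering.
    Xs = X - X.mean(axis=0)
    sd = Xs.std(axis=0)
    sd[sd == 0] = 1.0
    Xs = Xs / sd
    yc = y - y.mean()

    # Marginal covariance of each standardized column with the response.
    marginal = Xs.T @ yc / n

    # Aggregate within groups; the mean keeps large groups from dominating.
    labels = np.unique(groups)
    utility = np.array([np.mean(marginal[groups == g] ** 2) for g in labels])

    # Keep the n_keep groups with the largest utility.
    order = np.argsort(utility)[::-1]
    kept = labels[order[:n_keep]]
    return kept, dict(zip(labels.tolist(), utility.tolist()))


if __name__ == "__main__":
    # Simulated data: 200 groups of 3 predictors, only groups 0 and 1 active.
    rng = np.random.default_rng(0)
    n, n_groups, size = 200, 200, 3
    groups = np.repeat(np.arange(n_groups), size)
    X = rng.standard_normal((n, n_groups * size))
    beta = np.zeros(n_groups * size)
    beta[: 2 * size] = 5.0
    y = X @ beta + rng.standard_normal(n)

    kept, _ = grouped_marginal_screen(X, y, groups, n_keep=10)
    print("groups kept after screening:", sorted(kept.tolist()))
```

In a two-step analysis of the kind the abstract mentions, the columns belonging to the kept groups would then be passed to a sparse regression method, so the second stage works with far fewer predictors than the original design.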
Pages: 11