Grouped variable screening for ultra-high dimensional data for linear model

Times cited: 17
Authors
Qiu, Debin [1]
Ahn, Jeongyoun [1]
Affiliation
[1] Univ Georgia, Dept Stat, Athens, GA 30602 USA
Keywords
Grouped variable screening; HOLP; Multicollinearity; SIS; Sparse regression; Sure screening property; REGRESSION; SELECTION; LASSO
DOI
10.1016/j.csda.2019.106894
Chinese Library Classification
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
Ultra-high dimensional data sets often need a screening step that removes irrelevant variables before the main analysis. In high-dimensional linear regression, screening for relevant predictors before model estimation often yields better prediction accuracy and much faster computation. However, most existing screening approaches target individual predictors and thus cannot incorporate structured predictors, such as dummy variables and grouped variables. New screening methods for naturally grouped predictors in high-dimensional linear regression are presented. Two popular variable screening methods are generalized to the grouped-predictor case, and a novel screening procedure is also proposed. Asymptotic sure screening properties are established for all three methods. The empirical benefits of the screening approaches are also demonstrated through simulations and a real data analysis. Specifically, a two-step analysis that performs screening followed by sparse estimation improves both prediction accuracy and computing time compared with one-stage sparse regression. (C) 2019 Elsevier B.V. All rights reserved.
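The abstract does not state the screening statistics themselves. Judging from the keywords, the two generalized methods are presumably SIS and HOLP. The NumPy sketch below shows one way grouped versions of them could work, assuming each group is scored by the size-normalized squared norm of its componentwise statistics: the marginal statistics X_j^T y for grouped SIS, and the HOLP estimate X^T (X X^T)^{-1} y for grouped HOLP. The function names (`standardize`, `top_groups`, `group_sis`, `group_holp`), the normalization by group size, the cut-off d = 20, and the synthetic data are all hypothetical illustrations, not the authors' definitions or results. The paper's third (novel) procedure is not reproduced.

```python
import numpy as np


def standardize(X, y):
    """Center and scale each column of X to unit variance; center y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    return Xs, y - y.mean()


def top_groups(coef, groups, d):
    """Labels of the d groups with the largest size-normalized squared
    l2 norm of `coef` (this scoring rule is an assumption)."""
    labels = np.unique(groups)
    scores = np.array([np.sum(coef[groups == g] ** 2) / np.sum(groups == g)
                       for g in labels])
    return labels[np.argsort(scores)[::-1][:d]]


def group_sis(X, y, groups, d):
    """Hypothetical grouped SIS: score groups by marginal statistics X_j^T y."""
    Xs, yc = standardize(X, y)
    return top_groups(Xs.T @ yc, groups, d)


def group_holp(X, y, groups, d):
    """Hypothetical grouped HOLP: score groups by X^T (X X^T)^+ y."""
    Xs, yc = standardize(X, y)
    # X^T (X X^T)^+ y is the minimum-norm least-squares solution; lstsq
    # returns exactly that and tolerates the rank deficiency that column
    # centering introduces (X X^T has rank at most n - 1).
    beta = np.linalg.lstsq(Xs, yc, rcond=None)[0]
    return top_groups(beta, groups, d)


# Synthetic toy data (not from the paper): 250 groups of 4 predictors,
# with groups 0, 1 and 2 active and n = 200 < p = 1000.
rng = np.random.default_rng(0)
n, p, size = 200, 1000, 4
groups = np.repeat(np.arange(p // size), size)
X = rng.standard_normal((n, p))
beta_true = np.where(groups < 3, 3.0, 0.0)
y = X @ beta_true + rng.standard_normal(n)

kept_sis = group_sis(X, y, groups, d=20)
kept_holp = group_holp(X, y, groups, d=20)
print({0, 1, 2} <= set(kept_sis.tolist()), {0, 1, 2} <= set(kept_holp.tolist()))
```

In the two-step analysis the abstract describes, a sparse estimator (for example, a group lasso) would then be fit using only the columns of the retained groups.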
Pages: 11
Related papers (50 total)
  • [31] Group screening for ultra-high-dimensional feature under linear model
    Niu, Yong
    Zhang, Riquan
    Liu, Jicai
    Li, Huapeng
    STATISTICAL THEORY AND RELATED FIELDS, 2020, 4 (01) : 43 - 54
  • [32] Uniform joint screening for ultra-high dimensional graphical models
    Zheng, Zemin
    Shi, Haiyu
    Li, Yang
    Yuan, Hui
    JOURNAL OF MULTIVARIATE ANALYSIS, 2020, 179
  • [34] Adjusted feature screening for ultra-high dimensional missing response
    Zou, Liying
    Liu, Yi
    Zhang, Zhonghu
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2024, 94 (03) : 460 - 483
  • [35] Conditional screening for ultra-high dimensional covariates with survival outcomes
    Hong, Hyokyoung G.
    Kang, Jian
    Li, Yi
    LIFETIME DATA ANALYSIS, 2018, 24 (01) : 45 - 71
  • [36] Robust feature screening for ultra-high dimensional right censored data via distance correlation
    Chen, Xiaolin
    Chen, Xiaojing
    Wang, Hong
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2018, 119 : 118 - 138
  • [37] BOLT-SSI: A statistical approach to screening interaction effects for ultra-high dimensional data
    Zhou, Min
    Dai, Mingwei
    Yao, Yuan
    Liu, Jin
    Yang, Can
    Peng, Heng
    STATISTICA SINICA, 2023, 33 (04) : 2327 - 2358
  • [38] Nonparametric independence screening for ultra-high dimensional generalized varying coefficient models with longitudinal data
    Zhang, Shen
    Zhao, Peixin
    Li, Gaorong
    Xu, Wangli
    JOURNAL OF MULTIVARIATE ANALYSIS, 2019, 171 : 37 - 52
  • [39] The fused Kolmogorov–Smirnov screening for ultra-high dimensional semi-competing risks data
    Liu, Yi
    Chen, Xiaolin
    Wang, Hong
    APPLIED MATHEMATICAL MODELLING, 2021, 98 : 109 - 120
  • [40] Profile forward regression screening for ultra-high dimensional semiparametric varying coefficient partially linear models
    Li, Yujie
    Li, Gaorong
    Lian, Heng
    Tong, Tiejun
    JOURNAL OF MULTIVARIATE ANALYSIS, 2017, 155 : 133 - 150