Grouped variable screening for ultra-high dimensional data for linear model

Cited by: 17
Authors
Qiu, Debin [1]
Ahn, Jeongyoun [1]
Affiliations
[1] Univ Georgia, Dept Stat, Athens, GA 30602 USA
Keywords
Grouped variable screening; HOLP; Multicollinearity; SIS; Sparse regression; Sure screening property; REGRESSION; SELECTION; LASSO;
DOI
10.1016/j.csda.2019.106894
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Ultra-high dimensional data sets often need a screening step that removes irrelevant variables prior to the main analysis. In high-dimensional linear regression, screening relevant predictors before model estimation often yields better prediction accuracy and much faster computation. However, most existing screening approaches target individual predictors and thus cannot incorporate structured predictors, such as dummy variables and grouped variables. New screening methods for naturally grouped predictors in high-dimensional linear regression are presented. Two popular variable screening methods are generalized to the grouped-predictor case, and a novel screening procedure is also proposed. Asymptotic sure screening properties are established for all three methods. The empirical benefits of the screening approaches are demonstrated via simulation and a real data analysis. Specifically, a two-step analysis that performs screening followed by sparse estimation improves both prediction accuracy and computing time compared to one-stage sparse regression. (C) 2019 Elsevier B.V. All rights reserved.
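The screening step described in the abstract can be illustrated with a minimal sketch of a group-level marginal screen, in the spirit of SIS (listed in the keywords). This is not the paper's procedure: the group score used here (Euclidean norm of a group's marginal covariances with the response, divided by the square root of the group size) and the function names `group_marginal_scores` and `screen_groups` are assumptions made for illustration only.

```python
import numpy as np

def group_marginal_scores(X, y, groups):
    """Score each group by the size-normalized norm of its marginal covariances with y.

    Assumed scoring rule for illustration; the paper's definitions may differ.
    Columns of X are assumed to be non-constant (nonzero standard deviation).
    """
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each predictor
    yc = y - y.mean()                           # center the response
    cov = Xc.T @ yc / len(y)                    # marginal covariance per predictor
    labels = np.unique(groups)
    scores = np.array([
        np.linalg.norm(cov[groups == g]) / np.sqrt(np.sum(groups == g))
        for g in labels
    ])
    return labels, scores

def screen_groups(X, y, groups, n_keep):
    """Keep the n_keep highest-scoring groups; return a column mask and the kept labels."""
    labels, scores = group_marginal_scores(X, y, groups)
    kept = labels[np.argsort(scores)[::-1][:n_keep]]
    return np.isin(groups, kept), kept

# Simulated example: 200 groups of 5 predictors, with signal only in groups 0 and 1.
rng = np.random.default_rng(0)
n, n_groups, group_size = 200, 200, 5
groups = np.repeat(np.arange(n_groups), group_size)
X = rng.standard_normal((n, n_groups * group_size))
beta = np.zeros(X.shape[1])
beta[groups < 2] = 3.0
y = X @ beta + rng.standard_normal(n)

mask, kept = screen_groups(X, y, groups, n_keep=10)
X_screened = X[:, mask]   # reduced design passed to a second-stage sparse estimator
print(X_screened.shape, sorted(kept.tolist()))
```

In a two-step analysis of the kind the abstract describes, a sparse estimator (for example a group lasso) would then be fit on `X_screened` rather than on the full design matrix.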
Pages: 11
Related papers
50 in total
  • [1] Grouped feature screening for ultra-high dimensional data for the classification model
    He, Hanji
    Deng, Guangming
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2022, 92 (05) : 974 - 997
  • [2] A robust variable screening procedure for ultra-high dimensional data
    Ghosh, Abhik
    Thoresen, Magne
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2021, 30 (08) : 1816 - 1832
  • [3] Forward Regression for Ultra-High Dimensional Variable Screening
    Wang, Hansheng
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2009, 104 (488) : 1512 - 1524
  • [4] Conditional Variable Screening for Ultra-High Dimensional Longitudinal Data With Time Interactions
    Bratsberg, Andrea
    Ghosh, Abhik
    Thoresen, Magne
    BIOMETRICAL JOURNAL, 2024, 66 (08)
  • [5] Category-Adaptive Variable Screening for Ultra-High Dimensional Heterogeneous Categorical Data
    Xie, Jinhan
    Lin, Yuanyuan
    Yan, Xiaodong
    Tang, Niansheng
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2020, 115 (530) : 747 - 760
  • [6] PRIOR KNOWLEDGE GUIDED ULTRA-HIGH DIMENSIONAL VARIABLE SCREENING WITH APPLICATION TO NEUROIMAGING DATA
    He, Jie
    Kang, Jian
    STATISTICA SINICA, 2022, 32 : 2095 - 2117
  • [7] Model Based Screening Embedded Bayesian Variable Selection for Ultra-high Dimensional Settings
    Li, Dongjin
    Dutta, Somak
    Roy, Vivekananda
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2023, 32 (01) : 61 - 73
  • [8] Sequential Feature Screening for Generalized Linear Models with Sparse Ultra-High Dimensional Data
    Zhang, Junying
    Wang, Hang
    Zhang, Riquan
    Zhang, Jiajia
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2020, 33 (02) : 510 - 526