Grouped variable screening for ultra-high dimensional data for linear model

Cited by: 17
Authors
Qiu, Debin [1 ]
Ahn, Jeongyoun [1 ]
Affiliations
[1] Univ Georgia, Dept Stat, Athens, GA 30602 USA
Keywords
Grouped variable screening; HOLP; Multicollinearity; SIS; Sparse regression; Sure screening property; REGRESSION; SELECTION; LASSO;
DOI
10.1016/j.csda.2019.106894
Chinese Library Classification number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Ultra-high dimensional data sets often need a screening step that removes irrelevant variables prior to the main analysis. In high-dimensional linear regression, screening for relevant predictors before model estimation often yields better prediction accuracy and much faster computation. However, most existing screening approaches target individual predictors and thus cannot incorporate structured predictors, such as dummy variables and grouped variables. New screening methods for naturally grouped predictors in high-dimensional linear regression are presented. Two popular variable screening methods are generalized to the grouped-predictor case, and a novel screening procedure is also proposed. Asymptotic sure screening properties are established for all three methods. The empirical benefits of the screening approaches are demonstrated via simulation and a real data analysis. Specifically, a two-step analysis that performs screening followed by sparse estimation improves both prediction accuracy and computing time compared to one-stage sparse regression. (C) 2019 Elsevier B.V. All rights reserved.
Pages: 11
Related papers
50 in total
  • [21] A generic model-free feature screening procedure for ultra-high dimensional data with categorical response
    Cheng, Xuewei
    Wang, Hong
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2023, 229
  • [22] NONPARAMETRIC INDEPENDENCE SCREENING AND STRUCTURE IDENTIFICATION FOR ULTRA-HIGH DIMENSIONAL LONGITUDINAL DATA
    Cheng, Ming-Yen
    Honda, Toshio
    Li, Jialiang
    Peng, Heng
    ANNALS OF STATISTICS, 2014, 42 (05) : 1819 - 1849
  • [23] Combined performance of screening and variable selection methods in ultra-high dimensional data in predicting time-to-event outcomes
    Pi, Lira
    Halabi, Susan
    Diagnostic and Prognostic Research, 2 (1)
  • [24] A sure independence screening procedure for ultra-high dimensional partially linear additive models
    Kazemi, M.
    Shahsavani, D.
    Arashi, M.
    JOURNAL OF APPLIED STATISTICS, 2019, 46 (08) : 1385 - 1403
  • [25] Variable selection for ultra-high dimensional quantile regression with missing data and measurement error
    Bai, Yongxin
    Tian, Maozai
    Tang, Man-Lai
    Lee, Wing-Yan
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2021, 30 (01) : 129 - 150
  • [26] Joint model-free feature screening for ultra-high dimensional semi-competing risks data
    Lu, Shuiyun
    Chen, Xiaolin
    Xu, Sheng
    Liu, Chunling
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2020, 147
  • [27] Robust model-free feature screening based on modified Hoeffding measure for ultra-high dimensional data
    Yu, Yuan
    He, Di
    Zhou, Yong
    STATISTICS AND ITS INTERFACE, 2018, 11 (03) : 473 - 489
  • [28] A new robust model-free feature screening method for ultra-high dimensional right censored data
    Liu, Yi
    Chen, Xiaolin
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2022, 51 (06) : 1857 - 1875
  • [29] Improvement Screening for Ultra-High Dimensional Data with Censored Survival Outcomes and Varying Coefficients
    Yue, Mu
    Li, Jialiang
    INTERNATIONAL JOURNAL OF BIOSTATISTICS, 2017, 13 (01):
  • [30] Conditional distance correlation sure independence screening for ultra-high dimensional survival data
    Lu, Shuiyun
    Chen, Xiaolin
    Wang, Hong
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2021, 50 (08) : 1936 - 1953