Two Tales of Variable Selection for High Dimensional Regression: Screening and Model Building

Cited by: 6

Authors:

Liu, Cong [1]

Shi, Tao [2]

Lee, Yoonkyung [2]

Affiliations:

[1] Amazon Com Inc, Seattle, WA 98109 USA

[2] Ohio State Univ, Dept Stat, Columbus, OH 43210 USA

Source:

STATISTICAL ANALYSIS AND DATA MINING | 2014 / Vol. 7 / No. 02

Funding:

U.S. National Science Foundation (NSF);

Keywords:

cross-validation; forward selection; LASSO; ROC curve; SCAD; SIS; NONCONCAVE PENALIZED LIKELIHOOD; ORACLE PROPERTIES; DIVERGING NUMBER; ADAPTIVE LASSO; ELASTIC-NET; PARAMETERS;

DOI:

10.1002/sam.11219

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Variable selection plays an important role in high-dimensional regression problems where a large number of variables are given as potential predictors of a response of interest. Typically, it arises at two stages of statistical modeling, namely screening and formal model building, with different goals. Screening aims at filtering out irrelevant variables prior to model building, in which a formal description of the functional relation between the response and the variables screened for relevance is sought. Accordingly, proper comparison of variable selection methods calls for evaluation criteria that reflect these different goals: accuracy in the ranking order of variables for screening, and prediction accuracy for formal modeling. Without delineating the difference between the two aspects, confounding comparisons of various screening and selection methods have often been made in the literature, which may lead to misleading conclusions. In this paper, we present comprehensive numerical studies comparing four commonly used screening and selection procedures: correlation screening (also known as sure independence screening), forward selection, LASSO and SCAD. By clearly differentiating screening and model building, we highlight the situations where the performance of these procedures might differ. In addition, we propose a new cross-validation method for LASSO. Furthermore, we discuss connections to relevant comparison studies that appeared in the recent literature to clarify different findings and conclusions. © 2014 Wiley Periodicals, Inc.
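To make the screening-stage criterion concrete, here is a minimal pure-Python sketch of correlation screening (sure independence screening). It ranks each predictor by the magnitude of its marginal correlation with the response. The ranking is then scored by ranking accuracy rather than prediction accuracy, using two common summaries: the smallest top-k set that contains every true predictor, and ROC-style (FPR, TPR) points as the cutoff k grows. This is an illustration under stated assumptions, not the authors' code, simulation design, or proposed LASSO cross-validation method. The helper names `abs_correlation`, `correlation_screening`, `minimum_model_size` and `roc_points`, and the simulated data, are hypothetical.

```python
import math
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column (or response) carries no marginal signal
    return abs(sxy) / math.sqrt(sxx * syy)


def correlation_screening(X, y):
    """Rank predictors by |marginal correlation| with y, largest first.

    X is a list of n rows, each holding p predictor values; returns the
    p column indices in ranking order (ties keep their original order).
    """
    p = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)


def minimum_model_size(ranking, true_support):
    """Smallest k such that the top-k ranked variables include every true predictor."""
    remaining = set(true_support)
    if not remaining:
        return 0
    for k, j in enumerate(ranking, start=1):
        remaining.discard(j)
        if not remaining:
            return k
    raise ValueError("true_support contains indices missing from the ranking")


def roc_points(ranking, true_support):
    """(FPR, TPR) pairs after admitting the top-k variables, for k = 0, ..., p.

    Assumes true_support is a subset of the ranked indices.
    """
    truth = set(true_support)
    n_pos = len(truth)
    n_neg = len(ranking) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true and one null predictor")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for j in ranking:
        if j in truth:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    # Hypothetical simulation: n = 100 observations, p = 50 predictors,
    # of which only columns 0, 1 and 2 enter the true linear model.
    random.seed(0)
    n, p, support = 100, 50, [0, 1, 2]
    X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * r[0] + 2.0 * r[1] + 1.5 * r[2] + random.gauss(0.0, 1.0) for r in X]
    ranking = correlation_screening(X, y)
    print("top 5 ranked predictors:", ranking[:5])
    print("minimum model size:", minimum_model_size(ranking, support))
```

A smaller minimum model size, or an ROC curve that rises faster, indicates a better screening ranking. That question is separate from how well a model fitted on the retained variables predicts, and it is this separation the abstract argues comparisons should respect.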


Pages: 140-159

Number of pages: 20

