Cross-Validation With Confidence

Cited by: 47
Authors
Lei, Jing [1]
Affiliations
[1] Carnegie Mellon Univ, Dept Stat & Data Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
Keywords
Cross-validation; Hypothesis testing; Model selection; Overfitting; Tuning parameter selection; LASSO
DOI
10.1080/01621459.2019.1672556
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Cross-validation is one of the most popular model and tuning parameter selection methods in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to overfit because they ignore the uncertainty in the testing sample. We develop a novel, statistically principled inference tool based on cross-validation that takes this uncertainty into account. The method outputs a set of highly competitive candidate models that contains the optimal one with guaranteed probability. As a consequence, it can achieve consistent variable selection in a classical linear regression setting, for which existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method offers a trade-off between prediction accuracy and model interpretability that differs from those of existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.
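The abstract describes the "confidence set of models" idea only at a high level. Below is a minimal, hypothetical Python sketch of that idea under simplifying assumptions: per-observation K-fold CV losses are compared pairwise against the CV-best model with a one-sided paired t-test, which stands in for the paper's max-type test with a Gaussian multiplier bootstrap. Names such as `cvc_candidate_set` and the default `alpha=0.05` are illustrative, not taken from the paper.

```python
# Simplified illustration of screening a confidence set of models via
# cross-validation. NOT the paper's exact CVC procedure; a pairwise paired
# t-test against the CV winner is used as a stand-in for its formal test.

import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold


def cv_losses(model, X, y, n_splits=5, seed=0):
    """Per-observation squared prediction errors from K-fold cross-validation."""
    losses = np.empty(len(y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        fit = model.fit(X[train], y[train])
        losses[test] = (y[test] - fit.predict(X[test])) ** 2
    return losses


def cvc_candidate_set(models, X, y, alpha=0.05):
    """Return indices of models not significantly worse than the CV-best one."""
    loss = np.column_stack([cv_losses(m, X, y) for m in models])
    best = loss.mean(axis=0).argmin()
    keep = []
    for j in range(loss.shape[1]):
        if j == best:
            keep.append(j)
            continue
        # One-sided paired test of H0: model j's expected loss <= best model's.
        t, p_two_sided = stats.ttest_rel(loss[:, j], loss[:, best])
        p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
        if p_one_sided > alpha:  # cannot reject: model j stays in the set
            keep.append(j)
    return keep


# Toy usage: a Lasso path where several penalty values are statistically tied.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)
alphas = [0.01, 0.05, 0.1, 0.5, 1.0]
models = [Lasso(alpha=a) for a in alphas]
print("retained penalties:", [alphas[j] for j in cvc_candidate_set(models, X, y)])
```

In this sketch, all penalty values whose CV losses are statistically indistinguishable from the winner's remain in the returned set; a final choice (e.g., the most parsimonious member) can then be made within that set, reflecting the accuracy-interpretability trade-off mentioned in the abstract.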
Pages: 1978-1997
Number of pages: 20