Cross-Validation With Confidence

Cited by: 47

Author:

Lei, Jing [1]

Affiliation:

[1] Carnegie Mellon Univ, Dept Stat & Data Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA

Source:

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION | 2020 / Vol. 115 / No. 532

Keywords:

Cross-validation; Hypothesis testing; Model selection; Overfitting; Tuning parameter selection; LASSO

DOI:

10.1080/01621459.2019.1672556

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]

Subject classification codes:

020208; 070103; 0714

Abstract:

Cross-validation is one of the most popular model and tuning parameter selection methods in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to overfit because they ignore the uncertainty in the testing sample. We develop a novel, statistically principled inference tool based on cross-validation that takes this uncertainty into account. The method outputs a set of highly competitive candidate models that contains the optimal one with guaranteed probability. As a consequence, it can achieve consistent variable selection in a classical linear regression setting, where existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method offers a different trade-off between prediction accuracy and model interpretability from that of existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.
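The abstract describes the method only at a high level. As a rough illustration of how a cross-validation-based procedure can return a *set* of candidates rather than a single winner, here is a minimal Python sketch. It assumes the per-sample held-out losses of every candidate have already been computed by ordinary V-fold cross-validation. The function name `cvc_confidence_set`, the studentized max statistic, and the Gaussian multiplier bootstrap calibration are choices made for this sketch, not a reproduction of the paper's exact procedure. For each candidate m it tests "m has the smallest expected loss" and keeps m unless that hypothesis is rejected at level `alpha`.

```python
import math
import random
import statistics


def cvc_confidence_set(losses, alpha=0.05, n_boot=1000, seed=0):
    """Simplified confidence-set screening over CV losses (illustrative sketch).

    losses[m][i] is the held-out loss of candidate m on sample i, taken from
    the fold in which sample i was held out (assumed precomputed).

    For each candidate m, test H0: "m has the smallest expected loss" with a
    studentized max statistic calibrated by a Gaussian multiplier bootstrap.
    Return the indices of candidates whose bootstrap p-value exceeds alpha.
    """
    n_models = len(losses)
    n = len(losses[0])
    if any(len(row) != n for row in losses):
        raise ValueError("every candidate needs one loss per sample")
    if n_models == 1:
        return [0]  # a single candidate is trivially the best one

    rng = random.Random(seed)
    root_n = math.sqrt(n)
    kept = []
    for m in range(n_models):
        others = [j for j in range(n_models) if j != m]
        stats = []  # (observed studentized mean, centered and scaled differences)
        for j in others:
            # Positive differences mean candidate m did worse than candidate j.
            d = [lm - lj for lm, lj in zip(losses[m], losses[j])]
            mean = statistics.fmean(d)
            sd = statistics.stdev(d) or 1e-12  # guard against zero variance
            stats.append((root_n * mean / sd, [(x - mean) / sd for x in d]))

        t_obs = max(t for t, _ in stats)
        exceed = 0
        for _ in range(n_boot):
            xi = [rng.gauss(0.0, 1.0) for _ in range(n)]  # Gaussian multipliers
            t_star = max(
                sum(w * zi for w, zi in zip(xi, z)) / root_n for _, z in stats
            )
            if t_star >= t_obs:
                exceed += 1
        if exceed / n_boot > alpha:
            kept.append(m)  # H0 not rejected: m stays in the confidence set
    return kept


if __name__ == "__main__":
    # Hypothetical demo data: two near-tied candidates and one clearly worse.
    rng = random.Random(42)
    base = [rng.gauss(0.0, 1.0) ** 2 for _ in range(100)]
    losses = [
        base,
        [b + 0.05 * rng.gauss(0.0, 1.0) for b in base],
        [b + 0.5 + 0.1 * rng.gauss(0.0, 1.0) for b in base],
    ]
    kept = cvc_confidence_set(losses)
    # Assuming candidates are ordered simplest first, pick the simplest kept one.
    print("confidence set:", kept, "selected:", min(kept))
```

Passing the candidates ordered from simplest to most complex and taking `min(kept)` is one way to act on the abstract's point about trading a little prediction accuracy for interpretability; that ordering convention is an assumption of this sketch.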

Pages: 1978-1997

Number of pages: 20