Cross-Validation With Confidence

Cited by: 47
Author
Lei, Jing [1]
Affiliation
[1] Carnegie Mellon Univ, Dept Stat & Data Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
Keywords
Cross-validation; Hypothesis testing; Model selection; Overfitting; Tuning parameter selection; Lasso
DOI
10.1080/01621459.2019.1672556
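Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract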
Cross-validation is one of the most popular model and tuning parameter selection methods in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to overfit, due to the ignorance of the uncertainty in the testing sample. We develop a novel statistically principled inference tool based on cross-validation that takes into account the uncertainty in the testing sample. This method outputs a set of highly competitive candidate models containing the optimal one with guaranteed probability. As a consequence, our method can achieve consistent variable selection in a classical linear regression setting, for which existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method can provide an alternative trade-off between prediction accuracy and model interpretability than existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.
Pages: 1978-1997
Number of pages: 20
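
As a rough illustration of the idea in the abstract (returning a set of competitive candidate models instead of a single cross-validation winner), the sketch below compares a few least-squares models by their held-out squared errors and keeps every model that is not significantly worse than the others. It is a simplified reading of the abstract, not the paper's exact procedure: the function names `cv_losses` and `confidence_set`, the choice of variable subsets as candidates, the simulated data, and the multiplier-bootstrap calibration of the test are all assumptions made for this example. It also assumes at least two candidate models.

```python
import numpy as np

def cv_losses(X, y, subsets, n_folds=5, seed=0):
    """Held-out squared errors, shape (n, M): one column per candidate subset."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds          # random, balanced fold labels
    losses = np.empty((n, len(subsets)))
    for v in range(n_folds):
        test = folds == v
        train = ~test
        for m, cols in enumerate(subsets):
            # Ordinary least squares with an intercept on the chosen columns.
            A_train = np.column_stack([np.ones(train.sum()), X[train][:, cols]])
            A_test = np.column_stack([np.ones(test.sum()), X[test][:, cols]])
            beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
            losses[test, m] = (y[test] - A_test @ beta) ** 2
    return losses

def confidence_set(losses, alpha=0.05, n_boot=2000, seed=0):
    """Keep model m unless its held-out loss is significantly larger than
    that of some other model (one-sided test, multiplier bootstrap)."""
    rng = np.random.default_rng(seed)
    n, M = losses.shape
    p_values = np.empty(M)
    for m in range(M):
        others = [j for j in range(M) if j != m]
        diff = losses[:, [m]] - losses[:, others]  # (n, M-1) paired differences
        mean = diff.mean(axis=0)
        sd = diff.std(axis=0, ddof=1)
        sd = np.where(sd > 0, sd, 1.0)             # guard against zero spread
        stat = np.max(np.sqrt(n) * mean / sd)      # largest studentized excess loss
        z = (diff - mean) / sd                     # centered, scaled differences
        xi = rng.standard_normal((n_boot, n))      # Gaussian multipliers
        boot = (xi @ z / np.sqrt(n)).max(axis=1)   # bootstrap draws of the max
        p_values[m] = np.mean(boot >= stat)
    return p_values > alpha, p_values

# Simulated example (assumed setup): only the first two predictors matter.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0] + 2.0 * X[:, 1] + rng.standard_normal(200)
subsets = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3, 4]]

losses = cv_losses(X, y, subsets)
keep, p_values = confidence_set(losses)
for cols, k, p in zip(subsets, keep, p_values):
    print(cols, "kept" if k else "rejected", round(float(p), 3))
```

In this sketch the candidate with the smallest cross-validated loss is always retained, and a reader who wants a single parsimonious model could, for instance, pick the smallest subset among those kept.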