Most machine learning researchers perform quantitative experiments to estimate generalization error and compare the performance of algorithms. To draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the estimation of uncertainty around the K-fold cross-validation estimator. The main theorem shows that there exists no universal unbiased estimator of the variance of K-fold cross-validation. An analysis based on the eigendecomposition of the covariance matrix of errors helps to better understand the nature of the problem and shows that naive estimators may grossly underestimate variance, as confirmed by numerical experiments.
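The underestimation by naive estimators can be illustrated with a small simulation. The sketch below is not the paper's experiment; it assumes a toy setting chosen purely for illustration: i.i.d. standard normal data, a predictor equal to the training-set mean, squared-error loss, and contiguous folds. The function name `simulate` and its parameters are hypothetical. It compares the naive estimator (sample variance of the n individual errors divided by n, which treats the errors as independent) with a Monte Carlo reference for the variance of the K-fold estimate over many independent datasets.

```python
import numpy as np


def simulate(n_reps=20000, n=10, k=5, seed=0):
    """Toy comparison of a naive variance estimate with a Monte Carlo reference.

    Returns (mc_variance, mean_naive_estimate).
    Assumed setting (illustrative only): y ~ N(0, 1), predictor = training mean,
    squared-error loss, k contiguous folds of equal size.
    """
    if n % k != 0:
        raise ValueError("n must be divisible by k in this sketch")
    m = n // k  # fold size
    rng = np.random.default_rng(seed)

    # n_reps independent datasets, each already arranged as k folds of size m.
    y = rng.standard_normal((n_reps, k, m))

    # Training mean for fold j uses every point outside fold j.
    total = y.sum(axis=(1, 2), keepdims=True)    # shape (n_reps, 1, 1)
    fold_sum = y.sum(axis=2, keepdims=True)      # shape (n_reps, k, 1)
    train_mean = (total - fold_sum) / (n - m)    # shape (n_reps, k, 1)

    # Squared error of each held-out point against its fold's training mean.
    errors = ((y - train_mean) ** 2).reshape(n_reps, n)

    # K-fold cross-validation estimate for each dataset.
    cv_estimate = errors.mean(axis=1)

    # Naive estimator: pretends the n errors are independent.
    naive_var = errors.var(axis=1, ddof=1) / n

    # Monte Carlo reference: variance of the CV estimate across datasets.
    mc_variance = cv_estimate.var(ddof=1)
    return mc_variance, naive_var.mean()


if __name__ == "__main__":
    mc_variance, mean_naive = simulate()
    print(f"Monte Carlo variance of the CV estimate: {mc_variance:.4f}")
    print(f"Average naive variance estimate:         {mean_naive:.4f}")
```

In this toy setting the errors are positively correlated, both within a fold (shared training set) and across folds (overlapping training sets), so the averaged naive estimate is expected to fall below the Monte Carlo reference. The size of the gap depends on the assumed data distribution, learner, and loss.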
Affiliations:
Univ Sci & Technol China, Sch Management, Int Inst Finance, Hefei, Peoples R China
Chinese Acad Sci, Acad Math & Syst Sci, Beijing, Peoples R China