Regression diagnostics meets forecast evaluation: conditional calibration, reliability diagrams, and coefficient of determination

Cited by: 12
Authors
Gneiting, Tilmann [1, 2]
Resin, Johannes [1, 2, 3]
Affiliations
[1] Heidelberg Inst Theoret Studies, Computat Stat, Heidelberg, Germany
[2] Karlsruher Inst Technol KIT, Inst Stochast, Karlsruhe, Germany
[3] Heidelberg Univ, Alfred Weber Inst Econ, Heidelberg, Germany
Source
ELECTRONIC JOURNAL OF STATISTICS | 2023 / Vol. 17 / No. 2
Keywords
Calibration test; canonical loss; consistent scoring function; model diagnostics; nonparametric isotonic regression; prequential principle; score decomposition; skill score; DENSITY FORECASTS; SKILL SCORES; DECOMPOSITION; REPRESENTATIONS; ELICITABILITY; INFERENCE; TESTS; SET
DOI
10.1214/23-EJS2180
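Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714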
Abstract
A common principle in model diagnostics and forecast evaluation is that fitted or predicted distributions ought to be reliable, ideally in the sense of auto-calibration, where the outcome is a random draw from the posited distribution. For binary responses, auto-calibration is the universal concept of reliability. For real-valued outcomes, a general theory of calibration has been elusive, despite a recent surge of interest in distributional regression and machine learning. We develop a framework rooted in probability theory, which gives rise to hierarchies of calibration, and applies to both predictive distributions and stand-alone point forecasts. In a nutshell, a prediction is conditionally T-calibrated if it can be taken at face value in terms of an identifiable functional T. We introduce population versions of T-reliability diagrams and revisit a score decomposition into measures of miscalibration, discrimination, and uncertainty. In empirical settings, stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, we propose a universal coefficient of determination that nests and reinterprets the classical R^2 in least squares regression and its natural analog R^1 in quantile regression, yet applies to T-regression in general.
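As a rough illustration of the empirical step mentioned in the abstract, the following minimal sketch recalibrates mean point forecasts with the pool-adjacent-violators algorithm and splits the mean squared error into miscalibration (MCB), discrimination (DSC), and uncertainty (UNC) terms. The abstract does not spell out the formulas, so the conventions here are assumptions: squared error as the scoring function for the mean functional, MCB as the score of the original forecast minus that of the recalibrated forecast, DSC as the score of the constant marginal-mean forecast minus that of the recalibrated forecast, and UNC as the score of the marginal-mean forecast. The function names `pav_recalibrate` and `decompose` are hypothetical.

```python
def mean_sq(forecast, outcome):
    """Mean squared error between two equally long sequences."""
    return sum((f - o) ** 2 for f, o in zip(forecast, outcome)) / len(outcome)


def pav_recalibrate(x, y):
    """Isotonic least-squares regression of outcomes y on forecasts x.

    Returns the recalibrated forecasts in the original input order.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])

    # Step 1: pool outcomes that share the same forecast value.
    groups = []  # each entry: [sum of outcomes, count, forecast value]
    for i in order:
        if groups and groups[-1][2] == x[i]:
            groups[-1][0] += y[i]
            groups[-1][1] += 1
        else:
            groups.append([y[i], 1, x[i]])

    # Step 2: pool adjacent violators so that block means are nondecreasing.
    blocks = []  # each entry: [sum of outcomes, count]
    for total, count, _ in groups:
        blocks.append([total, count])
        # Compare block means via cross-multiplication (counts are positive).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            last_total, last_count = blocks.pop()
            blocks[-1][0] += last_total
            blocks[-1][1] += last_count

    # Step 3: expand block means back to one value per observation.
    fitted_sorted = []
    for total, count in blocks:
        fitted_sorted.extend([total / count] * count)
    fitted = [0.0] * len(x)
    for rank, i in enumerate(order):
        fitted[i] = fitted_sorted[rank]
    return fitted


def decompose(x, y):
    """Return (MCB, DSC, UNC) under the assumed squared-error conventions."""
    recalibrated = pav_recalibrate(x, y)
    marginal_mean = sum(y) / len(y)
    score_original = mean_sq(x, y)
    score_recalibrated = mean_sq(recalibrated, y)
    score_marginal = mean_sq([marginal_mean] * len(y), y)
    mcb = score_original - score_recalibrated
    dsc = score_marginal - score_recalibrated
    unc = score_marginal
    return mcb, dsc, unc


if __name__ == "__main__":
    forecasts = [1.0, 2.0, 3.0, 4.0]
    outcomes = [2.0, 1.0, 4.0, 3.0]
    mcb, dsc, unc = decompose(forecasts, outcomes)
    print("MCB:", mcb, "DSC:", dsc, "UNC:", unc)
    print("MCB - DSC + UNC:", mcb - dsc + unc)
    print("mean score:", mean_sq(forecasts, outcomes))
```

Under these assumed definitions the identity mean score = MCB - DSC + UNC holds by construction. Plotting the recalibrated values against the original forecasts would give an empirical reliability diagram for the mean functional.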
Pages: 3226-3286
Page count: 61