Interpretable regression trees using conformal prediction

Cited: 28
Authors
Johansson, Ulf [1 ,2 ]
Linusson, Henrik [2 ]
Lofstrom, Tuve [1 ,2 ]
Bostromc, Henrik [3 ]
Affiliations
[1] Jonkoping Univ, Dept Comp Sci & Informat, Jonkoping, Sweden
[2] Univ Boras, Dept Informat Technol, Boras, Sweden
[3] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm, Sweden
Keywords
Conformal prediction; Interpretability; Predictive regression; Regression trees; Algorithms
DOI
10.1016/j.eswa.2017.12.041
CLC Classification Number
TP18 [Theory of artificial intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A key property of conformal predictors is that they are valid, i.e., their error rate on novel data is bounded by a preset level of confidence. For regression, this is achieved by turning the point predictions of the underlying model into prediction intervals. Thus, the most important performance metric for evaluating conformal regressors is not the error rate, but the size of the prediction intervals, where models generating smaller (more informative) intervals are said to be more efficient. State-of-the-art conformal regressors typically utilize two separate predictive models: the underlying model providing the center point of each prediction interval, and a normalization model used to scale each prediction interval according to the estimated level of difficulty for each test instance. When using a regression tree as the underlying model, this approach may cause test instances falling into a specific leaf to receive different prediction intervals. This clearly deteriorates the interpretability of a conformal regression tree compared to a standard regression tree, since the path from the root to a leaf can no longer be translated into a rule explaining all predictions in that leaf. In fact, the model cannot even be interpreted on its own, i.e., without reference to the corresponding normalization model. Current practice effectively presents two options for constructing conformal regression trees: to employ a (global) normalization model, and thereby sacrifice interpretability; or to avoid normalization, and thereby sacrifice both efficiency and individualized predictions. In this paper, two additional approaches are considered, both employing local normalization: the first approach estimates the difficulty by the standard deviation of the target values in each leaf, while the second approach employs Mondrian conformal prediction, which results in regression trees where each rule (path from root node to leaf node) is independently valid. An empirical evaluation shows that the first approach is as efficient as current state-of-the-art approaches, thus eliminating the efficiency vs. interpretability trade-off present in existing methods. Moreover, it is shown that if a validity guarantee is required for each single rule, as provided by the Mondrian approach, a penalty with respect to efficiency has to be paid, but it is only substantial at very high confidence levels. (C) 2017 Elsevier Ltd. All rights reserved.
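A minimal Python sketch (not the authors' implementation) of the two locally normalized variants described in the abstract, assuming scikit-learn and NumPy arrays X and y; the function names fit_conformal_tree and predict_intervals and all parameter values are illustrative assumptions:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def fit_conformal_tree(X, y, confidence=0.9, random_state=0):
    # Split conformal prediction: hold out a calibration set.
    X_tr, X_cal, y_tr, y_cal = train_test_split(
        X, y, test_size=0.3, random_state=random_state)
    tree = DecisionTreeRegressor(min_samples_leaf=20,
                                 random_state=random_state).fit(X_tr, y_tr)

    # Local difficulty estimate: standard deviation of the training targets
    # in each leaf (the first locally normalized approach in the abstract).
    train_leaves = tree.apply(X_tr)
    leaf_sigma = {leaf: max(y_tr[train_leaves == leaf].std(), 1e-8)
                  for leaf in np.unique(train_leaves)}

    # Normalized nonconformity scores on the calibration set.
    cal_leaves = tree.apply(X_cal)
    sigma_cal = np.array([leaf_sigma[l] for l in cal_leaves])
    scores = np.abs(y_cal - tree.predict(X_cal)) / sigma_cal
    # Approximation of the conformal quantile ceil((n+1)*confidence)/n.
    q = np.quantile(scores, confidence)

    # Mondrian variant: calibrate every leaf (rule) separately, so that
    # each path from root to leaf is valid on its own.
    mondrian_q = {leaf: np.quantile(scores[cal_leaves == leaf], confidence)
                  for leaf in np.unique(cal_leaves)}
    return tree, leaf_sigma, q, mondrian_q

def predict_intervals(tree, leaf_sigma, q, X_new):
    # Every instance reaching the same leaf gets the same interval:
    # leaf prediction +/- q * sigma_leaf, so each rule stays interpretable.
    leaves = tree.apply(X_new)
    center = tree.predict(X_new)
    width = np.array([q * leaf_sigma[l] for l in leaves])
    return np.column_stack([center - width, center + width])

Because the difficulty estimate is constant within a leaf, all test instances falling into the same leaf receive identical intervals, which is what preserves the rule-based reading of the tree; the Mondrian intervals would use mondrian_q[leaf] in place of the global q.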
Pages: 394-404
Number of pages: 11
Related articles
50 records in total
  • [21] Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression
    Chen, Guangchao
    Li, Xuehua
    Chen, Jingwen
    Zhang, Ya-nan
    Peijnenburg, Willie J. G. M.
    ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY, 2014, 33 (12) : 2688 - 2693
  • [22] Improved prediction of radiation pneumonitis using multiple additive regression trees
    Das, SK
    Zhou, S
    Kocak, Z
    Yin, F
    Marks, L
    INTERNATIONAL JOURNAL OF RADIATION ONCOLOGY BIOLOGY PHYSICS, 2005, 63 (02): : S228 - S229
  • [23] Prediction of enantioselectivity using chirality codes and Classification and Regression Trees
    Caetano, S
    Aires-De-Sousa, J
    Daszykowski, M
    Heyden, YV
    ANALYTICA CHIMICA ACTA, 2005, 544 (1-2) : 315 - 326
  • [24] Genome-wide prediction using Bayesian additive regression trees
    Waldmann, Patrik
    GENETICS SELECTION EVOLUTION, 2016, 48
  • [25] Genome-wide prediction using Bayesian additive regression trees
    Patrik Waldmann
    Genetics Selection Evolution, 48
  • [26] Interpretable clustering using unsupervised binary trees
    Ricardo Fraiman
    Badih Ghattas
    Marcela Svarc
    Advances in Data Analysis and Classification, 2013, 7 : 125 - 145
  • [27] Interpretable clustering using unsupervised binary trees
    Fraiman, Ricardo
    Ghattas, Badih
    Svarc, Marcela
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2013, 7 (02) : 125 - 145
  • [28] Using Decision Trees for Interpretable Supervised Clustering
    Kokash N.
    Makhnist L.
    SN Computer Science, 5 (2)
  • [29] Interpretable deep learning based text regression for financial prediction
    Liang, Rufeng
    Zhang, Weiwen
    Ye, Haiming
    EXPERT SYSTEMS, 2023, 40 (09)
  • [30] Prediction of Accrual Expenses in Balance Sheet Using Decision Trees and Linear Regression
    Wang, Chih-Yu
    Lin, Ming-Yen
    2016 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2016, : 73 - 77