Interpretable regression trees using conformal prediction

Cited by: 28
Authors
Johansson, Ulf [1, 2]
Linusson, Henrik [2]
Lofstrom, Tuve [1, 2]
Bostrom, Henrik [3]
Affiliations
[1] Jonkoping Univ, Dept Comp Sci & Informat, Jonkoping, Sweden
[2] Univ Boras, Dept Informat Technol, Boras, Sweden
[3] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm, Sweden
Keywords
Conformal prediction; Interpretability; Predictive regression; Regression trees; ALGORITHMS;
DOI
10.1016/j.eswa.2017.12.041
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A key property of conformal predictors is that they are valid, i.e., their error rate on novel data is bounded by a preset significance level (one minus the chosen confidence level). For regression, this is achieved by turning the point predictions of the underlying model into prediction intervals. Thus, the most important performance metric for evaluating conformal regressors is not the error rate, but the size of the prediction intervals, where models generating smaller (more informative) intervals are said to be more efficient. State-of-the-art conformal regressors typically utilize two separate predictive models: the underlying model providing the center point of each prediction interval, and a normalization model used to scale each prediction interval according to the estimated level of difficulty for each test instance. When using a regression tree as the underlying model, this approach may cause test instances falling into a specific leaf to receive different prediction intervals. This clearly deteriorates the interpretability of a conformal regression tree compared to a standard regression tree, since the path from the root to a leaf can no longer be translated into a rule explaining all predictions in that leaf. In fact, the model cannot even be interpreted on its own, i.e., without reference to the corresponding normalization model. Current practice effectively presents two options for constructing conformal regression trees: to employ a (global) normalization model, and thereby sacrifice interpretability; or to avoid normalization, and thereby sacrifice both efficiency and individualized predictions. In this paper, two additional approaches are considered, both employing local normalization: the first approach estimates the difficulty by the standard deviation of the target values in each leaf, while the second approach employs Mondrian conformal prediction, which results in regression trees where each rule (path from root node to leaf node) is independently valid.
An empirical evaluation shows that the first approach is as efficient as current state-of-the-art approaches, thus eliminating the efficiency vs. interpretability trade-off present in existing methods. Moreover, it is shown that if a validity guarantee is required for each single rule, as provided by the Mondrian approach, a penalty with respect to efficiency has to be paid, but it is only substantial at very high confidence levels. (C) 2017 Elsevier Ltd. All rights reserved.
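
The three leaf-level options contrasted in the abstract (no normalization, normalization by the leaf standard deviation, and Mondrian calibration per leaf) can be illustrated with a minimal split-conformal sketch. This is not the authors' implementation: the one-split "tree" `leaf_of`, the synthetic data in `make_data`, and all other names are assumptions made purely for illustration, and only the Python standard library is used.

```python
import math
import random
import statistics

def leaf_of(x):
    # Hypothetical stand-in for a fitted regression tree: a single split at x = 0.5.
    return 0 if x < 0.5 else 1

def make_data(n, rng):
    # Synthetic data (an assumption): targets are much noisier in leaf 1 than in leaf 0.
    data = []
    for _ in range(n):
        x = rng.random()
        noise_sd = 0.1 if leaf_of(x) == 0 else 1.0
        data.append((x, 2.0 * x + rng.gauss(0.0, noise_sd)))
    return data

def conformal_quantile(scores, confidence):
    # k-th smallest calibration score with k = ceil((n + 1) * confidence).
    n = len(scores)
    k = math.ceil((n + 1) * confidence)
    return math.inf if k > n else sorted(scores)[k - 1]

rng = random.Random(0)
train, calib, test = make_data(1000, rng), make_data(1000, rng), make_data(2000, rng)
confidence = 0.9

# "Fit" the tree: each leaf predicts the mean of its training targets,
# and the leaf standard deviation serves as the difficulty estimate.
leaf_targets = {0: [], 1: []}
for x, y in train:
    leaf_targets[leaf_of(x)].append(y)
leaf_mean = {leaf: statistics.fmean(ys) for leaf, ys in leaf_targets.items()}
leaf_sd = {leaf: statistics.pstdev(ys) for leaf, ys in leaf_targets.items()}

def abs_error(x, y):
    # Absolute error of the leaf prediction for one instance.
    return abs(y - leaf_mean[leaf_of(x)])

# (a) No normalization: one half-width shared by every leaf.
q_plain = conformal_quantile([abs_error(x, y) for x, y in calib], confidence)
half_plain = {leaf: q_plain for leaf in leaf_mean}

# (b) Local normalization: scale calibration errors by the leaf standard deviation.
q_norm = conformal_quantile(
    [abs_error(x, y) / leaf_sd[leaf_of(x)] for x, y in calib], confidence
)
half_norm = {leaf: q_norm * leaf_sd[leaf] for leaf in leaf_mean}

# (c) Mondrian: calibrate each leaf separately, so each rule has its own guarantee.
half_mondrian = {
    leaf: conformal_quantile(
        [abs_error(x, y) for x, y in calib if leaf_of(x) == leaf], confidence
    )
    for leaf in leaf_mean
}

def coverage(half_width):
    # Fraction of test targets falling inside leaf_mean +/- half_width[leaf].
    hits = sum(abs_error(x, y) <= half_width[leaf_of(x)] for x, y in test)
    return hits / len(test)

for name, hw in [("plain", half_plain), ("leaf-sd", half_norm), ("mondrian", half_mondrian)]:
    print(name, {leaf: round(w, 3) for leaf, w in hw.items()}, round(coverage(hw), 3))
```

In all three variants every instance in a leaf receives the same interval, so each root-to-leaf path still reads as a single rule. Variants (b) and (c) additionally let the noisier leaf receive a wider interval than the quieter one, whereas (a) assigns the same width everywhere.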
Pages: 394-404
Page count: 11
Related papers
50 in total
  • [31] A prediction system for bike sharing using artificial immune system with regression trees
    Wu, Jheng-Long
    Chang, Pei-Chann
    2015 IIAI 4TH INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS (IIAI-AAI), 2015 : 511 - 516
  • [32] Prediction intervals for global solar irradiation forecasting using regression trees methods
    Voyant, Cyril
    Motte, Fabrice
    Notton, Gilles
    Fouilloy, Alexis
    Nivet, Marie-Laure
    Duchaud, Jean-Laurent
    RENEWABLE ENERGY, 2018, 126 : 332 - 340
  • [33] Prediction of Survival to Discharge Following Cardiopulmonary Resuscitation Using Classification and Regression Trees
    Ebell, Mark H.
    Afonso, Anna M.
    Geocadin, Romergryko G.
    CRITICAL CARE MEDICINE, 2013, 41 (12) : 2688 - 2697
  • [34] PREDICTION OF CANNABIS AND COCAINE USE IN ADOLESCENCE USING DECISION TREES AND LOGISTIC REGRESSION
    Gervilla, Elena
    Palmer, Alfonso
    EUROPEAN JOURNAL OF PSYCHOLOGY APPLIED TO LEGAL CONTEXT, 2010, 2 (01) : 19 - 35
  • [35] Concrete performance prediction using boosting smooth transition regression trees (BooST)
    Anyaoha, Uchenna
    Peng, Xiang
    Liu, Zheng
    NONDESTRUCTIVE CHARACTERIZATION AND MONITORING OF ADVANCED MATERIALS, AEROSPACE, CIVIL INFRASTRUCTURE, AND TRANSPORTATION XIII, 2019, 10971
  • [36] Short-Term Traffic Volume Prediction Using Classification and Regression Trees
    Xu, Yanyan
    Kong, Qing-Jie
    Liu, Yuncai
    2013 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2013 : 493 - 498
  • [37] Regression trees for fast and adaptive prediction intervals
    Cabezas, Luben M. C.
    Otto, Mateus P.
    Izbicki, Rafael
    Stern, Rafael B.
    INFORMATION SCIENCES, 2025, 686
  • [38] Optimal Interpretable Clustering Using Oblique Decision Trees
    Gabidolla, Magzhan
    Carreira-Perpinan, Miguel A.
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022 : 400 - 410
  • [39] Prediction in medicine by integrating regression trees into regression analysis with optimal scaling
    Dusseldorp, E
    Meulman, JJ
    METHODS OF INFORMATION IN MEDICINE, 2001, 40 (05) : 403 - 409
  • [40] Interpretable and Specialized Conformal Predictors
    Johansson, Ulf
    Lofstrom, Tuwe
    Bostrom, Henrik
    Sonstrod, Cecilia
    CONFORMAL AND PROBABILISTIC PREDICTION AND APPLICATIONS, VOL 105, 2019, 105