Explain the Explainer: Interpreting Model-Agnostic Counterfactual Explanations of a Deep Reinforcement Learning Agent

Cited by: 6
|
Authors
Chen Z. [1 ]
Silvestri F. [2 ]
Tolomei G. [2 ]
Wang J. [3 ]
Zhu H. [4 ]
Ahn H. [1 ]
Affiliations
[1] Stony Brook University, Department of Applied Mathematics and Statistics, Stony Brook, 11794, NY
[2] Sapienza University of Rome, Department of Computer Engineering and Department of Computer Science, Rome
[3] Xi'an Jiaotong-Liverpool University, Department of Intelligent Science, Suzhou
[4] Rutgers University-New Brunswick, Department of Computer Science, Piscataway, 08854, NJ
Source
IEEE Transactions on Artificial Intelligence
Keywords
Counterfactual explanations; deep reinforcement learning (DRL); explainable artificial intelligence (XAI); machine learning (ML) explainability
DOI
10.1109/TAI.2022.3223892
Abstract
Counterfactual examples (CFs) are one of the most popular methods for attaching post hoc explanations to machine learning models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood; thus, they are hard to generalize to complex models and inefficient for large datasets. This article aims to overcome these limitations and introduces ReLAX, a model-agnostic algorithm for generating optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs as a sequential decision-making task and then find the optimal CFs via deep reinforcement learning (DRL) with a discrete-continuous hybrid action space. In addition, we develop a distillation algorithm that extracts decision rules from the DRL agent's policy in the form of a decision tree, making the process of generating CFs itself interpretable. Extensive experiments on six tabular datasets show that ReLAX outperforms existing CF generation baselines: it produces sparser counterfactuals, scales better to complex target models, and generalizes to both classification and regression tasks. Finally, we demonstrate the ability of our method to provide actionable recommendations and distill interpretable policy explanations in two practical real-world use cases. © 2020 IEEE.
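The hybrid discrete-continuous action formulation can be illustrated with a minimal sketch. Note that this is not the authors' ReLAX implementation (ReLAX trains a DRL policy): the toy linear model, the soft-score query, and the greedy loop below are all illustrative assumptions. It only shows the action structure the abstract describes, where each step picks a feature to edit (discrete) and a change amount (continuous) until the black box's prediction flips.

```python
# Toy sketch of counterfactual search over a discrete-continuous hybrid
# action space. NOT the ReLAX algorithm from the paper (which trains a
# DRL policy); model, scores, and greedy loop are illustrative assumptions.

def model_score(x):
    # Hypothetical black box's soft score: a fixed linear function.
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]

def model(x):
    # Hard prediction: class 1 iff the soft score exceeds 1.0.
    return int(model_score(x) > 1.0)

def find_counterfactual(x, target, step=0.25, max_steps=100):
    """Greedy sequential search: each step is a hybrid action, i.e. a
    discrete feature index plus a continuous change (+step or -step).
    Returns the first perturbed x classified as `target`, or None."""
    x = list(x)
    direction = 1.0 if target == 1 else -1.0
    for _ in range(max_steps):
        candidates = []
        for i in range(len(x)):
            for delta in (step, -step):
                cand = x[:]
                cand[i] += delta
                if model(cand) == target:
                    return cand  # prediction flipped: counterfactual found
                candidates.append((direction * model_score(cand), cand))
        # No flip yet: commit to the action that moves the soft score
        # furthest toward the target class.
        x = max(candidates)[1]
    return None

original = [0.0, 0.0, 0.0]                     # classified as 0 by model()
cf = find_counterfactual(original, target=1)   # small edit to one feature
```

Because only one feature changes per step and the search stops at the first flip, the returned counterfactual stays sparse, which mirrors the sparsity property the abstract reports for ReLAX.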
Pages: 1443 - 1457
Number of pages: 14
Related papers (50 items)
  • [21] Model-Agnostic Federated Learning
    Mittone, Gianluca
    Riviera, Walter
    Colonnelli, Iacopo
    Birke, Robert
    Aldinucci, Marco
    EURO-PAR 2023: PARALLEL PROCESSING, 2023, 14100 : 383 - 396
  • [22] Model-Agnostic Private Learning
    Bassily, Raef
    Thakkar, Om
    Thakurta, Abhradeep
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [23] Counterfactual state explanations for reinforcement learning agents via generative deep learning
    Olson, Matthew L.
    Khanna, Roli
    Neal, Lawrence
    Li, Fuxin
    Wong, Weng-Keen
    ARTIFICIAL INTELLIGENCE, 2021, 295
  • [24] Model-agnostic and diverse explanations for streaming rumour graphs
    Nguyen, Thanh Tam
    Phan, Thanh Cong
    Nguyen, Minh Hieu
    Weidlich, Matthias
    Yin, Hongzhi
    Jo, Jun
    Nguyen, Quoc Viet Hung
    KNOWLEDGE-BASED SYSTEMS, 2022, 253
  • [25] Anchors: High-Precision Model-Agnostic Explanations
    Ribeiro, Marco Tulio
    Singh, Sameer
    Guestrin, Carlos
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 1527 - 1535
  • [26] Model-Agnostic Explanations using Minimal Forcing Subsets
    Han, Xing
    Ghosh, Joydeep
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [27] Deep Learning Model-Agnostic Controller for VTOL Class UAS
    Holmes, Grant
    Chowdhury, Mozammal
    McKinnis, Aaron
    Keshmiri, Shawn
    2022 INTERNATIONAL CONFERENCE ON UNMANNED AIRCRAFT SYSTEMS (ICUAS), 2022, : 1520 - 1529
  • [28] SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies
    Samadi, Amir
    Koufos, Konstantinos
    Debattista, Kurt
    Dianati, Mehrdad
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (11): 9994 - 10001
  • [29] Model-Agnostic Explanations for Decisions Using Minimal Patterns
    Asano, Kohei
    Chun, Jinhee
    Koike, Atsushi
    Tokuyama, Takeshi
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: THEORETICAL NEURAL COMPUTATION, PT I, 2019, 11727 : 241 - 252
  • [30] LIVE: A Local Interpretable model-agnostic Visualizations and Explanations
    Shi, Peichang
    Gangopadhyay, Aryya
    Yu, Ping
    2022 IEEE 10TH INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2022), 2022, : 245 - 254