Explain the Explainer: Interpreting Model-Agnostic Counterfactual Explanations of a Deep Reinforcement Learning Agent

Cited by: 6
|
Authors
Chen Z. [1 ]
Silvestri F. [2 ]
Tolomei G. [2 ]
Wang J. [3 ]
Zhu H. [4 ]
Ahn H. [1 ]
Affiliations
[1] Stony Brook University, Department of Applied Mathematics and Statistics, Stony Brook, 11794, NY
[2] Sapienza University of Rome, Department of Computer Engineering and Department of Computer Science, Rome
[3] Xi'an Jiaotong-Liverpool University, Department of Intelligent Science, Suzhou
[4] Rutgers University-New Brunswick, Department of Computer Science, Piscataway, 08854, NJ
Source
IEEE Transactions on Artificial Intelligence
Keywords
Counterfactual explanations; deep reinforcement learning (DRL); explainable artificial intelligence (XAI); machine learning (ML) explainability
DOI
10.1109/TAI.2022.3223892
Abstract
Counterfactual examples (CFs) are one of the most popular methods for attaching post hoc explanations to machine learning models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood; thus, they are hard to generalize to complex models and inefficient for large datasets. This article aims to overcome these limitations and introduces ReLAX, a model-agnostic algorithm for generating optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs as a sequential decision-making task and then find the optimal CFs via deep reinforcement learning (DRL) with a discrete-continuous hybrid action space. In addition, we develop a distillation algorithm that extracts decision rules from the DRL agent's policy in the form of a decision tree, making the process of generating CFs itself interpretable. Extensive experiments on six tabular datasets show that ReLAX outperforms existing CF generation baselines: it produces sparser counterfactuals, scales better to complex target models, and generalizes to both classification and regression tasks. Finally, we demonstrate the ability of our method to provide actionable recommendations and distill interpretable policy explanations in two practical real-world use cases. © 2020 IEEE.
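The hybrid discrete-continuous action formulation can be illustrated with a minimal sketch. Note that this is not the authors' ReLAX implementation (ReLAX trains a DRL policy): the toy linear model, the soft-score query, and the greedy loop below are all illustrative assumptions. It only shows the action structure the abstract describes, where each step picks a feature to edit (discrete) and a change amount (continuous) until the black box's prediction flips.

```python
# Toy sketch of counterfactual search over a discrete-continuous hybrid
# action space. NOT the ReLAX algorithm from the paper (which trains a
# DRL policy); model, scores, and greedy loop are illustrative assumptions.

def model_score(x):
    # Hypothetical black box's soft score: a fixed linear function.
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]

def model(x):
    # Hard prediction: class 1 iff the soft score exceeds 1.0.
    return int(model_score(x) > 1.0)

def find_counterfactual(x, target, step=0.25, max_steps=100):
    """Greedy sequential search: each step is a hybrid action, i.e. a
    discrete feature index plus a continuous change (+step or -step).
    Returns the first perturbed x classified as `target`, or None."""
    x = list(x)
    direction = 1.0 if target == 1 else -1.0
    for _ in range(max_steps):
        candidates = []
        for i in range(len(x)):
            for delta in (step, -step):
                cand = x[:]
                cand[i] += delta
                if model(cand) == target:
                    return cand  # prediction flipped: counterfactual found
                candidates.append((direction * model_score(cand), cand))
        # No flip yet: commit to the action that moves the soft score
        # furthest toward the target class.
        x = max(candidates)[1]
    return None

original = [0.0, 0.0, 0.0]                     # classified as 0 by model()
cf = find_counterfactual(original, target=1)   # small edit to one feature
```

Because only one feature changes per step and the search stops at the first flip, the returned counterfactual stays sparse, which mirrors the sparsity property the abstract reports for ReLAX.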
Pages: 1443 - 1457
Number of pages: 14
Related papers (50 items)
  • [21] Model-Agnostic Federated Learning
    Mittone, Gianluca
    Riviera, Walter
    Colonnelli, Iacopo
    Birke, Robert
    Aldinucci, Marco
    EURO-PAR 2023: PARALLEL PROCESSING, 2023, 14100 : 383 - 396
  • [22] Model-Agnostic Private Learning
    Bassily, Raef
    Thakkar, Om
    Thakurta, Abhradeep
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [23] Counterfactual state explanations for reinforcement learning agents via generative deep learning
    Olson, Matthew L.
    Khanna, Roli
    Neal, Lawrence
    Li, Fuxin
    Wong, Weng-Keen
    ARTIFICIAL INTELLIGENCE, 2021, 295
  • [24] Model-agnostic and diverse explanations for streaming rumour graphs
    Nguyen, Thanh Tam
    Phan, Thanh Cong
    Nguyen, Minh Hieu
    Weidlich, Matthias
    Yin, Hongzhi
    Jo, Jun
    Nguyen, Quoc Viet Hung
    KNOWLEDGE-BASED SYSTEMS, 2022, 253
  • [25] Anchors: High-Precision Model-Agnostic Explanations
    Ribeiro, Marco Tulio
    Singh, Sameer
    Guestrin, Carlos
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 1527 - 1535
  • [26] Model-Agnostic Explanations using Minimal Forcing Subsets
    Han, Xing
    Ghosh, Joydeep
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [27] Deep Learning Model-Agnostic Controller for VTOL Class UAS
    Holmes, Grant
    Chowdhury, Mozammal
    McKinnis, Aaron
    Keshmiri, Shawn
    2022 INTERNATIONAL CONFERENCE ON UNMANNED AIRCRAFT SYSTEMS (ICUAS), 2022, : 1520 - 1529
  • [28] SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies
    Samadi, Amir
    Koufos, Konstantinos
    Debattista, Kurt
    Dianati, Mehrdad
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (11): 9994 - 10001
  • [29] Model-Agnostic Explanations for Decisions Using Minimal Patterns
    Asano, Kohei
    Chun, Jinhee
    Koike, Atsushi
    Tokuyama, Takeshi
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: THEORETICAL NEURAL COMPUTATION, PT I, 2019, 11727 : 241 - 252
  • [30] LIVE: A Local Interpretable model-agnostic Visualizations and Explanations
    Shi, Peichang
    Gangopadhyay, Aryya
    Yu, Ping
    2022 IEEE 10TH INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2022), 2022, : 245 - 254