Stable and actionable explanations of black-box models through factual and counterfactual rules

Cited by: 15
Authors
Guidotti, Riccardo [1 ]
Monreale, Anna [1 ]
Ruggieri, Salvatore [1 ]
Naretto, Francesca [2 ]
Turini, Franco [1 ]
Pedreschi, Dino [1 ]
Giannotti, Fosca [2 ]
Affiliations
[1] Univ Pisa, Dept Comp Sci, Largo B Pontecorvo 3, I-56127 Pisa, PI, Italy
[2] Scuola Normale Super Pisa, Piazza Cavalieri 7, I-56126 Pisa, PI, Italy
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); EU Horizon 2020; European Research Council (ERC)
Keywords
Explainable AI; Local explanations; Model-agnostic explanations; Rule-based explanations; Counterfactuals; INSTANCE SELECTION; ALGORITHMS;
DOI
10.1007/s10618-022-00878-5
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Recent years have witnessed the rise of accurate but obscure classification models that hide the logic of their internal decision processes. Explaining the decision taken by a black-box classifier on a specific input instance is therefore of striking interest. We propose a local rule-based model-agnostic explanation method providing stable and actionable explanations. An explanation consists of a factual logic rule, stating the reasons for the black-box decision, and a set of actionable counterfactual logic rules, proactively suggesting the changes in the instance that lead to a different outcome. Explanations are computed from a decision tree that mimics the behavior of the black-box locally to the instance to explain. The decision tree is obtained through a bagging-like approach that favors stability and fidelity: first, an ensemble of decision trees is learned from neighborhoods of the instance under investigation; then, the ensemble is merged into a single decision tree. Neighbor instances are synthetically generated through a genetic algorithm whose fitness function is driven by the black-box behavior. Experiments show that the proposed method advances the state-of-the-art towards a comprehensive approach that successfully covers stability and actionability of factual and counterfactual explanations.
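The abstract describes a concrete pipeline: label a synthetic neighborhood of the instance with the black box, fit a local surrogate decision tree, then read a factual rule and counterfactual rules off the tree's paths. The Python sketch below illustrates that idea only and is not the authors' implementation: as simplifying assumptions, Gaussian perturbation stands in for the genetic-algorithm neighborhood, a single shallow tree stands in for the merged bagging ensemble, and explain_instance with its parameters (n_samples, scale, max_depth) is a hypothetical helper.

# Minimal sketch of the factual/counterfactual rule idea from the abstract.
# NOT the paper's method: Gaussian perturbation replaces the genetic
# algorithm, and one surrogate tree replaces the merged tree ensemble.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def explain_instance(black_box, x, n_samples=1000, scale=0.3, seed=0):
    """black_box: callable mapping a 2-D array of instances to labels."""
    rng = np.random.default_rng(seed)
    # 1. Synthetic neighborhood of x, labeled by the black box.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    y = black_box(Z)
    # 2. Local surrogate tree mimicking the black box around x.
    surrogate = DecisionTreeClassifier(max_depth=4, random_state=seed).fit(Z, y)
    tree = surrogate.tree_
    label = surrogate.predict(x.reshape(1, -1))[0]
    # 3. Factual rule: conditions on the root-to-leaf path x follows.
    factual = []
    for node in surrogate.decision_path(x.reshape(1, -1)).indices:
        if tree.children_left[node] == -1:        # reached the leaf
            break
        f, t = tree.feature[node], tree.threshold[node]
        factual.append((f, "<=" if x[f] <= t else ">", t))
    # 4. Counterfactual rule: among leaves predicting a different class,
    #    keep the path whose conditions x violates the least.
    best, best_cost = [], np.inf
    stack = [(0, [])]                             # (node id, conditions so far)
    while stack:
        node, conds = stack.pop()
        if tree.children_left[node] == -1:        # leaf node
            cls = surrogate.classes_[np.argmax(tree.value[node])]
            cost = sum((x[f] > t) if op == "<=" else (x[f] <= t)
                       for f, op, t in conds)     # conditions x violates
            if cls != label and cost < best_cost:
                best, best_cost = conds, cost
            continue
        f, t = tree.feature[node], tree.threshold[node]
        stack.append((tree.children_left[node], conds + [(f, "<=", t)]))
        stack.append((tree.children_right[node], conds + [(f, ">", t)]))
    def fmt(rule):
        return " AND ".join(f"x[{f}] {op} {t:.2f}" for f, op, t in rule)
    return label, fmt(factual), fmt(best)

# Hypothetical usage on toy data, with a random forest as the black box.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
bb = RandomForestClassifier(random_state=0).fit(X, y)
label, factual, counterfactual = explain_instance(bb.predict, X[0])
print("prediction:    ", label)
print("factual rule:  ", factual)
print("counterfactual:", counterfactual)

Capping the surrogate's depth keeps the extracted rules short; picking the counterfactual leaf with the fewest violated premises is a crude stand-in for the paper's actionability criterion, which additionally constrains which features may change.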
Pages: 2825-2862
Page count: 38
Related papers
50 items in total (items [41]-[50] shown)
  • [41] Black-Box Attacks on Image Activity Prediction and Its Natural Language Explanations
    Baia, Alina Elena
    Poggioni, Valentina
    Cavallaro, Andrea
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 3688 - 3697
  • [42] Black-box Adversarial Attacks on Video Recognition Models
    Jiang, Linxi
    Ma, Xingjun
    Chen, Shaoxiang
    Bailey, James
    Jiang, Yu-Gang
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 864 - 872
  • [43] Black-Box Test Generation from Inferred Models
    Papadopoulos, Petros
    Walkinshaw, Neil
    2015 IEEE/ACM FOURTH INTERNATIONAL WORKSHOP ON REALIZING ARTIFICIAL INTELLIGENCE SYNERGIES IN SOFTWARE ENGINEERING (RAISE 2015), 2015, : 19 - 24
  • [44] On the Impossibility of Virtual Black-Box Obfuscation in Idealized Models
    Mahmoody, Mohammad
    Mohammed, Ameer
    Nematihaji, Soheil
    THEORY OF CRYPTOGRAPHY, TCC 2016-A, PT I, 2016, 9562 : 18 - 48
  • [45] PLENARY: Explaining black-box models in natural language through fuzzy linguistic summaries
    Kaczmarek-Majer, Katarzyna
    Casalino, Gabriella
    Castellano, Giovanna
    Dominiak, Monika
    Hryniewicz, Olgierd
    Kaminska, Olga
    Vessio, Gennaro
    Diaz-Rodriguez, Natalia
    INFORMATION SCIENCES, 2022, 614 : 374 - 399
  • [46] Capturing the form of feature interactions in black-box models
    Zhang, Hanying
    Zhang, Xiaohang
    Zhang, Tianbo
    Zhu, Ji
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (04)
  • [47] Explainable AI: To Reveal the Logic of Black-Box Models
    Chinu
    Bansal, Urvashi
    NEW GENERATION COMPUTING, 2024, 42 (01) : 53 - 87
  • [48] OneMax in Black-Box Models with Several Restrictions
    Doerr, Carola
    Lengler, Johannes
    GECCO'15: PROCEEDINGS OF THE 2015 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2015, : 1431 - 1438
  • [49] Fusing Independent Inferential Models in a Black-Box Manner
    Cella, Leonardo
    BELIEF FUNCTIONS: THEORY AND APPLICATIONS, BELIEF 2024, 2024, 14909 : 189 - 196
  • [50] Learning outside the Black-Box: The pursuit of interpretable models
    Crabbe, Jonathan
    Zhang, Yao
    Zame, William R.
    van der Schaar, Mihaela
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33