In-Training Explainability Frameworks: A Method to Make Black-Box Machine Learning Models More Explainable

Cited: 0
Authors
Acun, Cagla [1 ]
Nasraoui, Olfa [1 ]
Affiliations
[1] Univ Louisville, Web Min & Knowledge Discovery Lab, Louisville, KY 40292 USA
Keywords
Explainability in Artificial Intelligence; XAI
DOI
10.1109/WI-IAT59888.2023.00036
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Despite ongoing efforts to make black-box machine learning models more explainable, transparent, and trustworthy, there is growing advocacy for using only inherently interpretable models in high-stakes decision making. For instance, post-hoc explanations have recently been criticized because they learn surrogate white-box (explainer) models that, while optimized to approximate the original predictive model, remain different from it. Moreover, post-hoc explainer models require an additional training phase at prediction time, which adds to the computational burden. In this paper, we propose two novel explainability approaches that make black-box models more explainable, which we call pre-hoc explainability and co-hoc explainability. Our goal is to maintain the black-box model's prediction accuracy while benefiting from the explanations that come with an inherently interpretable white-box model, and without the need for a post-hoc training phase at prediction time. In contrast to post-hoc methods, the black-box model's training phase is guided by explanations that are used as a regularizer. Our experiments on three real-life datasets demonstrate that the proposed techniques improve fidelity without compromising accuracy.
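The abstract's central mechanism, using an inherently interpretable white-box explainer as a regularizer while the black-box model is being trained rather than fitting a surrogate afterwards, can be illustrated with a minimal sketch. The Python code below is an assumption-laden illustration, not the authors' implementation: the joint (co-hoc-style) objective, the linear explainer, the KL-based fidelity term, and the fidelity_weight hyperparameter are all illustrative choices.

    # Minimal sketch (not the paper's code) of explanation-guided training:
    # a black-box MLP and a white-box linear "explainer" are trained jointly,
    # with a fidelity term that pulls the black box toward the explainer's
    # predictions. Architecture, loss terms, and hyperparameters are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)

    n_features, n_classes = 20, 2
    black_box = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                              nn.Linear(64, n_classes))      # opaque predictor
    white_box = nn.Linear(n_features, n_classes)             # interpretable explainer

    opt = torch.optim.Adam(list(black_box.parameters()) +
                           list(white_box.parameters()), lr=1e-3)
    fidelity_weight = 0.5  # assumed regularization strength (hyperparameter)

    # Synthetic stand-in data; the paper's experiments use three real-life datasets.
    X = torch.randn(256, n_features)
    y = (X[:, 0] + X[:, 1] > 0).long()

    for epoch in range(100):
        bb_logits = black_box(X)
        wb_logits = white_box(X)

        pred_loss = F.cross_entropy(bb_logits, y)            # black-box accuracy term
        expl_loss = F.cross_entropy(wb_logits, y)            # keep the explainer useful
        # Fidelity regularizer: KL divergence between the black-box and explainer
        # output distributions, so explanations guide training itself and no
        # post-hoc explainer fitting is needed at prediction time.
        fidelity = F.kl_div(F.log_softmax(bb_logits, dim=1),
                            F.softmax(wb_logits, dim=1), reduction="batchmean")

        loss = pred_loss + expl_loss + fidelity_weight * fidelity
        opt.zero_grad()
        loss.backward()
        opt.step()

In this sketch, only the black box is needed for predictions, while the jointly trained white-box model supplies the explanations; how pre-hoc explainability differs from the co-hoc variant shown here is detailed in the paper itself.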
Pages: 230-237
Number of pages: 8