Interpretable Deep Learning under Fire

Cited by: 0
Authors
Zhang, Xinyang [1]
Wang, Ningfei [2]
Shen, Hua [1]
Ji, Shouling [3,4]
Luo, Xiapu [5]
Wang, Ting [1]
Affiliations
[1] Pennsylvania State University, University Park, PA 16802, USA
[2] University of California, Irvine, Irvine, CA, USA
[3] Zhejiang University, Hangzhou, People's Republic of China
[4] Alibaba-Zhejiang University Joint Institute of Frontier Technologies, Hangzhou, People's Republic of China
[5] Hong Kong Polytechnic University, Hong Kong, People's Republic of China
Funding
National Science Foundation (USA)
Keywords
DOI
Not available
CLC Number
TP [Automation and Computer Technology]
Subject Classification Code
0812
Abstract
Providing explanations for deep neural network (DNN) models is crucial for their use in security-sensitive domains. A plethora of interpretation models have been proposed to help users understand the inner workings of DNNs: how does a DNN arrive at a specific decision for a given input? The improved interpretability is believed to offer a sense of security by involving humans in the decision-making process. Yet, due to its data-driven nature, the interpretability itself is potentially susceptible to malicious manipulations, about which little is known thus far. Here we bridge this gap by conducting the first systematic study on the security of interpretable deep learning systems (IDLSes). We show that existing IDLSes are highly vulnerable to adversarial manipulations. Specifically, we present ADV2, a new class of attacks that generate adversarial inputs which not only mislead target DNNs but also deceive their coupled interpretation models. Through empirical evaluation against four major types of IDLSes on benchmark datasets and in security-critical applications (e.g., skin cancer diagnosis), we demonstrate that with ADV2 the adversary is able to arbitrarily designate an input's prediction and interpretation. Further, with both analytical and empirical evidence, we identify the prediction-interpretation gap as one root cause of this vulnerability: a DNN and its interpretation model are often misaligned, which makes it possible to exploit both models simultaneously. Finally, we explore potential countermeasures against ADV2, including leveraging its low transferability and incorporating it in an adversarial training framework. Our findings shed light on designing and operating IDLSes in a more secure and informative fashion, leading to several promising research directions.
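The abstract describes attacks that jointly target a DNN's prediction and its coupled interpretation. As a rough illustration only, and not the authors' ADV2 implementation, the PyTorch sketch below runs a PGD-style perturbation whose loss combines a targeted classification term with a term that pins a vanilla-gradient saliency map to a chosen target map; the model, the hyperparameters, and the choice of vanilla-gradient saliency as the interpreter are all assumptions made for the sketch.

```python
# Illustrative sketch only (assumed setup, not the paper's ADV2 code):
# PGD-style attack with a joint loss over prediction and interpretation.
import torch
import torch.nn.functional as F

def saliency_map(model, x, cls):
    """Vanilla-gradient saliency |d logit_cls / d x|, kept differentiable
    (create_graph=True) so the attack can back-propagate through it."""
    logits = model(x)
    g, = torch.autograd.grad(logits[0, cls], x, create_graph=True)
    s = g.abs().sum(dim=1, keepdim=True)   # aggregate over channels -> (1, 1, H, W)
    return s / (s.max() + 1e-12)           # normalize to [0, 1]

def joint_pgd(model, x0, target_cls, target_map,
              eps=8/255, alpha=1/255, steps=300, lam=0.1):
    """Perturb x0 (shape (1, C, H, W), values in [0, 1]) so that the model
    predicts target_cls while the saliency map stays close to target_map
    (e.g., the benign input's map). All arguments are hypothetical."""
    x = x0.clone().detach()
    y = torch.tensor([target_cls])
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        pred_loss = F.cross_entropy(model(x), y)                       # push toward target class
        interp_loss = F.mse_loss(saliency_map(model, x, target_cls),   # keep the explanation
                                 target_map)                           # close to the target map
        grad, = torch.autograd.grad(pred_loss + lam * interp_loss, x)
        with torch.no_grad():
            x = x - alpha * grad.sign()             # descend the joint loss
            x = x0 + (x - x0).clamp(-eps, eps)      # stay within the eps-ball
            x = x.clamp(0.0, 1.0)
    return x.detach()
```

The interpretation term requires second-order gradients, whose signal can be weak for purely ReLU networks; smoother activations or interpreter-specific approximations help in practice. The paper evaluates four major types of IDLSes, whereas this sketch only mirrors the general shape of a joint prediction-plus-interpretation objective.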
Pages: 1659-1676
Number of pages: 18
Related Papers
50 records in total
  • [31] A Novel Interpretable Deep Learning Model for Ozone Prediction
    Chen, Xingguo
    Li, Yang
    Xu, Xiaoyan
    Shao, Min
    APPLIED SCIENCES-BASEL, 2023, 13 (21):
  • [32] Clinical Interpretable Deep Learning Model for Glaucoma Diagnosis
    Liao, WangMin
    Zou, BeiJi
    Zhao, RongChang
    Chen, YuanQiong
    He, ZhiYou
    Zhou, MengJie
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2020, 24 (05) : 1405 - 1412
  • [33] Towards Interpretable Deep Learning Models for Knowledge Tracing
    Lu, Yu
    Wang, Deliang
    Meng, Qinggang
    Chen, Penghe
    ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2020), PT II, 2020, 12164 : 185 - 190
  • [34] Interpretable patent recommendation with knowledge graph and deep learning
    Chen, Han
    Deng, Weiwei
    SCIENTIFIC REPORTS, 13
  • [35] Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond
    Li, Xuhong
    Xiong, Haoyi
    Li, Xingjian
    Wu, Xuanyu
    Zhang, Xiao
    Liu, Ji
    Bian, Jiang
    Dou, Dejing
    KNOWLEDGE AND INFORMATION SYSTEMS, 2022, 64 (12) : 3197 - 3234
  • [36] Using interpretable deep learning to model cancer dependencies
    Lin, Chih-Hsu
    Lichtarge, Olivier
    BIOINFORMATICS, 2021, 37 (17) : 2675 - 2681
  • [37] Fully interpretable deep learning model of transcriptional control
    Liu, Yi
    Barr, Kenneth
    Reinitz, John
    BIOINFORMATICS, 2020, 36 : 499 - 507
  • [38] Interpretable Deep Learning for Spatial Analysis of Severe Hailstorms
    Gagne, David John, II
    Haupt, Sue Ellen
    Nychka, Douglas W.
    Thompson, Gregory
    MONTHLY WEATHER REVIEW, 2019, 147 (08) : 2827 - 2845
  • [39] Industry return prediction via interpretable deep learning
    Zografopoulos, Lazaros
    Iannino, Maria Chiara
    Psaradellis, Ioannis
    Sermpinis, Georgios
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2025, 321 (01) : 257 - 268
  • [40] Gearbox fault diagnosis based on temporal shrinkage interpretable deep reinforcement learning under strong noise
    Wei, Zeqi
    Wang, Hui
    Zhao, Zhibin
    Zhou, Zheng
    Yan, Ruqiang
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 139