Interpretable Deep Learning under Fire

Cited by: 0
Authors
Zhang, Xinyang [1 ]
Wang, Ningfei [2 ]
Shen, Hua [1 ]
Ji, Shouling [3 ,4 ]
Luo, Xiapu [5 ]
Wang, Ting [1 ]
Affiliations
[1] Penn State Univ, University Pk, PA 16802 USA
[2] Univ Calif Irvine, Irvine, CA USA
[3] Zhejiang Univ, Hangzhou, Peoples R China
[4] Alibaba, ZJU Joint Inst Frontier Technol, Hangzhou, Peoples R China
[5] Hong Kong Polytech Univ, Hong Kong, Peoples R China
Funding
National Science Foundation (USA);
Keywords
DOI
Not available
CLC Classification
TP [Automation technology, computer technology];
Discipline Code
0812;
Abstract
Providing explanations for deep neural network (DNN) models is crucial for their use in security-sensitive domains. A plethora of interpretation models have been proposed to help users understand the inner workings of DNNs: how does a DNN arrive at a specific decision for a given input? The improved interpretability is believed to offer a sense of security by involving humans in the decision-making process. Yet, due to its data-driven nature, the interpretability itself is potentially susceptible to malicious manipulations, about which little is known thus far. Here we bridge this gap by conducting the first systematic study on the security of interpretable deep learning systems (IDLSes). We show that existing IDLSes are highly vulnerable to adversarial manipulations. Specifically, we present ADV2, a new class of attacks that generate adversarial inputs that not only mislead target DNNs but also deceive their coupled interpretation models. Through empirical evaluation against four major types of IDLSes on benchmark datasets and in security-critical applications (e.g., skin cancer diagnosis), we demonstrate that with ADV2 the adversary is able to arbitrarily designate an input's prediction and interpretation. Further, with both analytical and empirical evidence, we identify the prediction-interpretation gap as one root cause of this vulnerability - a DNN and its interpretation model are often misaligned, resulting in the possibility of exploiting both models simultaneously. Finally, we explore potential countermeasures against ADV2, including leveraging its low transferability and incorporating it in an adversarial training framework. Our findings shed light on designing and operating IDLSes in a more secure and informative fashion, leading to several promising research directions.
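To make the dual-objective attack idea in the abstract concrete, below is a minimal, hypothetical PGD-style sketch of a perturbation that jointly targets a classifier's prediction and an interpreter's attribution map. This is not the authors' ADV2 implementation; `model`, `interpreter`, and all hyperparameters are illustrative assumptions, and the sketch presumes the interpreter is differentiable (e.g., a gradient-based saliency map).

```python
import torch
import torch.nn.functional as F

def dual_objective_attack(model, interpreter, x, y_target, m_target,
                          eps=0.03, alpha=0.005, steps=100, lam=1.0):
    # Hypothetical sketch of a joint prediction/interpretation attack.
    # x: input batch in [0, 1]; y_target: desired class labels;
    # m_target: desired attribution maps, same shape as interpreter(x).
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Prediction objective: steer the classifier toward the target class.
        loss_pred = F.cross_entropy(model(x_adv), y_target)
        # Interpretation objective: keep the attribution map near the target
        # map (assumes a differentiable interpreter, e.g. a saliency map).
        loss_int = F.mse_loss(interpreter(x_adv), m_target)
        grad = torch.autograd.grad(loss_pred + lam * loss_int, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()       # descend the joint loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into L-inf ball
            x_adv = x_adv.clamp(0.0, 1.0)             # stay a valid image
        x_adv = x_adv.detach()
    return x_adv
```

The weight `lam` trades off the two objectives; in practice an attacker would tune it per interpreter type, and a non-differentiable interpreter would require a smooth surrogate in place of the direct gradient used here.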
Pages: 1659-1676
Page count: 18
Related Papers
50 in total (10 shown)
  • [1] Wang, Hengkang; Lu, Han; Sun, Ju; Safo, Sandra E. Interpretable deep learning methods for multiview learning. BMC BIOINFORMATICS, 2024, 25(01).
  • [2] Liao, Ni; Dai, Jian; Tang, Yao; Zhong, Qiaoyong; Mo, Shuixue. iCVM: An Interpretable Deep Learning Model for CVM Assessment Under Label Uncertainty. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26(08): 4325-4334.
  • [3] Ahsan, Md. Manjurul; Ali, Md. Shahin; Hassan, Md. Mehedi; Abdullah, Tareque Abu; Gupta, Kishor Datta; Bagci, Ulas; Kaushal, Chetna; Soliman, Naglaa F. Monkeypox Diagnosis With Interpretable Deep Learning. IEEE ACCESS, 2023, 11: 81965-81980.
  • [4] Dai, Shan; Zhang, Jiayu; Huang, Zhelin; Zeng, Shipei. Fire Prediction and Risk Identification With Interpretable Machine Learning. JOURNAL OF FORECASTING, 2025.
  • [5] Guo, Jiaxing; Tang, Zhiyi; Zhang, Changxing; Xu, Wei; Wu, Yonghong. An Interpretable Deep Learning Method for Identifying Extreme Events under Faulty Data Interference. APPLIED SCIENCES-BASEL, 2023, 13(09).
  • [6] Yao, Liuyi; Yao, Zijun; Hu, Jianying; Gao, Jing; Sun, Zhaonan. Deep Staging: An Interpretable Deep Learning Framework for Disease Staging. 2021 IEEE 9TH INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2021), 2021: 130-137.
  • [7] Ouzounis, Athanasios G.; Sidiropoulos, George K.; Papakostas, George A.; Sarafis, Ilias T.; Stamkos, Andreas; Solakis, George. Interpretable Deep Learning for Marble Tiles Sorting. PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON DEEP LEARNING THEORY AND APPLICATIONS (DELTA), 2021: 101-108.
  • [8] Rodrigues, Mark; Mayo, Michael; Patros, Panos. Interpretable Deep Learning for Surgical Tool Management. INTERPRETABILITY OF MACHINE INTELLIGENCE IN MEDICAL IMAGE COMPUTING, AND TOPOLOGICAL DATA ANALYSIS AND ITS APPLICATIONS FOR MEDICAL DATA, 2021, 12929: 3-12.
  • [9] Nilsson, Avlant; Meimetis, Nikolaos; Lauffenburger, Douglas A. Towards an interpretable deep learning model of cancer. NPJ PRECISION ONCOLOGY, 2025, 9(01).
  • [10] Gangopadhyay, Tryambak; Tan, Sin Yong; LoCurto, Anthony; Michael, James B.; Sarkar, Soumik. Interpretable Deep Learning for Monitoring Combustion Instability. IFAC PAPERSONLINE, 2020, 53(02): 832-837.