Adversarial training and attribution methods enable evaluation of robustness and interpretability of deep learning models for image classification

Cited by: 0
Authors
Santos, Flavio A. O. [1 ]
Zanchettin, Cleber [1 ,2 ]
Lei, Weihua [3 ]
Amaral, Luis A. Nunes [2 ,3 ,4 ,5 ]
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, BR-52061080 Recife, PE, Brazil
[2] Northwestern Univ, Dept Chem & Biol Engn, Evanston, IL 60208 USA
[3] Northwestern Univ, Dept Phys & Astron, Evanston, IL 60208 USA
[4] Northwestern Univ, Northwestern Inst Complex Syst, Evanston, IL 60208 USA
[5] Northwestern Univ, NSF Simons Natl Inst Theory & Math Biol, Chicago, IL 60611 USA
Keywords
Compendex;
DOI
10.1103/PhysRevE.110.054310
Chinese Library Classification
O35 [Fluid Mechanics]; O53 [Plasma Physics];
Subject Classification Codes
070204 ; 080103 ; 080704 ;
Abstract
Deep learning models have achieved high performance in a wide range of applications. Recently, however, there have been increasing concerns about the fragility of many of those models to adversarial approaches and out-of-distribution inputs. A way to investigate and potentially address model fragility is to develop the ability to provide interpretability to model predictions. To this end, input attribution approaches such as Grad-CAM and integrated gradients have been introduced to address model interpretability. Here, we combine adversarial and input attribution approaches in order to achieve two goals. The first is to investigate the impact of adversarial approaches on input attribution. The second is to benchmark competing input attribution approaches. In the context of the image classification task, we find that models trained with adversarial approaches yield dramatically different input attribution matrices from those obtained using standard techniques for all considered input attribution approaches. Additionally, by evaluating the signal-to-noise ratio (where the signal is the typical input attribution of the foreground and the noise is the typical input attribution of the background) and correlating it to model confidence, we are able to identify the most reliable input attribution approaches and demonstrate that adversarial training does increase prediction robustness. Our approach can be easily extended to contexts other than the image classification task and enables users to increase their confidence in the reliability of deep learning models.
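The signal-to-noise evaluation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `attribution_snr`, the argument names, and the toy data are all assumptions; only the definition (mean attribution on the foreground divided by mean attribution on the background) follows the abstract.

```python
import numpy as np

def attribution_snr(attribution, foreground_mask):
    """Signal-to-noise ratio of an input attribution matrix.

    `attribution` is an (H, W) array, e.g. from Grad-CAM or
    integrated gradients; `foreground_mask` is a boolean (H, W)
    array marking object pixels. Signal is the mean absolute
    attribution on the foreground; noise is the mean absolute
    attribution on the background.
    """
    attr = np.abs(attribution)
    signal = attr[foreground_mask].mean()
    noise = attr[~foreground_mask].mean()
    return signal / noise

# Toy example: attribution concentrated on a 2x2 "object".
attr = np.zeros((4, 4))
attr[1:3, 1:3] = 1.0        # strong attribution on the object
attr[0, 0] = 0.1            # weak spurious attribution in the background
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

print(attribution_snr(attr, mask))  # large ratio -> attribution tracks the object
```

A high ratio indicates that the attribution method concentrates relevance on the object rather than the background, which is the property the paper correlates with model confidence.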
Pages: 15
Related Papers
50 records in total
  • [1] CANARY: An Adversarial Robustness Evaluation Platform for Deep Learning Models on Image Classification
    Sun, Jiazheng
    Chen, Li
    Xia, Chenxiao
    Zhang, Da
    Huang, Rong
    Qiu, Zhi
    Xiong, Wenqi
    Zheng, Jun
    Tan, Yu-An
    ELECTRONICS, 2023, 12 (17)
  • [2] ADVERSARIAL ROBUSTNESS OF DEEP LEARNING METHODS FOR SAR IMAGE CLASSIFICATION: AN EXPLAINABILITY VIEW
    Chen, Tianrui
    Wu, Juanping
    Guo, Weiwei
    Zhang, Zenghui
    IGARSS 2024-2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, IGARSS 2024, 2024, : 1987 - 1991
  • [3] Survey on Interpretability of Deep Models for Image Classification
    Yang P.-B.
    Sang J.-T.
    Zhang B.
    Feng Y.-G.
    Yu J.
    Ruan Jian Xue Bao/Journal of Software, 2023, 34 (01): : 230 - 254
  • [4] A Review of Adversarial Robustness Evaluation for Image Classification
    Li, Zituo
    Sun, Jianbin
    Yang, Kewei
    Xiong, Dehui
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2022, 59 (10): : 2164 - 2189
  • [5] Impact of Attention on Adversarial Robustness of Image Classification Models
    Agrawal, Prachi
    Punn, Narinder Singh
    Sonbhadra, Sanjay Kumar
    Agarwal, Sonali
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 3013 - 3019
  • [6] Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification
    Wang, Desheng
    Jin, Weidong
    Wu, Yunpu
    SENSORS, 2023, 23 (06)
  • [7] On the Robustness of Deep Learning Models to Universal Adversarial Attack
    Karim, Rezaul
    Islam, Md Amirul
    Mohammed, Noman
    Bruce, Neil D. B.
    2018 15TH CONFERENCE ON COMPUTER AND ROBOT VISION (CRV), 2018, : 55 - 62
  • [8] Deep learning interpretability analysis methods in image interpretation
    Gong J.
    Huan L.
    Zheng X.
    Cehui Xuebao/Acta Geodaetica et Cartographica Sinica, 2022, 51 (06): : 873 - 884
  • [9] An Evaluation of Backpropagation Interpretability for Graph Classification with Deep Learning
    Shun, Kenneth Teo Tian
    Limanta, Eko Edita
    Khan, Arijit
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 561 - 570
  • [10] A Survey on Adversarial Deep Learning Robustness in Medical Image Analysis
    Apostolidis, Kyriakos D.
    Papakostas, George A.
    ELECTRONICS, 2021, 10 (17)