A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking

Cited by: 6
Authors
Liu, Chang [2 ]
Dong, Yinpeng [1 ,5 ]
Xiang, Wenzhao [3 ,7 ]
Yang, Xiao [1 ]
Su, Hang [1 ,6 ]
Zhu, Jun [1 ,5 ]
Chen, Yuefeng [4 ]
He, Yuan [4 ]
Xue, Hui [4 ]
Zheng, Shibao [2 ]
Affiliations
[1] Chinese Acad Sci, Key Lab Intelligent Informat Proc, Inst Comp Technol ICT, Beijing 100190, Peoples R China
[2] Shanghai Jiao Tong Univ, Inst Image Commun & Networks Engn, Dept Elect Engn EE, Shanghai 200240, Peoples R China
[3] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
[4] Alibaba Grp, Hangzhou 310023, Zhejiang, Peoples R China
[5] RealAI, Beijing 100085, Peoples R China
[6] Zhongguancun Lab, Beijing 100080, Peoples R China
[7] Peng Cheng Lab, Shenzhen 518000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robustness benchmark; Distribution shift; Pre-training; Adversarial training; Image classification;
DOI
10.1007/s11263-024-02196-3
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
The robustness of deep neural networks is frequently compromised when faced with adversarial examples, common corruptions, and distribution shifts, posing a significant research challenge in the advancement of deep learning. Although new deep learning methods and robustness improvement techniques have been constantly proposed, the robustness evaluations of existing methods are often inadequate due to their rapid development, diverse noise patterns, and simple evaluation metrics. Without thorough robustness evaluations, it is hard to understand the advances in the field and identify the effective methods. In this paper, we establish a comprehensive robustness benchmark called ARES-Bench on the image classification task. In our benchmark, we evaluate the robustness of 61 typical deep learning models on ImageNet with diverse architectures (e.g., CNNs, Transformers) and learning algorithms (e.g., normal supervised training, pre-training, adversarial training) under numerous adversarial attacks and out-of-distribution (OOD) datasets. Using robustness curves as the major evaluation criteria, we conduct large-scale experiments and draw several important findings, including: (1) there exists an intrinsic trade-off between the adversarial and natural robustness of specific noise types for the same model architecture; (2) adversarial training effectively improves adversarial robustness, especially when performed on Transformer architectures; (3) pre-training significantly enhances natural robustness by leveraging larger training datasets, incorporating multi-modal data, or employing self-supervised learning techniques. Based on ARES-Bench, we further analyze the training tricks in large-scale adversarial training on ImageNet. Through tailored training settings, we achieve a new state-of-the-art in adversarial robustness. We have made the benchmarking results and code platform publicly available.
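The abstract's central evaluation tool is the robustness curve: model accuracy plotted against an increasing perturbation budget, rather than a single-point metric. The sketch below illustrates the idea on a toy linear classifier, for which the worst-case L-infinity attack of budget eps is known in closed form (it reduces the margin y(w·x + b) by exactly eps·||w||₁), so robust accuracy can be computed exactly without an attack loop. The function names and the toy data are illustrative assumptions, not part of the ARES-Bench code platform, which attacks deep networks with iterative methods.

```python
# Illustrative sketch (not ARES-Bench code): a robustness curve for a
# linear classifier f(x) = sign(w.x + b). Under any L_inf perturbation
# of size <= eps, the margin y*(w.x + b) shrinks by at most eps*||w||_1,
# so a point stays robustly correct iff its margin exceeds that amount.

def robust_accuracy(w, b, data, eps):
    """Fraction of (x, y) pairs, y in {-1, +1}, that remain correctly
    classified under every L_inf perturbation of size at most eps."""
    l1 = sum(abs(wi) for wi in w)  # worst-case margin loss per unit eps
    correct = 0
    for x, y in data:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if margin > eps * l1:
            correct += 1
    return correct / len(data)

def robustness_curve(w, b, data, eps_grid):
    """Accuracy at each perturbation budget; eps = 0 is clean accuracy."""
    return [(eps, robust_accuracy(w, b, data, eps)) for eps in eps_grid]

if __name__ == "__main__":
    w, b = [1.0, -1.0], 0.0
    data = [([2.0, 0.0], 1), ([0.0, 2.0], -1),
            ([0.5, 0.0], 1), ([0.1, 0.0], -1)]
    for eps, acc in robustness_curve(w, b, data, [0.0, 0.25, 0.5, 1.0]):
        print(f"eps={eps:.2f}  robust acc={acc:.2f}")
```

The curve is non-increasing in eps by construction; comparing whole curves rather than accuracy at a single eps is what lets the paper rank models whose curves cross.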
Pages: 567-589
Number of pages: 23
Related Papers
50 records in total
  • [21] Benchmarking the Robustness of Semantic Segmentation Models with Respect to Common Corruptions
    Kamann, Christoph
    Rother, Carsten
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2021, 129 (02) : 462 - 483
  • [22] Robustness of Image-Based Malware Classification Models trained with Generative Adversarial Networks
    Reilly, Ciaran
    O'Shaughnessy, Stephen
    Thorpe, Christina
    PROCEEDINGS OF THE 2023 EUROPEAN INTERDISCIPLINARY CYBERSECURITY CONFERENCE, EICC 2023, 2023, : 92 - 99
  • [24] Benchmarking and Boosting Transformers for Medical Image Classification
    Ma, DongAo
    Taher, Mohammad Reza Hosseinzadeh
    Pang, Jiaxuan
    Islam, Nahid Ul
    Haghighi, Fatemeh
    Gotway, Michael B.
    Liang, Jianming
    DOMAIN ADAPTATION AND REPRESENTATION TRANSFER (DART 2022), 2022, 13542 : 12 - 22
  • [25] Comparative Study of Interpretable Image Classification Models
    Bajcsi, Adel
    Bajcsi, Anna
    Pavel, Szabolcs
    Portik, Abel
    Sandor, Csanad
    Szenkovits, Annamaria
    Vas, Orsolya
    Bodo, Zalan
    Csato, Lehel
    INFOCOMMUNICATIONS JOURNAL, 2023, 15 : 20 - 26
  • [26] Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study
    Tamberg, Karl
    Bahsi, Hayretdin
    IEEE ACCESS, 2025, 13 : 29698 - 29717
  • [27] A Review of Adversarial Robustness Evaluation for Image Classification
    Li, Zituo
    Sun, Jianbin
    Yang, Kewei
    Xiong, Dehui
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2022, 59 (10): : 2164 - 2189
  • [28] Robustness Stress Testing in Medical Image Classification
    Islam, Mobarakol
    Li, Zeju
    Glocker, Ben
    UNCERTAINTY FOR SAFE UTILIZATION OF MACHINE LEARNING IN MEDICAL IMAGING, UNSURE 2023, 2023, 14291 : 167 - 176
  • [29] Robustness and Explainability of Image Classification Based on QCNN
    Chen, Guoming
    Long, Shun
    Yuan, Zeduo
    Li, Wanyi
    Peng, Junfeng
    Quantum Engineering, 2023, 2023
  • [30] SpectralFormer: Rethinking Hyperspectral Image Classification With Transformers
    Hong, Danfeng
    Han, Zhu
    Yao, Jing
    Gao, Lianru
    Zhang, Bing
    Plaza, Antonio
    Chanussot, Jocelyn
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60