A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking

Cited by: 6
Authors
Liu, Chang [2 ]
Dong, Yinpeng [1 ,5 ]
Xiang, Wenzhao [3 ,7 ]
Yang, Xiao [1 ]
Su, Hang [1 ,6 ]
Zhu, Jun [1 ,5 ]
Chen, Yuefeng [4 ]
He, Yuan [4 ]
Xue, Hui [4 ]
Zheng, Shibao [2 ]
Affiliations
[1] Chinese Acad Sci, Key Lab Intelligent Informat Proc, Inst Comp Technol ICT, Beijing 100190, Peoples R China
[2] Shanghai Jiao Tong Univ, Inst Image Commun & Networks Engn, Dept Elect Engn EE, Shanghai 200240, Peoples R China
[3] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
[4] Alibaba Grp, Hangzhou 310023, Zhejiang, Peoples R China
[5] RealAI, Beijing 100085, Peoples R China
[6] Zhongguancun Lab, Beijing 100080, Peoples R China
[7] Peng Cheng Lab, Shenzhen 518000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robustness benchmark; Distribution shift; Pre-training; Adversarial training; Image classification;
DOI
10.1007/s11263-024-02196-3
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
The robustness of deep neural networks is frequently compromised when faced with adversarial examples, common corruptions, and distribution shifts, posing a significant research challenge in the advancement of deep learning. Although new deep learning methods and robustness improvement techniques have been constantly proposed, the robustness evaluations of existing methods are often inadequate due to their rapid development, diverse noise patterns, and simple evaluation metrics. Without thorough robustness evaluations, it is hard to understand the advances in the field and identify the effective methods. In this paper, we establish a comprehensive robustness benchmark called ARES-Bench on the image classification task. In our benchmark, we evaluate the robustness of 61 typical deep learning models on ImageNet with diverse architectures (e.g., CNNs, Transformers) and learning algorithms (e.g., normal supervised training, pre-training, adversarial training) under numerous adversarial attacks and out-of-distribution (OOD) datasets. Using robustness curves as the major evaluation criteria, we conduct large-scale experiments and draw several important findings, including: (1) there exists an intrinsic trade-off between the adversarial and natural robustness of specific noise types for the same model architecture; (2) adversarial training effectively improves adversarial robustness, especially when performed on Transformer architectures; (3) pre-training significantly enhances natural robustness by leveraging larger training datasets, incorporating multi-modal data, or employing self-supervised learning techniques. Based on ARES-Bench, we further analyze the training tricks in large-scale adversarial training on ImageNet. Through tailored training settings, we achieve a new state-of-the-art in adversarial robustness. We have made the benchmarking results and code platform publicly available.
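The abstract's central evaluation tool is the robustness curve: model accuracy plotted against an increasing perturbation budget, rather than a single-point metric. The sketch below illustrates the idea on a toy linear classifier, for which the worst-case L-infinity attack of budget eps is known in closed form (it reduces the margin y(w·x + b) by exactly eps·||w||₁), so robust accuracy can be computed exactly without an attack loop. The function names and the toy data are illustrative assumptions, not part of the ARES-Bench code platform, which attacks deep networks with iterative methods.

```python
# Illustrative sketch (not ARES-Bench code): a robustness curve for a
# linear classifier f(x) = sign(w.x + b). Under any L_inf perturbation
# of size <= eps, the margin y*(w.x + b) shrinks by at most eps*||w||_1,
# so a point stays robustly correct iff its margin exceeds that amount.

def robust_accuracy(w, b, data, eps):
    """Fraction of (x, y) pairs, y in {-1, +1}, that remain correctly
    classified under every L_inf perturbation of size at most eps."""
    l1 = sum(abs(wi) for wi in w)  # worst-case margin loss per unit eps
    correct = 0
    for x, y in data:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if margin > eps * l1:
            correct += 1
    return correct / len(data)

def robustness_curve(w, b, data, eps_grid):
    """Accuracy at each perturbation budget; eps = 0 is clean accuracy."""
    return [(eps, robust_accuracy(w, b, data, eps)) for eps in eps_grid]

if __name__ == "__main__":
    w, b = [1.0, -1.0], 0.0
    data = [([2.0, 0.0], 1), ([0.0, 2.0], -1),
            ([0.5, 0.0], 1), ([0.1, 0.0], -1)]
    for eps, acc in robustness_curve(w, b, data, [0.0, 0.25, 0.5, 1.0]):
        print(f"eps={eps:.2f}  robust acc={acc:.2f}")
```

The curve is non-increasing in eps by construction; comparing whole curves rather than accuracy at a single eps is what lets the paper rank models whose curves cross.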
Pages: 567-589
Number of pages: 23
Related Papers
50 records in total
  • [21] Benchmarking the Robustness of Semantic Segmentation Models with Respect to Common Corruptions
    Kamann, Christoph
    Rother, Carsten
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2021, 129 (02) : 462 - 483
  • [22] Robustness of Image-Based Malware Classification Models trained with Generative Adversarial Networks
    Reilly, Ciaran
    O'Shaughnessy, Stephen
    Thorpe, Christina
    PROCEEDINGS OF THE 2023 EUROPEAN INTERDISCIPLINARY CYBERSECURITY CONFERENCE, EICC 2023, 2023, : 92 - 99
  • [24] Benchmarking and Boosting Transformers for Medical Image Classification
    Ma, DongAo
    Taher, Mohammad Reza Hosseinzadeh
    Pang, Jiaxuan
    Islam, Nahid Ul
    Haghighi, Fatemeh
    Gotway, Michael B.
    Liang, Jianming
    DOMAIN ADAPTATION AND REPRESENTATION TRANSFER (DART 2022), 2022, 13542 : 12 - 22
  • [25] Comparative Study of Interpretable Image Classification Models
    Bajcsi, Adel
    Bajcsi, Anna
    Pavel, Szabolcs
    Portik, Abel
    Sandor, Csanad
    Szenkovits, Annamaria
    Vas, Orsolya
    Bodo, Zalan
    Csato, Lehel
    INFOCOMMUNICATIONS JOURNAL, 2023, 15 : 20 - 26
  • [26] Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study
    Tamberg, Karl
    Bahsi, Hayretdin
    IEEE ACCESS, 2025, 13 : 29698 - 29717
  • [27] A Review of Adversarial Robustness Evaluation for Image Classification
    Li, Zituo
    Sun, Jianbin
    Yang, Kewei
    Xiong, Dehui
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2022, 59 (10): : 2164 - 2189
  • [28] Robustness Stress Testing in Medical Image Classification
    Islam, Mobarakol
    Li, Zeju
    Glocker, Ben
    UNCERTAINTY FOR SAFE UTILIZATION OF MACHINE LEARNING IN MEDICAL IMAGING, UNSURE 2023, 2023, 14291 : 167 - 176
  • [29] Robustness and Explainability of Image Classification Based on QCNN
    Chen, Guoming
    Long, Shun
    Yuan, Zeduo
    Li, Wanyi
    Peng, Junfeng
    Quantum Engineering, 2023, 2023
  • [30] SpectralFormer: Rethinking Hyperspectral Image Classification With Transformers
    Hong, Danfeng
    Han, Zhu
    Yao, Jing
    Gao, Lianru
    Zhang, Bing
    Plaza, Antonio
    Chanussot, Jocelyn
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60