A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking

Cited by: 6
Authors
Liu, Chang [2]
Dong, Yinpeng [1, 5]
Xiang, Wenzhao [3, 7]
Yang, Xiao [1]
Su, Hang [1, 6]
Zhu, Jun [1, 5]
Chen, Yuefeng [4]
He, Yuan [4]
Xue, Hui [4]
Zheng, Shibao [2]
Affiliations
[1] Chinese Acad Sci, Key Lab Intelligent Informat Proc, Inst Comp Technol ICT, Beijing 100190, Peoples R China
[2] Shanghai Jiao Tong Univ, Inst Image Commun & Networks Engn, Dept Elect Engn EE, Shanghai 200240, Peoples R China
[3] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
[4] Alibaba Grp, Hangzhou 310023, Zhejiang, Peoples R China
[5] RealAI, Beijing 100085, Peoples R China
[6] Zhongguancun Lab, Beijing 100080, Peoples R China
[7] Peng Cheng Lab, Shenzhen 518000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Robustness benchmark; Distribution shift; Pre-training; Adversarial training; Image classification
DOI
10.1007/s11263-024-02196-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The robustness of deep neural networks is frequently compromised when faced with adversarial examples, common corruptions, and distribution shifts, posing a significant research challenge in the advancement of deep learning. Although new deep learning methods and robustness improvement techniques have been constantly proposed, the robustness evaluations of existing methods are often inadequate due to their rapid development, diverse noise patterns, and simple evaluation metrics. Without thorough robustness evaluations, it is hard to understand the advances in the field and identify the effective methods. In this paper, we establish a comprehensive robustness benchmark called ARES-Bench on the image classification task. In our benchmark, we evaluate the robustness of 61 typical deep learning models on ImageNet with diverse architectures (e.g., CNNs, Transformers) and learning algorithms (e.g., normal supervised training, pre-training, adversarial training) under numerous adversarial attacks and out-of-distribution (OOD) datasets. Using robustness curves as the major evaluation criteria, we conduct large-scale experiments and draw several important findings, including: (1) there exists an intrinsic trade-off between the adversarial and natural robustness of specific noise types for the same model architecture; (2) adversarial training effectively improves adversarial robustness, especially when performed on Transformer architectures; (3) pre-training significantly enhances natural robustness by leveraging larger training datasets, incorporating multi-modal data, or employing self-supervised learning techniques. Based on ARES-Bench, we further analyze the training tricks in large-scale adversarial training on ImageNet. Through tailored training settings, we achieve a new state-of-the-art in adversarial robustness. We have made the benchmarking results and code platform publicly available.
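The abstract names robustness curves as the main evaluation criterion. As a minimal sketch of the general idea (not the ARES-Bench implementation), the Python below computes accuracy as a function of perturbation budget. The function name `robustness_curve`, the input format, and the sample numbers are all hypothetical: each example is assumed to come with the smallest perturbation size at which some attack succeeds, with `0.0` for examples misclassified without any perturbation and `math.inf` for examples never broken.

```python
import math
from typing import List, Sequence, Tuple


def robustness_curve(
    min_perturbations: Sequence[float],
    budgets: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (budget, accuracy) pairs.

    An example counts as correctly classified at budget eps only if its
    smallest successful perturbation is strictly larger than eps.
    """
    n = len(min_perturbations)
    if n == 0:
        raise ValueError("need at least one example")
    curve = []
    for eps in budgets:
        robust = sum(1 for d in min_perturbations if d > eps)
        curve.append((eps, robust / n))
    return curve


# Hypothetical per-example results (illustrative only, not from the paper).
min_pert = [0.0, 1.0, 2.0, 4.0, math.inf]
budgets = [0.0, 1.0, 2.0, 4.0, 8.0]
curve = robustness_curve(min_pert, budgets)
for eps, acc in curve:
    print(f"budget={eps:g} accuracy={acc:.2f}")
```

Plotting such pairs for several models shows how accuracy degrades across the whole range of budgets, which a single accuracy figure at one fixed budget would hide.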
Pages: 567-589
Number of pages: 23