SoK: Pitfalls in Evaluating Black-Box Attacks

Cited by: 0
Authors
Suya, Fnu [1 ]
Suri, Anshuman [2 ]
Zhang, Tingwei [3 ]
Hong, Jingtao [4 ]
Tian, Yuan [5 ]
Evans, David [2 ]
Affiliations
[1] University of Maryland, College Park, MD 20742, USA
[2] University of Virginia, Charlottesville, VA, USA
[3] Cornell University, Ithaca, NY, USA
[4] Columbia University, New York, NY, USA
[5] University of California, Los Angeles, Los Angeles, CA, USA
Funding
U.S. National Science Foundation
Keywords
Adversarial examples; Robustness
DOI
10.1109/SaTML59370.2024.00026
CLC Number
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Numerous works study black-box attacks on image classifiers, where adversaries generate adversarial examples against unknown target models without access to the models' internal information. These works, however, make differing assumptions about the adversary's knowledge, and the current literature lacks a cohesive organization centered around the threat model. To systematize knowledge in this area, we propose a taxonomy over the threat space spanning three axes: the granularity of the feedback the attacker receives, access to interactive queries, and the quality and quantity of auxiliary data available to the attacker. Our taxonomy yields three key insights. 1) Despite the extensive literature, numerous under-explored threat spaces remain, and they cannot be trivially addressed by adapting techniques from well-explored settings. We demonstrate this by establishing a new state-of-the-art in the less-studied setting where the attacker sees only the top-k confidence scores, adapting techniques from the well-explored setting with access to the complete confidence vector; even so, the adapted attack still falls short of attacks in the more restrictive setting that returns only the predicted label, highlighting the need for further research. 2) Pinning down the threat model of each attack uncovers stronger baselines that challenge prior state-of-the-art claims. We demonstrate this by enhancing an initially weaker baseline (under interactive query access) with surrogate models, effectively overturning the claims of the corresponding paper. 3) The taxonomy reveals interactions between different kinds of attacker knowledge that connect naturally to related areas, such as model inversion and model extraction attacks, and we discuss how advances in those areas can enable stronger black-box attacks. Finally, we emphasize the need for a more realistic assessment of attack success that factors in local attack runtime; this accounting shows that certain attacks can achieve notably higher success rates. We also highlight the need to evaluate attacks in diverse and harder settings, and underscore the need for better criteria when selecting the best candidate adversarial examples.
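The feedback-granularity axis of the taxonomy is easy to make concrete: the same query to the target model can be answered with a full confidence vector, with only the top-k (class, score) pairs, or with a hard label alone. The sketch below is ours, not from the paper; the function names and the toy 10-class vector are illustrative assumptions standing in for a real model's output.

```python
# Minimal sketch of the feedback-granularity axis: one model output,
# three levels of detail returned to the attacker. Illustrative only.
import numpy as np

def full_scores(probs: np.ndarray) -> np.ndarray:
    # Complete confidence vector: the least restrictive feedback.
    return probs

def top_k_scores(probs: np.ndarray, k: int = 5) -> list:
    # Top-k setting: only the k highest-scoring classes are revealed.
    idx = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in idx]

def label_only(probs: np.ndarray) -> int:
    # Hard-label setting: only the predicted class, no scores at all.
    return int(np.argmax(probs))

# Toy 10-class confidence vector standing in for a real model's output.
probs = np.random.default_rng(0).dirichlet(np.ones(10))
print(full_scores(probs))      # attacker sees the whole vector
print(top_k_scores(probs, 3))  # attacker sees 3 (class, score) pairs
print(label_only(probs))       # attacker sees a single label
```

Each step down this ladder removes signal the attacker can exploit, which is one reason the abstract notes that techniques rarely transfer trivially between settings.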
Pages: 387-407
Page count: 21
Related Papers (showing 10 of 50)
  • [1] Guo, Chuan; Gardner, Jacob R.; You, Yurong; Wilson, Andrew Gordon; Weinberger, Kilian Q. Simple Black-box Adversarial Attacks. International Conference on Machine Learning (ICML), Vol. 97, 2019.
  • [2] Chen, Pengpeng; Yang, Yongqiang; Yang, Dingqi; Sun, Hailong; Chen, Zhijun; Lin, Peng. Black-Box Data Poisoning Attacks on Crowdsourcing. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI), 2023, pp. 2975-2983.
  • [3] Li, Nannan; Chen, Zhenzhong. Toward Visual Distortion in Black-Box Attacks. IEEE Transactions on Image Processing, 2021, 30: 6156-6167.
  • [4] Paudel, Bijay Raj; Itani, Aashish; Tragoudas, Spyros. Resiliency of SNN on Black-Box Adversarial Attacks. 20th IEEE International Conference on Machine Learning and Applications (ICMLA), 2021, pp. 799-806.
  • [5] Kumova, Vera; Pilat, Martin. Beating White-Box Defenses with Black-Box Attacks. 2021 International Joint Conference on Neural Networks (IJCNN), 2021.
  • [6] Jiang, Linxi; Ma, Xingjun; Chen, Shaoxiang; Bailey, James; Jiang, Yu-Gang. Black-box Adversarial Attacks on Video Recognition Models. Proceedings of the 27th ACM International Conference on Multimedia (MM '19), 2019, pp. 864-872.
  • [7] Kumar, K. Naveen; Vishnu, C.; Mitra, Reshmi; Mohan, C. Krishna. Black-box Adversarial Attacks in Autonomous Vehicle Technology. 2020 IEEE Applied Imagery Pattern Recognition Workshop (AIPR): Trusted Computing, Privacy, and Securing Multimedia, 2020.
  • [8] Pang, Ren; Zhang, Xinyang; Ji, Shouling; Luo, Xiapu; Wang, Ting. AdvMind: Inferring Adversary Intent of Black-Box Attacks. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20), 2020, pp. 1899-1907.
  • [9] Rahmati, Ali; Moosavi-Dezfooli, Seyed-Mohsen; Frossard, Pascal; Dai, Huaiyu. GeoDA: A Geometric Framework for Black-Box Adversarial Attacks. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 8443-8452.
  • [10] Wei, Xingxing; Guo, Ying; Li, Bo. Black-box adversarial attacks by manipulating image attributes. Information Sciences, 2021, 550: 285-296.