SoK: Pitfalls in Evaluating Black-Box Attacks

Cited by: 0
Authors
Suya, Fnu [1 ]
Suri, Anshuman [2 ]
Zhang, Tingwei [3 ]
Hong, Jingtao [4 ]
Tian, Yuan [5 ]
Evans, David [2 ]
Affiliations
[1] University of Maryland, College Park, MD 20742, USA
[2] University of Virginia, Charlottesville, VA, USA
[3] Cornell University, Ithaca, NY, USA
[4] Columbia University, New York, NY, USA
[5] University of California, Los Angeles, Los Angeles, CA, USA
Funding
U.S. National Science Foundation
Keywords
Adversarial examples; Robustness
DOI
10.1109/SaTML59370.2024.00026
CLC Number
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Numerous works study black-box attacks on image classifiers, where adversaries generate adversarial examples against unknown target models without access to the models' internal information. These works, however, make differing assumptions about the adversary's knowledge, and the current literature lacks a cohesive organization centered around the threat model. To systematize knowledge in this area, we propose a taxonomy over the threat space spanning three axes: the granularity of the feedback the attacker receives, access to interactive queries, and the quality and quantity of auxiliary data available to the attacker. Our taxonomy yields three key insights. 1) Despite the extensive literature, numerous under-explored threat spaces remain, and they cannot be trivially addressed by adapting techniques from well-explored settings. We demonstrate this by establishing a new state-of-the-art in the less-studied setting where the attacker sees only the top-k confidence scores, adapting techniques from the well-explored setting with access to the complete confidence vector; even so, the adapted attack still falls short of attacks in the more restrictive setting that returns only the predicted label, highlighting the need for further research. 2) Pinning down the threat model of each attack uncovers stronger baselines that challenge prior state-of-the-art claims. We demonstrate this by enhancing an initially weaker baseline (under interactive query access) with surrogate models, effectively overturning the claims of the corresponding paper. 3) The taxonomy reveals interactions between different kinds of attacker knowledge that connect naturally to related areas, such as model inversion and model extraction attacks, and we discuss how advances in those areas can enable stronger black-box attacks. Finally, we emphasize the need for a more realistic assessment of attack success that factors in local attack runtime; this accounting shows that certain attacks can achieve notably higher success rates. We also highlight the need to evaluate attacks in diverse and harder settings, and underscore the need for better criteria when selecting the best candidate adversarial examples.
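The feedback-granularity axis of the taxonomy is easy to make concrete: the same query to the target model can be answered with a full confidence vector, with only the top-k (class, score) pairs, or with a hard label alone. The sketch below is ours, not from the paper; the function names and the toy 10-class vector are illustrative assumptions standing in for a real model's output.

```python
# Minimal sketch of the feedback-granularity axis: one model output,
# three levels of detail returned to the attacker. Illustrative only.
import numpy as np

def full_scores(probs: np.ndarray) -> np.ndarray:
    # Complete confidence vector: the least restrictive feedback.
    return probs

def top_k_scores(probs: np.ndarray, k: int = 5) -> list:
    # Top-k setting: only the k highest-scoring classes are revealed.
    idx = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in idx]

def label_only(probs: np.ndarray) -> int:
    # Hard-label setting: only the predicted class, no scores at all.
    return int(np.argmax(probs))

# Toy 10-class confidence vector standing in for a real model's output.
probs = np.random.default_rng(0).dirichlet(np.ones(10))
print(full_scores(probs))      # attacker sees the whole vector
print(top_k_scores(probs, 3))  # attacker sees 3 (class, score) pairs
print(label_only(probs))       # attacker sees a single label
```

Each step down this ladder removes signal the attacker can exploit, which is one reason the abstract notes that techniques rarely transfer trivially between settings.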
Pages: 387-407
Page count: 21
Related Papers (showing 10 of 50)
  • [1] Guo, Chuan; Gardner, Jacob R.; You, Yurong; Wilson, Andrew Gordon; Weinberger, Kilian Q. Simple Black-box Adversarial Attacks. International Conference on Machine Learning (ICML), Vol. 97, 2019.
  • [2] Chen, Pengpeng; Yang, Yongqiang; Yang, Dingqi; Sun, Hailong; Chen, Zhijun; Lin, Peng. Black-Box Data Poisoning Attacks on Crowdsourcing. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI), 2023, pp. 2975-2983.
  • [3] Li, Nannan; Chen, Zhenzhong. Toward Visual Distortion in Black-Box Attacks. IEEE Transactions on Image Processing, 2021, 30: 6156-6167.
  • [4] Paudel, Bijay Raj; Itani, Aashish; Tragoudas, Spyros. Resiliency of SNN on Black-Box Adversarial Attacks. 20th IEEE International Conference on Machine Learning and Applications (ICMLA), 2021, pp. 799-806.
  • [5] Kumova, Vera; Pilat, Martin. Beating White-Box Defenses with Black-Box Attacks. 2021 International Joint Conference on Neural Networks (IJCNN), 2021.
  • [6] Jiang, Linxi; Ma, Xingjun; Chen, Shaoxiang; Bailey, James; Jiang, Yu-Gang. Black-box Adversarial Attacks on Video Recognition Models. Proceedings of the 27th ACM International Conference on Multimedia (MM '19), 2019, pp. 864-872.
  • [7] Kumar, K. Naveen; Vishnu, C.; Mitra, Reshmi; Mohan, C. Krishna. Black-box Adversarial Attacks in Autonomous Vehicle Technology. 2020 IEEE Applied Imagery Pattern Recognition Workshop (AIPR): Trusted Computing, Privacy, and Securing Multimedia, 2020.
  • [8] Pang, Ren; Zhang, Xinyang; Ji, Shouling; Luo, Xiapu; Wang, Ting. AdvMind: Inferring Adversary Intent of Black-Box Attacks. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '20), 2020, pp. 1899-1907.
  • [9] Rahmati, Ali; Moosavi-Dezfooli, Seyed-Mohsen; Frossard, Pascal; Dai, Huaiyu. GeoDA: A Geometric Framework for Black-Box Adversarial Attacks. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 8443-8452.
  • [10] Wei, Xingxing; Guo, Ying; Li, Bo. Black-box adversarial attacks by manipulating image attributes. Information Sciences, 2021, 550: 285-296.