Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Cited by: 0
Authors
Vitalii Fishchuk [1 ]
Daniel Braun [2 ]
Affiliations
[1] University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science
[2] University of Twente, Department of High-tech Business and Entrepreneurship
Keywords
Large language models; Neural text detection; Adversarial attacks; Generative AI
DOI
10.1007/s10772-024-10144-2
Abstract
The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e., software that can detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or prohibited outright, e.g., in education. It is therefore important for the effectiveness of such tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts against six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved very robust and was not significantly influenced by most of the evaluated attack strategies.
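A minimal sketch of the black-box attack setting the abstract describes is given below: the attacker can only submit text to the detector and observe its reported AI-probability score. The example uses greedy Cyrillic homoglyph substitution, one character-level attack family commonly studied in this literature (the abstract does not specify the paper's exact attack set); `detect_ai_probability` and all other names are illustrative placeholders, not the authors' code.

```python
# Hypothetical sketch of a black-box evasion attack on a neural text
# detector via homoglyph substitution. Not the paper's implementation;
# detect_ai_probability is a placeholder for a real detector query.

# Latin characters mapped to visually identical Cyrillic homoglyphs.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e",
              "p": "\u0440", "c": "\u0441", "x": "\u0445"}

def detect_ai_probability(text: str) -> float:
    """Query the black-box detector for its AI-generated probability.

    Placeholder: in a real experiment this would call the detector's
    API (commercial or research tool) and return a score in [0, 1].
    """
    raise NotImplementedError("wire up a real detector here")

def homoglyph_attack(text: str, threshold: float = 0.5) -> str:
    """Greedily substitute homoglyphs word by word until the detector's
    score drops below `threshold`; the text stays visually unchanged."""
    words = text.split(" ")
    for i, word in enumerate(words):
        perturbed = "".join(HOMOGLYPHS.get(ch, ch) for ch in word)
        if perturbed == word:
            continue  # nothing substitutable in this word
        words[i] = perturbed
        if detect_ai_probability(" ".join(words)) < threshold:
            break  # detector evaded; stop perturbing
    return " ".join(words)
```

Under this framing, a detector's robustness amounts to how little its score moves under such perturbations; a robust detector, like the latest commercial tools evaluated in the article, would rarely let the loop terminate early.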
Pages: 861–874
Page count: 13