Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Cited by: 0
Authors
Vitalii Fishchuk [1]
Daniel Braun [2]
Affiliations
[1] University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science
[2] University of Twente, Department of High-tech Business and Entrepreneurship
Keywords
Large language models; Neural text detection; Adversarial attacks; Generative AI;
DOI
10.1007/s10772-024-10144-2
Abstract
The increased quality and human-likeness of AI-generated texts has led to a rising demand for neural text detectors, i.e., software that can determine whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g., in education. It is therefore important for the effectiveness of these tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts against six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved to be very robust and was not significantly affected by most of the evaluated attack strategies.
Pages: 861–874
Page count: 13
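The black-box setting described in the abstract can be illustrated with a short sketch. The code below is not taken from the paper: it shows one generic, query-based attack strategy (character-level homoglyph substitution), and the `detector` callable is a hypothetical stand-in for whichever commercial or research detector is being probed, assumed only to return the probability that a text is AI-generated.

```python
# Minimal sketch (not from the paper) of a query-based black-box attack on a
# neural text detector: the detector is an opaque scoring function, and the
# attacker perturbs the text until the reported AI probability falls below a
# decision threshold.

import random
from typing import Callable, Optional, Tuple

# Visually near-identical Latin -> Cyrillic replacements that change the
# underlying Unicode code points seen by the detector's tokenizer.
HOMOGLYPHS = {"a": "а", "e": "е", "o": "о", "p": "р", "c": "с", "x": "х"}


def perturb(text: str, rate: float, seed: int = 0) -> str:
    """Replace a fraction `rate` of substitutable characters with homoglyphs."""
    rng = random.Random(seed)
    chars = list(text)
    candidates = [i for i, ch in enumerate(chars) if ch in HOMOGLYPHS]
    for i in rng.sample(candidates, k=int(len(candidates) * rate)):
        chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)


def attack(
    text: str,
    detector: Callable[[str], float],  # hypothetical: returns P(text is AI-generated)
    threshold: float = 0.5,
    rates: Tuple[float, ...] = (0.05, 0.1, 0.2, 0.3),
) -> Optional[Tuple[str, float, float]]:
    """Raise the perturbation rate until the detector score drops below `threshold`."""
    for rate in rates:
        candidate = perturb(text, rate)
        score = detector(candidate)
        if score < threshold:
            return candidate, rate, score
    return None  # attack failed within the allowed perturbation budget
```

A call such as attack(generated_text, my_detector_client) would query the detector a handful of times at increasing perturbation rates; the article evaluates a broader range of attack strategies than this single character-level example.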