Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Cited by: 0
Authors
Vitalii Fishchuk [1 ]
Daniel Braun [2 ]
Affiliations
[1] University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science
[2] University of Twente, Department of High-Tech Business and Entrepreneurship
Keywords
Large language models; Neural text detection; Adversarial attacks; Generative AI
DOI
10.1007/s10772-024-10144-2
Abstract
The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e. software that can detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g. in educational settings. It is therefore important for the effectiveness of such tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts against six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved to be very robust and was not significantly influenced by most of the evaluated attack strategies.
Pages: 861–874
Number of pages: 13
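
To make the black-box attack setting described in the abstract concrete, the sketch below illustrates one attack family commonly studied in this line of work: greedy homoglyph substitution guided only by the detector's output score. It is not the attack set or any of the six detectors evaluated in the paper; detector_score, greedy_homoglyph_attack, and the HOMOGLYPHS table are illustrative assumptions, with detector_score acting as a toy stand-in for what would, in practice, be an API call to a commercial or research detector.

# Minimal sketch of a greedy homoglyph-substitution attack against a
# black-box neural text detector. Illustrative only; detector_score is a
# toy stand-in, not any of the detectors evaluated in the paper.

# Latin letters mapped to visually similar Cyrillic code points (small subset).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440", "c": "\u0441"}


def detector_score(text: str) -> float:
    """Hypothetical black-box detector returning P(text is AI-generated).

    In practice this would wrap an HTTP call to a commercial or research
    detector; here it is a toy heuristic so the sketch runs standalone.
    """
    ascii_ratio = sum(ch.isascii() for ch in text) / max(len(text), 1)
    return ascii_ratio  # toy proxy, NOT a real detector


def greedy_homoglyph_attack(text: str, budget: int = 20) -> str:
    """Swap characters for homoglyphs, keeping only edits that lower the score."""
    best_text, best_score = text, detector_score(text)
    edits = 0
    for i, ch in enumerate(text):
        if edits >= budget:
            break
        substitute = HOMOGLYPHS.get(ch.lower())
        if substitute is None:
            continue
        candidate = best_text[:i] + substitute + best_text[i + 1:]
        score = detector_score(candidate)
        if score < best_score:  # query feedback: keep the edit only if it helps
            best_text, best_score = candidate, score
            edits += 1
    return best_text


if __name__ == "__main__":
    sample = "Large language models can produce remarkably fluent prose."
    adversarial = greedy_homoglyph_attack(sample)
    print(detector_score(sample), detector_score(adversarial))

The greedy query-feedback loop mirrors the black-box setting: the attacker observes only the detector's returned score, never its parameters or gradients, and keeps an edit only when that score drops.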