Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Cited by: 0
Authors
Vitalii Fishchuk [1 ]
Daniel Braun [2 ]
Affiliations
[1] University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science
[2] University of Twente, Department of High-tech Business and Entrepreneurship
Keywords
Large language models; Neural text detection; Adversarial attacks; Generative AI
DOI
10.1007/s10772-024-10144-2
Abstract
The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e., software that can detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or prohibited outright, e.g., in education. It is therefore important for the effectiveness of such tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts against six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved very robust and was not significantly influenced by most of the evaluated attack strategies.
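A minimal sketch of the black-box attack setting the abstract describes is given below: the attacker can only submit text to the detector and observe its reported AI-probability score. The example uses greedy Cyrillic homoglyph substitution, one character-level attack family commonly studied in this literature (the abstract does not specify the paper's exact attack set); `detect_ai_probability` and all other names are illustrative placeholders, not the authors' code.

```python
# Hypothetical sketch of a black-box evasion attack on a neural text
# detector via homoglyph substitution. Not the paper's implementation;
# detect_ai_probability is a placeholder for a real detector query.

# Latin characters mapped to visually identical Cyrillic homoglyphs.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e",
              "p": "\u0440", "c": "\u0441", "x": "\u0445"}

def detect_ai_probability(text: str) -> float:
    """Query the black-box detector for its AI-generated probability.

    Placeholder: in a real experiment this would call the detector's
    API (commercial or research tool) and return a score in [0, 1].
    """
    raise NotImplementedError("wire up a real detector here")

def homoglyph_attack(text: str, threshold: float = 0.5) -> str:
    """Greedily substitute homoglyphs word by word until the detector's
    score drops below `threshold`; the text stays visually unchanged."""
    words = text.split(" ")
    for i, word in enumerate(words):
        perturbed = "".join(HOMOGLYPHS.get(ch, ch) for ch in word)
        if perturbed == word:
            continue  # nothing substitutable in this word
        words[i] = perturbed
        if detect_ai_probability(" ".join(words)) < threshold:
            break  # detector evaded; stop perturbing
    return " ".join(words)
```

Under this framing, a detector's robustness amounts to how little its score moves under such perturbations; a robust detector, like the latest commercial tools evaluated in the article, would rarely let the loop terminate early.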
Pages: 861–874
Page count: 13