Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Authors
Vitalii Fishchuk [1]
Daniel Braun [2]
Affiliations
[1] University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science
[2] University of Twente, Department of High
Keywords
Large language models; Neural text detection; Adversarial attacks; Generative AI
DOI
10.1007/s10772-024-10144-2
Abstract
The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e. software that can detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g. in education. It is therefore important for the effectiveness of such tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts against six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved to be very robust and not significantly influenced by most of the evaluated attack strategies.
Pages: 861-874
Page count: 13