SeqXGPT: Sentence-Level AI-Generated Text Detection

Authors
Wang, Pengyu
Li, Linyang
Ren, Ke
Jiang, Botian
Zhang, Dong
Qiu, Xipeng [1]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
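Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract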
Widely applied large language models (LLMs) can generate human-like content, raising concerns about the abuse of LLMs. It is therefore important to build strong AI-generated text (AIGT) detectors. Current works only consider document-level AIGT detection. In this paper, we first introduce a sentence-level detection challenge by synthesizing a dataset of documents polished with LLMs, that is, documents containing both sentences written by humans and sentences modified by LLMs. We then propose Sequence X (Check) GPT (SeqXGPT), a novel method that uses log probability lists from white-box LLMs as features for sentence-level AIGT detection. These features are composed like waves in speech processing and cannot be studied by LLMs, so we build SeqXGPT on convolution and self-attention networks. We test it on both sentence-level and document-level detection challenges. Experimental results show that previous methods struggle with sentence-level AIGT detection, while our method not only significantly surpasses baseline methods on both sentence-level and document-level detection challenges but also exhibits strong generalization capabilities.
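The idea of using per-token log probability lists as sentence-level features can be illustrated with a minimal sketch. This is not the authors' implementation: `ToyBigramLM` is a hypothetical add-one-smoothed bigram model standing in for a white-box LLM, and `log_prob_list` and `sentence_features` are illustrative helper names. A real pipeline would read log probabilities from an actual LLM and pass them to a learned classifier.

```python
import math
from collections import Counter


class ToyBigramLM:
    """Hypothetical stand-in for a white-box LLM that exposes token log-probs."""

    def __init__(self, corpus):
        tokens = corpus.split()
        self.vocab = set(tokens)
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))

    def log_prob(self, prev, tok):
        # Add-one smoothing; the extra +1 reserves mass for unseen tokens.
        v = len(self.vocab) + 1
        return math.log((self.bigrams[(prev, tok)] + 1) / (self.unigrams[prev] + v))


def log_prob_list(model, sentence):
    """Return the log probability of each token given its predecessor."""
    tokens = sentence.split()
    return [model.log_prob(p, t) for p, t in zip(tokens, tokens[1:])]


def sentence_features(model, document):
    """Split a document on '.' and pair each sentence with its log-prob list."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return [(s, log_prob_list(model, s)) for s in sentences]


if __name__ == "__main__":
    lm = ToyBigramLM("the cat sat on the mat the cat ate")
    for sent, feats in sentence_features(lm, "the cat sat. the dog ran."):
        print(sent, [round(x, 3) for x in feats])
```

Each sentence yields a variable-length list of log probabilities. A sequence model, such as the convolution and self-attention networks named in the abstract, would take these lists as input in place of raw text.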
Pages: 1144-1156
Page count: 13