Classification of human- and AI-generated texts for different languages and domains

Authors
Kristina Schaaff [1]
Tim Schlippe [1]
Lorenz Mindner [1]
Affiliations
[1] IU International University of Applied Sciences,
Keywords
Generative AI; ChatGPT; Natural language processing; Features; Prompting; Artificial intelligence; Text classification
DOI
10.1007/s10772-024-10143-3
Abstract
Chatbots based on large language models (LLMs), such as ChatGPT, are available to the general public. Students can use these tools, for instance, to generate essays or entire theses, either from scratch or by rephrasing an existing text. But how does a teacher, for example, know whether a text was written by a student or by an AI? In this paper, we investigate perplexity, semantic, list lookup, document, error-based, readability, AI feedback, and text vector features to classify human-generated and AI-generated texts from the educational domain as well as news articles. We analyze two scenarios: (1) the detection of text generated by AI from scratch, and (2) the detection of text rephrased by AI. Because we assumed that classification is more difficult when the AI has been prompted to create or rephrase the text so that a human would not recognize it as AI-generated or AI-rephrased, we also investigate this advanced prompting scenario. To train, fine-tune, and test the classifiers, we created the Multilingual Human-AI-Generated Text Corpus, which contains human-generated, AI-generated, and AI-rephrased texts from the educational domain in English, French, German, and Spanish, as well as English texts from the news domain. We demonstrate that the same features can be used to detect AI-generated and AI-rephrased texts from the educational domain in all four languages, and to detect AI-generated and AI-rephrased news texts. Our best systems significantly outperform GPTZero and ZeroGPT, state-of-the-art systems for the detection of AI-generated text. Our best text rephrasing detection system even outperforms GPTZero by 181.3% relative in F1-score.
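To make the phrase "relative in F1-score" concrete, the following minimal Python sketch shows how a relative gain of this kind is typically computed. The function names and the confusion counts are illustrative assumptions, not taken from the paper, and the numbers are made up rather than reported results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(ours: float, baseline: float) -> float:
    """Relative gain of `ours` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (ours - baseline) / baseline * 100.0


# Hypothetical confusion counts for illustration only (NOT from the paper).
baseline_f1 = f1_score(tp=20, fp=10, fn=30)  # hypothetical baseline detector
ours_f1 = f1_score(tp=45, fp=5, fn=5)        # hypothetical improved detector

gain = relative_improvement(ours_f1, baseline_f1)
print(f"{gain:.1f}% relative")
```

Under this definition, a relative gain of 181.3% would mean an F1-score roughly 2.8 times that of the baseline, which is different from an absolute gain of 181.3 percentage points.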
Pages: 935-956
Page count: 21