Classification of human- and AI-generated texts for different languages and domains

Authors
Kristina Schaaff [1]
Tim Schlippe [1]
Lorenz Mindner [1]
Affiliations
[1] IU International University of Applied Sciences
Keywords
Generative AI; ChatGPT; Natural language processing; Features; Prompting; Artificial intelligence; Text classification
DOI
10.1007/s10772-024-10143-3
Abstract
Chatbots based on large language models (LLMs) such as ChatGPT are widely available to the public. Students can use these tools, for instance, to generate essays or entire theses from scratch or by rephrasing an existing text. But how does a teacher, for instance, know whether a text was written by a student or by an AI? In this paper, we investigate perplexity, semantic, list lookup, document, error-based, readability, AI feedback, and text vector features to classify human-generated and AI-generated texts from the educational domain as well as news articles. We analyze two scenarios: (1) the detection of text generated by AI from scratch, and (2) the detection of text rephrased by AI. Because we assumed that classification is more difficult when the AI has been prompted to create or rephrase the text so that a human would not recognize it as AI-generated or AI-rephrased, we also investigate this advanced prompting scenario. To train, fine-tune, and test the classifiers, we created the Multilingual Human-AI-Generated Text Corpus, which contains human-generated, AI-generated, and AI-rephrased texts from the educational domain in English, French, German, and Spanish, as well as English texts from the news domain. We demonstrate that the same features can be used to detect AI-generated and AI-rephrased texts from the educational domain in all four languages and to detect AI-generated and AI-rephrased news texts. Our best systems significantly outperform GPTZero and ZeroGPT, state-of-the-art systems for the detection of AI-generated text. Our best text rephrasing detection system even outperforms GPTZero by 181.3% relative in F1-score.
Pages: 935-956
Page count: 21