Classification of human- and AI-generated texts for different languages and domains

Authors
Kristina Schaaff [1]
Tim Schlippe [1]
Lorenz Mindner [1]
Affiliation
[1] IU International University of Applied Sciences
Keywords
Generative AI; ChatGPT; Natural language processing; Features; Prompting; Artificial intelligence; Text classification
DOI
10.1007/s10772-024-10143-3
Abstract
Chatbots based on large language models (LLMs), such as ChatGPT, are available to the general public. Students can, for instance, use these tools to generate essays or entire theses from scratch, or to rephrase an existing text. But how can a teacher tell whether a text was written by a student or by an AI? In this paper, we investigate perplexity, semantic, list lookup, document, error-based, readability, AI feedback, and text vector features for classifying human-generated and AI-generated texts from the educational domain as well as news articles. We analyze two scenarios: (1) the detection of text generated by AI from scratch, and (2) the detection of text rephrased by AI. Because we assumed that classification is more difficult when the AI has been prompted to generate or rephrase the text so that a human would not recognize it as AI-generated or AI-rephrased, we also investigate this advanced prompting scenario. To train, fine-tune, and test the classifiers, we created the Multilingual Human-AI-Generated Text Corpus, which contains human-generated, AI-generated, and AI-rephrased texts from the educational domain in English, French, German, and Spanish, as well as English texts from the news domain. We demonstrate that the same features can be used to detect AI-generated and AI-rephrased texts from the educational domain in all four languages, as well as AI-generated and AI-rephrased news texts. Our best systems significantly outperform GPTZero and ZeroGPT, two state-of-the-art systems for detecting AI-generated text. Our best text rephrasing detection system even outperforms GPTZero by 181.3% relative in F1-score.
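Two quantities in the abstract are easy to misread: the perplexity feature and the "181.3% relative" F1 gain. The sketch below illustrates both under standard textbook definitions, which this record does not confirm are the exact ones used in the paper: perplexity as the exponential of the mean negative log-likelihood of a text's tokens, and relative improvement as (new - baseline) / baseline. The function names `perplexity` and `relative_improvement` are hypothetical, and which language model supplies the token log-probabilities is an assumption left open here.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of a text given natural-log token probabilities.

    Assumes the standard definition exp(-mean(log p)); the probabilities
    would come from some language model (not specified in this record).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent.

    Assumes 'X% relative' means (new - baseline) / baseline * 100.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # A model that assigns probability 0.5 to every token has perplexity 2.
    print(perplexity([math.log(0.5)] * 4))
    # Doubling a baseline F1-score is a 100% relative improvement.
    print(relative_improvement(0.6, 0.3))
```

Under this reading, a 181.3% relative gain means the better system's F1-score is roughly 2.8 times the baseline's; lower perplexity is often taken as a hint of machine-generated text, but how the paper thresholds or combines it with the other features is not described here.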
Pages: 935-956
Page count: 21