Classification of human- and AI-generated texts for different languages and domains

Authors
Kristina Schaaff [1]
Tim Schlippe [1]
Lorenz Mindner [1]
Affiliations
[1] IU International University of Applied Sciences,
Keywords
Generative AI; ChatGPT; Natural language processing; Features; Prompting; Artificial intelligence; Text classification
DOI
10.1007/s10772-024-10143-3
Abstract
Chatbots based on large language models (LLMs), such as ChatGPT, are now widely available to the public. Students can use these tools, for instance, to generate essays or entire theses from scratch or by rephrasing an existing text. But how can a teacher tell whether a text was written by a student or by an AI? In this paper, we investigate perplexity, semantic, list lookup, document, error-based, readability, AI feedback, and text vector features to classify human-generated and AI-generated texts from the educational domain as well as news articles. We analyze two scenarios: (1) the detection of text generated by AI from scratch, and (2) the detection of text rephrased by AI. Because we assumed that classification is more difficult when the AI has been prompted to create or rephrase a text so that a human would not recognize it as AI-generated or AI-rephrased, we also investigate this advanced prompting scenario. To train, fine-tune, and test the classifiers, we created the Multilingual Human-AI-Generated Text Corpus, which contains human-generated, AI-generated, and AI-rephrased texts from the educational domain in English, French, German, and Spanish, as well as English texts from the news domain. We demonstrate that the same features can be used to detect AI-generated and AI-rephrased texts from the educational domain in all four languages and to detect AI-generated and AI-rephrased news texts. Our best systems significantly outperform GPTZero and ZeroGPT, state-of-the-art systems for the detection of AI-generated text. Our best text rephrasing detection system even outperforms GPTZero by a relative 181.3% in F1-score.
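A note on reading the relative figure: a relative improvement is the gain divided by the baseline score, so 181.3% relative corresponds to an F1-score roughly 2.813 times that of the baseline. The sketch below illustrates only this arithmetic, assuming the usual definition of relative improvement; the function names and the example scores are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_f1: float, system_f1: float) -> float:
    """Return the relative improvement of system_f1 over baseline_f1, in percent."""
    if baseline_f1 <= 0:
        raise ValueError("baseline_f1 must be positive")
    return (system_f1 - baseline_f1) / baseline_f1 * 100.0


def improvement_factor(relative_percent: float) -> float:
    """Convert a relative improvement in percent into a multiplicative factor."""
    return 1.0 + relative_percent / 100.0


if __name__ == "__main__":
    # Hypothetical scores chosen only for illustration, not results from the paper.
    print(relative_improvement(0.25, 0.50))
    # Factor implied by the 181.3% relative figure quoted in the abstract.
    print(improvement_factor(181.3))
```

The absolute F1-scores behind the 181.3% figure are not given in this record, so the sketch does not attempt to reproduce them.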
Pages: 935-956
Number of pages: 21