Towards AI-Generated Essay Classification Using Numerical Text Representation

Cited by: 1
Authors
Krawczyk, Natalia [1]
Probierz, Barbara [1, 2]
Kozak, Jan [1]
Affiliations
[1] Univ Econ Katowice, Dept Machine Learning, 1 Maja 50, PL-40287 Katowice, Poland
[2] Lukasiewicz Res Network, Inst Innovat Technol EMAG, Leopolda 31, PL-40189 Katowice, Poland
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 21
Keywords
natural language processing; numerical text representations; text classification; large language models;
DOI
10.3390/app14219795
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Distinguishing essays written by AI from those authored by students is becoming a significant issue in educational settings. This research examines various numerical text representation techniques to improve the classification of these essays. Using a diverse dataset, we applied several preprocessing steps, including data cleaning, tokenization, and lemmatization. Our system analyzes text representation methods such as Bag of Words, TF-IDF, and fastText embeddings in conjunction with multiple classifiers. In our experiments, TF-IDF weights paired with logistic regression reached the highest accuracy, 99.82%. Bag of Words, TF-IDF, and fastText embeddings achieved accuracies exceeding 96.50% across all tested classifiers. Sentence embeddings, including MiniLM and distilBERT, yielded accuracies from 93.78% to 96.63%, indicating room for further refinement. Pre-trained fastText embeddings performed worse, with a lowest accuracy of 89.88% with logistic regression. The XGBoost classifier delivered the highest minimum accuracy, 96.24%. Specificity and precision were above 99% for most methods, indicating a strong ability to differentiate between student-written and AI-generated texts. This study underscores the importance of choosing dataset-specific text representations to improve classification accuracy.
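To make the terms in the abstract concrete, the following is a minimal standard-library sketch of Bag of Words counts, TF-IDF weights, and the accuracy, precision, and specificity metrics. It is not the authors' implementation: the function names, the toy corpus, the plain tf × log(N/df) weighting, and the choice of AI-generated text as the positive class (label 1) are all assumptions made for illustration, and the abstract does not state which variants the paper used.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Raw term counts for one tokenized document, ordered by vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tf_idf(corpus):
    """TF-IDF vectors for a list of tokenized, non-empty documents.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    Real libraries often add smoothing and normalization on top of this.
    """
    vocabulary = sorted({term for doc in corpus for term in doc})
    n_docs = len(corpus)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}
    vectors = []
    for doc in corpus:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * idf[t] for t in vocabulary])
    return vocabulary, vectors


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and specificity from true and predicted labels.

    Assumption: `positive` marks AI-generated essays; everything else is
    treated as student-written.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        # Precision: share of predicted-positive essays that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Specificity: share of truly negative essays predicted as negative.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical toy corpus, already cleaned, tokenized, and lemmatized.
toy_corpus = [
    ["student", "write", "essay"],
    ["model", "generate", "essay"],
]
toy_vocabulary, toy_vectors = tf_idf(toy_corpus)
```

In this sketch a term that occurs in every document, such as "essay" in the toy corpus, receives an IDF of zero, which is why TF-IDF emphasizes terms that separate documents rather than terms shared by all of them.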
Pages: 23