Towards AI-Generated Essay Classification Using Numerical Text Representation

Cited by: 1
Authors
Krawczyk, Natalia [1]
Probierz, Barbara [1, 2]
Kozak, Jan [1]
Affiliations
[1] Univ Econ Katowice, Dept Machine Learning, 1 Maja 50, PL-40287 Katowice, Poland
[2] Lukasiewicz Res Network, Inst Innovat Technol EMAG, Leopolda 31, PL-40189 Katowice, Poland
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 21
Keywords
natural language processing; numerical text representations; text classification; large language models
DOI
10.3390/app14219795
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Distinguishing AI-written essays from student-authored ones is an increasingly significant issue in educational settings. This research examines various numerical text representation techniques to improve the classification of these essays. Using a diverse dataset, we applied several preprocessing steps, including data cleaning, tokenization, and lemmatization. Our system evaluates text representation methods such as Bag of Words, TF-IDF, and fastText embeddings in combination with multiple classifiers. In our experiments, TF-IDF weights paired with logistic regression reached the highest accuracy, 99.82%. Methods such as Bag of Words, TF-IDF, and fastText embeddings achieved accuracies exceeding 96.50% across all tested classifiers. Sentence embeddings, including MiniLM and distilBERT, yielded accuracies from 93.78% to 96.63%, indicating room for further refinement. By contrast, pre-trained fastText embeddings performed worse, with a lowest accuracy of 89.88% with logistic regression. Notably, the XGBoost classifier delivered the highest minimum accuracy, 96.24%. Specificity and precision were above 99% for most methods, showing a strong ability to differentiate between student-written and AI-generated texts. This study underscores the importance of choosing dataset-specific text representations to improve classification accuracy.
Pages: 23
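
The abstract reports TF-IDF weights combined with logistic regression as the most accurate configuration. The following is a minimal, standard-library-only sketch of that kind of pipeline, not the authors' implementation: the four toy essays, their labels, the whitespace tokenizer, the smoothed IDF formula, and the learning-rate and epoch settings are all assumptions made for illustration, and the cleaning and lemmatization steps described in the paper are omitted.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; no cleaning or lemmatization.
    return text.lower().split()

def fit_tfidf(docs):
    """Learn a vocabulary and smoothed IDF weights from tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    # Smoothed IDF variant: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def transform(doc, vocab, idf):
    """Map one tokenized document to an L2-normalized TF-IDF vector."""
    tf = Counter(doc)
    vec = [tf[t] * idf[t] for t in vocab]  # out-of-vocabulary terms are ignored
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit unregularized logistic regression with batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(x, w, b):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical toy data: 0 = student-written, 1 = AI-generated.
texts = [
    "i think school uniforms are kind of annoying honestly",
    "my friends and i think homework is annoying",
    "in conclusion school uniforms offer numerous benefits overall",
    "in conclusion homework offers numerous benefits overall",
]
labels = [0, 0, 1, 1]

docs = [tokenize(t) for t in texts]
vocab, idf = fit_tfidf(docs)
X = [transform(d, vocab, idf) for d in docs]
w, b = train_logreg(X, labels)

def classify(text):
    """Classify a new essay with the fitted TF-IDF weights and model."""
    return predict(transform(tokenize(text), vocab, idf), w, b)

print(classify("in conclusion reading offers numerous benefits"))
print(classify("i think reading is annoying"))
```

On a real corpus, one would normally hold out a test split and report accuracy, specificity, and precision on it, as the abstract does, rather than inspect individual predictions.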