PoSLemma: How Traditional Machine Learning and Linguistics Preprocessing Aid in Machine Generated Text Detection

Authors
Jimenez, Diana [1]
Cardoso-Moreno, Marco A. [1]
Aguilar-Canto, Fernando [1]
Juarez-Gambino, Omar [1]
Calvo, Hiram [1]
Affiliations
[1] Inst Politecn Nacl, Ctr Invest Comp, Mexico City, DF, Mexico
Source
COMPUTACION Y SISTEMAS | 2023 / Vol. 27 / No. 04
Keywords
Generative text detection; text generation; AuTexTification; logistic regression; support vector machine (SVM)
DOI
10.13053/CyS-27-4-4778
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the release of several Large Language Models (LLMs) to the public, concerns have emerged regarding their ethical implications and potential misuse. This paper proposes an approach to address the need for technologies that can distinguish between text sequences written by humans and those produced by LLMs. The proposed method combines traditional Natural Language Processing (NLP) feature extraction techniques that focus on linguistic properties with traditional Machine Learning (ML) methods such as Logistic Regression and Support Vector Machines (SVMs). We also compare this approach with an ensemble of Long Short-Term Memory (LSTM) networks, each analyzing a different Part-of-Speech (PoS) tagging paradigm. Our traditional ML models achieved F1 scores of 0.80 and 0.72 on the two analyzed tasks, respectively.
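The abstract names the ingredients (linguistic preprocessing with PoS tags and lemmas, followed by a traditional classifier) but this record does not give the pipeline itself. The following is only a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the toy lexicon `TOY_LEXICON` stands in for a real PoS tagger and lemmatizer, the feature names, the hand-written logistic regression, and the four example texts with their labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for a real PoS tagger and lemmatizer.
# Maps a lowercased token to (lemma, coarse PoS tag).
TOY_LEXICON = {
    "the": ("the", "DET"), "a": ("a", "DET"),
    "cat": ("cat", "NOUN"), "cats": ("cat", "NOUN"), "dog": ("dog", "NOUN"),
    "sat": ("sit", "VERB"), "sits": ("sit", "VERB"),
    "ran": ("run", "VERB"), "runs": ("run", "VERB"),
    "quickly": ("quickly", "ADV"),
}

def pos_lemma_features(text):
    """Count lemma unigrams and PoS-tag bigrams for one whitespace-tokenized text."""
    # Unknown tokens keep their surface form as lemma and get the tag "X".
    pairs = [TOY_LEXICON.get(tok, (tok, "X")) for tok in text.lower().split()]
    feats = Counter(f"lemma={lemma}" for lemma, _ in pairs)
    tags = [tag for _, tag in pairs]
    feats.update(f"pos={a}_{b}" for a, b in zip(tags, tags[1:]))
    return feats

def predict_proba(weights, bias, feats):
    """Probability of class 1 under a logistic regression model."""
    z = bias + sum(weights[name] * count for name, count in feats.items())
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(samples, labels, epochs=200, lr=0.5):
    """Fit logistic regression with plain per-sample gradient descent."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            err = predict_proba(weights, bias, feats) - y
            bias -= lr * err
            for name, count in feats.items():
                weights[name] -= lr * err * count
    return weights, bias

# Made-up texts and labels (0 = "human", 1 = "generated"), for illustration only.
texts = ["the cat sat", "a dog ran", "cats runs quickly", "dog sits quickly"]
labels = [0, 0, 1, 1]

samples = [pos_lemma_features(t) for t in texts]
weights, bias = train_logreg(samples, labels)
for text, feats in zip(texts, samples):
    print(text, round(predict_proba(weights, bias, feats), 3))
```

In practice the lexicon lookup would be replaced by a trained tagger and lemmatizer, and the classifier by a library implementation of Logistic Regression or an SVM; the sketch only shows how PoS and lemma features can feed a linear model.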
Pages: 921-928
Page count: 8