PoSLemma: How Traditional Machine Learning and Linguistics Preprocessing Aid in Machine Generated Text Detection

Authors
Jimenez, Diana [1]
Cardoso-Moreno, Marco A. [1]
Aguilar-Canto, Fernando [1]
Juarez-Gambino, Omar [1]
Calvo, Hiram [1]
Affiliations
[1] Inst Politecn Nacl, Ctr Invest Comp, Mexico City, DF, Mexico
Source
COMPUTACION Y SISTEMAS | 2023 / Vol. 27 / No. 04
Keywords
Generative text detection; text generation; AuTexTification; logistic regression; support vector machine (SVM);
DOI
10.13053/CyS-27-4-4778
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
With the release of several Large Language Models (LLMs) to the public, concerns have emerged regarding their ethical implications and potential misuse. This paper proposes an approach to address the need for technologies that can distinguish between text sequences written by humans and those produced by LLMs. The proposed method combines traditional Natural Language Processing (NLP) feature extraction techniques that focus on linguistic properties with traditional Machine Learning (ML) methods such as Logistic Regression and Support Vector Machines (SVMs). We also compare this approach with an ensemble of Long Short-Term Memory (LSTM) networks, each analyzing a different Part-of-Speech (PoS) tagging paradigm. Our traditional ML models achieved F1 scores of 0.80 and 0.72 on the two analyzed tasks, respectively.
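The general idea in the abstract (linguistic features fed to a traditional classifier) can be illustrated with a minimal sketch. It is not the authors' pipeline: this record does not give their tagger, feature set, or hyperparameters. The toy lexicon `TOY_TAGS`, the function names, the four example sentences, and their labels are all hypothetical, and a small hand-written logistic regression stands in for a library implementation so that the block has no dependencies.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a real PoS tagger (assumption).
TOY_TAGS = {
    "the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN", "model": "NOUN",
    "text": "NOUN", "sat": "VERB", "runs": "VERB", "generates": "VERB",
    "quickly": "ADV", "very": "ADV", "good": "ADJ", "fluent": "ADJ",
}

def pos_tags(sentence):
    """Map each lowercased token to a coarse tag; unknown words become 'X'."""
    return [TOY_TAGS.get(tok, "X") for tok in sentence.lower().split()]

def pos_ngram_features(sentence, n=2):
    """Count PoS unigrams and n-grams, normalized by the number of tokens."""
    tags = pos_tags(sentence)
    counts = Counter(tags)
    counts.update(" ".join(tags[i:i + n]) for i in range(len(tags) - n + 1))
    total = max(len(tags), 1)
    return {feat: c / total for feat, c in counts.items()}

def predict_proba(weights, bias, feats):
    """Logistic model: probability that the text belongs to class 1."""
    z = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(samples, labels, epochs=2000, lr=1.0):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            err = predict_proba(weights, bias, feats) - y
            bias -= lr * err
            for f, v in feats.items():
                weights[f] = weights.get(f, 0.0) - lr * err * v
    return weights, bias

# Made-up training data: 0 = human-written, 1 = machine-generated.
TEXTS = [
    "the cat sat",
    "a dog runs quickly",
    "the model generates very fluent text",
    "a model generates good text",
]
LABELS = [0, 0, 1, 1]

X = [pos_ngram_features(t) for t in TEXTS]
weights, bias = train_logreg(X, LABELS)

for text, feats in zip(TEXTS, X):
    print(f"{predict_proba(weights, bias, feats):.2f}  {text}")
```

In practice the lexicon lookup would be replaced by a trained PoS tagger and the hand-written optimizer by a library classifier; the sketch only shows how tag n-gram frequencies can serve as input to a linear model.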
Pages: 921 - 928
Number of pages: 8
Related papers
50 in total
  • [21] Machine learning-based guilt detection in text
    Meque, Abdul Gafar Manuel
    Hussain, Nisar
    Sidorov, Grigori
    Gelbukh, Alexander
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [22] Text Detection in Document Images by Machine Learning Algorithms
    Zelenika, Darko
    Povh, Janez
    Zenko, Bernard
    PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON COMPUTER RECOGNITION SYSTEMS, CORES 2015, 2016, 403 : 169 - 179
  • [23] Effect of Data Preprocessing in the Detection of Epilepsy using Machine Learning Techniques
    Sabarivani, A.
    Ramadevi, R.
    Pandian, R.
    Krishnamoorthy, N. R.
    JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH, 2021, 80 (12) : 1066 - 1077
  • [24] How can machine learning aid behavioral marketing research?
    Hagen, Linda
    Uetake, Kosuke
    Yang, Nathan
    Bollinger, Bryan
    Chaney, Allison J. B.
    Dzyabura, Daria
    Etkin, Jordan
    Goldfarb, Avi
    Liu, Liu
    Sudhir, K.
    Wang, Yanwen
    Wright, James R.
    Zhu, Ying
    MARKETING LETTERS, 2020, 31 (04) : 361 - 370
  • [25] How can machine learning aid behavioral marketing research?
    Linda Hagen
    Kosuke Uetake
    Nathan Yang
    Bryan Bollinger
    Allison J. B. Chaney
    Daria Dzyabura
    Jordan Etkin
    Avi Goldfarb
    Liu Liu
    K. Sudhir
    Yanwen Wang
    James R. Wright
    Ying Zhu
    Marketing Letters, 2020, 31 : 361 - 370
  • [26] RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text
    Dugan, Liam
    Ippolito, Daphne
    Kirubarajan, Arun
    Callison-Burch, Chris
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING: SYSTEM DEMONSTRATIONS, 2020, : 189 - 196
  • [27] Domain generated algorithms detection applying a combination of a deep feature selection and traditional machine learning models
    Hassaoui, Mohamed
    Hanini, Mohamed
    El Kafhali, Said
    JOURNAL OF COMPUTER SECURITY, 2023, 31 (01) : 85 - 105
  • [28] Comparative Study between Traditional Machine Learning and Deep Learning Approaches for Text Classification
    Kamath, Cannannore Nidhi
    Bukhari, Syed Saqib
    Dengel, Andreas
    PROCEEDINGS OF THE ACM SYMPOSIUM ON DOCUMENT ENGINEERING (DOCENG 2018), 2018
  • [29] Machine Learning Preprocessing Method for Suicide Prediction
    Iliou, Theodoros
    Konstantopoulou, Georgia
    Ntekouli, Mandani
    Lymberopoulos, Dimitrios
    Assimakopoulos, Konstantinos
    Galiatsatos, Dimitrios
    Anastassopoulos, George
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2016, 2016, 475 : 53 - 60
  • [30] Machine learning-based algorithmically generated domain detection
    Wang, Zheng
    Guo, Yang
    Montgomery, Doug
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 100