PoSLemma: How Traditional Machine Learning and Linguistics Preprocessing Aid in Machine Generated Text Detection

Authors
Jimenez, Diana [1]
Cardoso-Moreno, Marco A. [1]
Aguilar-Canto, Fernando [1]
Juarez-Gambino, Omar [1]
Calvo, Hiram [1]
Affiliation
[1] Inst Politecn Nacl, Ctr Invest Comp, Mexico City, DF, Mexico
Source
COMPUTACION Y SISTEMAS | 2023 / Vol. 27 / No. 04
Keywords
Generative text detection; text generation; AuTexTification; logistic regression; support vector machine (SVM)
DOI
10.13053/CyS-27-4-4778
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the release of several Large Language Models (LLMs) to the public, concerns have emerged regarding their ethical implications and potential misuse. This paper proposes an approach to address the need for technologies that can distinguish text sequences written by humans from those produced by LLMs. The proposed method combines traditional Natural Language Processing (NLP) feature extraction techniques that focus on linguistic properties with traditional Machine Learning (ML) methods such as Logistic Regression and Support Vector Machines (SVMs). We also compare this approach with an ensemble of Long Short-Term Memory (LSTM) networks, each analyzing a different Part-of-Speech (PoS) tagging paradigm. Our traditional ML models achieved F1 scores of 0.80 and 0.72 on the two analyzed tasks, respectively.
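As a rough illustration of the two ingredients the abstract names, linguistic features built from PoS tags and evaluation by F1 score, the following minimal Python sketch may help. It is not the authors' implementation: the helper names `pos_ngram_counts` and `f1_score`, the choice of bigrams, and the toy tag sequence are all assumptions made for this example, and the input is assumed to be already PoS-tagged. The F1 shown is the plain binary F1 for one class; this record does not state which averaging the paper used.

```python
from collections import Counter


def pos_ngram_counts(tags, n=2):
    """Count contiguous n-grams over a sequence of PoS tags.

    Hypothetical helper: the resulting counts could serve as input
    features for a classifier such as logistic regression or an SVM.
    """
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, hand-written tag sequence (assumed, not taken from the paper's data).
features = pos_ngram_counts(["DET", "NOUN", "VERB", "DET", "NOUN"])

# Toy labels: 1 = machine-generated, 0 = human-written (assumed encoding).
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

In this toy run the bigram `("DET", "NOUN")` occurs twice, and the prediction has one true positive, one false positive, and one false negative, so precision, recall, and F1 all equal 0.5.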
Pages: 921-928
Number of pages: 8