BadRock at SemEval-2024 Task 8: DistilBERT to Detect Multigenerator, Multidomain and Multilingual Black-Box Machine-Generated Text

Cited by: 0
Authors
Siino, Marco [1]
Affiliations
[1] Univ Catania, Dept Elect Elect & Comp Engn, Catania, Italy
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The rise of Large Language Models (LLMs) has made them increasingly ubiquitous and readily accessible. Across social media, news outlets, educational platforms, question-answering forums, and even academic domains, there has been a notable surge in machine-generated content. Recent LLMs such as ChatGPT and GPT-4 produce coherent and contextually relevant responses to a broad spectrum of user inquiries. The fluency and sophistication of these generated texts make LLMs compelling candidates for substituting human labour in numerous applications. Nevertheless, this proliferation of machine-generated content has raised concerns about potential misuse, including the dissemination of misinformation and the disruption of educational ecosystems. Given that humans only marginally outperform random chance at distinguishing machine-generated from human-authored text, there is a pressing need for automated systems that can accurately detect machine-generated text and thereby curb its misuse. This paper describes the approach we adopted for our participation in SemEval-2024 Task 8. Specifically, we detail the fine-tuning and use of a DistilBERT model to classify each sample in the provided test set. Our submission reaches an accuracy of 0.754, compared with 0.231 for the worst result obtained in the competition.
Pages: 239-245
Number of pages: 7