Multilingual Hate Speech Detection: A Semi-Supervised Generative Adversarial Approach

Cited by: 2

Authors:

Mnassri, Khouloud [1]

Farahbakhsh, Reza [1]

Crespi, Noel [1]

Affiliations:

[1] Inst Polytech Paris, Samovar Telecom SudParis, F-91120 Palaiseau, France

Source:

Entropy | 2024, Vol. 26, Issue 4

Keywords:

social media; hate speech; semisupervised; GAN; multilingual; PLMs; data augmentation; networks

DOI:

10.3390/e26040344

Chinese Library Classification (CLC) number:

O4 [Physics]

Discipline classification code:

0702

Abstract:

Social media platforms have surpassed cultural and linguistic boundaries, thus enabling online communication worldwide. However, the expanded use of various languages has intensified the challenge of online detection of hate speech content. Despite the release of multiple Natural Language Processing (NLP) solutions implementing cutting-edge machine learning techniques, the scarcity of data, especially labeled data, remains a considerable obstacle, which further requires the use of semisupervised approaches along with Generative Artificial Intelligence (Generative AI) techniques. This paper introduces an innovative approach, a multilingual semisupervised model combining Generative Adversarial Networks (GANs) and Pretrained Language Models (PLMs), more precisely mBERT and XLM-RoBERTa. Our approach proves its effectiveness in the detection of hate speech and offensive language in Indo-European languages (in English, German, and Hindi) when employing only 20% annotated data from the HASOC2019 dataset, thereby presenting significantly high performances in each of multilingual, zero-shot crosslingual, and monolingual training scenarios. Our study provides a robust mBERT-based semisupervised GAN model (SS-GAN-mBERT) that outperformed the XLM-RoBERTa-based model (SS-GAN-XLM) and reached an average F1 score boost of 9.23% and an accuracy increase of 5.75% over the baseline semisupervised mBERT model.


Pages: 19
