Multilingual Hate Speech Detection: A Semi-Supervised Generative Adversarial Approach

Cited by: 2

Authors:

Mnassri, Khouloud [1]

Farahbakhsh, Reza [1]

Crespi, Noel [1]

Affiliations:

[1] Inst Polytech Paris, Samovar Telecom SudParis, F-91120 Palaiseau, France

Source:

ENTROPY | 2024 / Vol. 26 / Issue 04

Keywords:

social media; hate speech; semisupervised; GAN; multilingual; PLMs; data augmentation; networks

DOI:

10.3390/e26040344

Chinese Library Classification (CLC) number:

O4 [Physics];

Discipline classification code:

0702 ;

Abstract:

Social media platforms have surpassed cultural and linguistic boundaries, thus enabling online communication worldwide. However, the expanded use of various languages has intensified the challenge of online detection of hate speech content. Despite the release of multiple Natural Language Processing (NLP) solutions implementing cutting-edge machine learning techniques, the scarcity of data, especially labeled data, remains a considerable obstacle, which further requires the use of semisupervised approaches along with Generative Artificial Intelligence (Generative AI) techniques. This paper introduces an innovative approach, a multilingual semisupervised model combining Generative Adversarial Networks (GANs) and Pretrained Language Models (PLMs), more precisely mBERT and XLM-RoBERTa. Our approach proves its effectiveness in the detection of hate speech and offensive language in Indo-European languages (in English, German, and Hindi) when employing only 20% annotated data from the HASOC2019 dataset, thereby presenting significantly high performances in each of multilingual, zero-shot crosslingual, and monolingual training scenarios. Our study provides a robust mBERT-based semisupervised GAN model (SS-GAN-mBERT) that outperformed the XLM-RoBERTa-based model (SS-GAN-XLM) and reached an average F1 score boost of 9.23% and an accuracy increase of 5.75% over the baseline semisupervised mBERT model.


Pages: 19
