Multilingual Hate Speech Detection: A Semi-Supervised Generative Adversarial Approach

Cited by: 2
Authors
Mnassri, Khouloud [1]
Farahbakhsh, Reza [1]
Crespi, Noel [1]
Affiliations
[1] Inst Polytech Paris, Samovar Telecom SudParis, F-91120 Palaiseau, France
Keywords
social media; hate speech; semisupervised; GAN; multilingual; PLMs; data augmentation; networks
DOI
10.3390/e26040344
Chinese Library Classification (CLC) number
O4 [Physics]
Discipline classification code
0702
Abstract
Social media platforms have surpassed cultural and linguistic boundaries, thus enabling online communication worldwide. However, the expanded use of various languages has intensified the challenge of online detection of hate speech content. Despite the release of multiple Natural Language Processing (NLP) solutions implementing cutting-edge machine learning techniques, the scarcity of data, especially labeled data, remains a considerable obstacle, which further requires the use of semisupervised approaches along with Generative Artificial Intelligence (Generative AI) techniques. This paper introduces an innovative approach, a multilingual semisupervised model combining Generative Adversarial Networks (GANs) and Pretrained Language Models (PLMs), more precisely mBERT and XLM-RoBERTa. Our approach proves its effectiveness in the detection of hate speech and offensive language in Indo-European languages (in English, German, and Hindi) when employing only 20% annotated data from the HASOC2019 dataset, thereby presenting significantly high performances in each of multilingual, zero-shot crosslingual, and monolingual training scenarios. Our study provides a robust mBERT-based semisupervised GAN model (SS-GAN-mBERT) that outperformed the XLM-RoBERTa-based model (SS-GAN-XLM) and reached an average F1 score boost of 9.23% and an accuracy increase of 5.75% over the baseline semisupervised mBERT model.
Pages: 19
Related papers
50 in total
  • [21] Semi-Supervised MIMO Detection Using Cycle-Consistent Generative Adversarial Network
    Zhu, Hongzhi
    Guo, Yongliang
    Xu, Wei
    You, Xiaohu
    IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2023, 9 (05) : 1226 - 1240
  • [22] Semi-Supervised Learning for Optical Flow with Generative Adversarial Networks
    Lai, Wei-Sheng
    Huang, Jia-Bin
    Yang, Ming-Hsuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [23] GENERATIVE ADVERSARIAL SEMI-SUPERVISED NETWORK FOR MEDICAL IMAGE SEGMENTATION
    Li, Chuchen
    Liu, Huafeng
    2021 IEEE 18TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI), 2021, : 303 - 306
  • [24] Healthy-unhealthy animal detection using semi-supervised generative adversarial network
    Almal, Shubh
    Bagepalli, Apoorva Reddy
    Dutta, Prajjwal
    Chaki, Jyotismita
    PEERJ COMPUTER SCIENCE, 2023, 9
  • [26] Semi-supervised Text Regression with Conditional Generative Adversarial Networks
    Li, Tao
    Liu, Xudong
    Su, Shihan
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 5375 - 5377
  • [27] SEMI-SUPERVISED OBJECT DETECTION IN REMOTE SENSING IMAGES USING GENERATIVE ADVERSARIAL NETWORKS
    Chen, Guowei
    Liu, Lei
    Hu, Wenlong
    Pan, Zongxu
    IGARSS 2018 - 2018 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2018, : 2503 - 2506
  • [28] Medical image segmentation with generative adversarial semi-supervised network
    Li, Chuchen
    Liu, Huafeng
    PHYSICS IN MEDICINE AND BIOLOGY, 2021, 66 (24):
  • [29] SVGAN: Semi-supervised Generative Adversarial Network for Image Captioning
    Zhang, Yi
    Zeng, Wei
    He, Gangqiang
    Liu, Yueyuan
    2020 IEEE CONFERENCE ON TELECOMMUNICATIONS, OPTICS AND COMPUTER SCIENCE (TOCS), 2020, : 296 - 299
  • [30] Survey on Implementations of Generative Adversarial Networks for Semi-Supervised Learning
    Sajun, Ali Reza
    Zualkernan, Imran
    APPLIED SCIENCES-BASEL, 2022, 12 (03):