Self-supervised hate speech detection in Norwegian texts with lexical and semantic augmentations

Cited: 1
Authors
Hashmi, Ehtesham [1 ]
Yayilgan, Sule Yildirim [1 ]
Yamin, Muhammad Mudassar [1 ]
Abomhara, Mohamed [1 ]
Ullah, Mohib [2 ]
Affiliations
[1] Norwegian University of Science and Technology (NTNU), Department of Information Security and Communication Technology (IIK), Teknologivegen 22, N-2815 Gjøvik, Innlandet, Norway
[2] Norwegian University of Science and Technology (NTNU), Department of Computer Science (IDI), Intelligent Systems and Analytics (ISA) Research Group, Teknologivegen 22, N-2815 Gjøvik, Innlandet, Norway
Keywords
Hate speech; Natural language processing; Data augmentation; Self-representation learning; Transformers
DOI
10.1016/j.eswa.2024.125843
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The proliferation of social media platforms has significantly contributed to the spread of hate speech targeting individuals based on race, gender, disability, religion, or sexual orientation. Online hate speech not only provokes prejudice and violence in cyberspace but also has profound impacts on real-world communities, eroding social harmony and increasing the risk of physical harm. This underscores the urgent need for effective hate speech detection systems, especially in low-resource languages such as Norwegian, where limited data availability poses additional challenges. This study applies the Barlow Twins methodology in a self-supervised learning framework to first develop robust language representations for Norwegian, a language typically underrepresented in NLP research. These learned representations are then used in a semi-supervised classification task to detect hate speech. By combining text augmentation techniques at both the word and sentence level with self-training strategies, our approach demonstrates that meaningful representations can be learned efficiently from a minimal amount of annotated data. Experimental results show that the Nor-BERT model is well suited for detecting hate speech within the limited Norwegian data available, consistently outperforming other models; in particular, Nor-BERT surpassed all deep learning-based models in terms of F1-score.
Pages: 13
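The abstract describes learning representations with the Barlow Twins objective applied to pairs of augmented views of the same text. Below is a minimal PyTorch sketch of that objective for orientation only; the batch size, embedding dimension, trade-off weight lambd, and the choice of augmentations are illustrative assumptions, not the configuration reported in the paper.

# Minimal sketch of the Barlow Twins objective (assumed setup, not the paper's exact code).
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor, lambd: float = 5e-3) -> torch.Tensor:
    """z_a, z_b: (batch, dim) embeddings of two augmented views of the same texts."""
    n, _ = z_a.shape
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    # Cross-correlation matrix between the two views.
    c = (z_a.T @ z_b) / n
    # Invariance term: pull diagonal entries toward 1.
    on_diag = ((torch.diagonal(c) - 1) ** 2).sum()
    # Redundancy-reduction term: push off-diagonal entries toward 0.
    off_diag = (c ** 2).sum() - (torch.diagonal(c) ** 2).sum()
    return on_diag + lambd * off_diag

# Example: embeddings of two views (e.g. a word-level synonym swap and a
# sentence-level paraphrase) produced by the same text encoder.
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
loss = barlow_twins_loss(z1, z2)

In this formulation the encoder is trained without labels; the resulting representations can then be fine-tuned on the small annotated Norwegian hate speech set, as the abstract outlines.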