Self-supervised hate speech detection in Norwegian texts with lexical and semantic augmentations

Cited by: 1
Authors
Hashmi, Ehtesham [1 ]
Yayilgan, Sule Yildirim [1 ]
Yamin, Muhammad Mudassar [1 ]
Abomhara, Mohamed [1 ]
Ullah, Mohib [2 ]
Affiliations
[1] Norwegian Univ Sci & Technol NTNU, Dept Informat Secur & Commun Technol IIK, Teknologivegen 22, N-2815 Gjovik, Innlandet, Norway
[2] Norwegian Univ Sci & Technol NTNU, Dept Comp Sci IDI, Intelligent Syst & Analyt ISA Res Grp, Teknologivegen 22, N-2815 Gjovik, Innlandet, Norway
Keywords
Hate speech; Natural language processing; Data augmentation; Self-representation learning; Transformers;
DOI
10.1016/j.eswa.2024.125843
CLC classification
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The proliferation of social media platforms has significantly contributed to the spread of hate speech targeting individuals based on race, gender, disability, religion, or sexual orientation. Online hate speech not only provokes prejudice and violence in cyberspace but also has profound impacts on real-world communities, eroding social harmony and increasing the risk of physical harm. This underscores the urgent need for effective hate speech detection systems, especially in low-resource languages such as Norwegian, where limited data availability poses additional challenges. This study applies the Barlow Twins methodology, a self-supervised learning framework, to first develop robust language representations for Norwegian, a language typically underrepresented in NLP research. These learned representations are then used in a semi-supervised classification task to detect hate speech. By combining text augmentation techniques at both the word and sentence level with self-training strategies, our approach demonstrates that meaningful representations can be learned efficiently from a minimal amount of annotated data. Experimental results show that the Nor-BERT model is well suited to detecting hate speech in the limited Norwegian data available, consistently outperforming other models; it also surpassed all deep learning-based models in terms of F1-score.
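The self-supervised stage described above rests on the Barlow Twins objective: embeddings of two augmented views of the same batch are pushed toward an identity cross-correlation matrix, so each dimension is invariant across views (diagonal near 1) while dimensions stay decorrelated (off-diagonal near 0). A minimal NumPy sketch of that loss, not the paper's implementation (function name and `lam` default are illustrative assumptions):

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins objective on two (batch, dim) embedding matrices,
    one per augmented view of the same batch of sentences.

    lam weights the redundancy-reduction (off-diagonal) term; 5e-3 is an
    illustrative default, not the paper's setting.
    """
    n, _ = z_a.shape
    # Standardize each embedding dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / z_a.std(axis=0)
    z_b = (z_b - z_b.mean(axis=0)) / z_b.std(axis=0)
    # Cross-correlation matrix between the two views.
    c = (z_a.T @ z_b) / n
    # Invariance term: diagonal entries should be 1 (views agree per dimension).
    on_diag = np.sum((np.diagonal(c) - 1.0) ** 2)
    # Redundancy-reduction term: off-diagonal entries should be 0.
    off_diag = np.sum(c ** 2) - np.sum(np.diagonal(c) ** 2)
    return on_diag + lam * off_diag
```

Feeding a batch with perfectly decorrelated, view-consistent dimensions yields a loss of zero, while redundant (identical) dimensions incur the off-diagonal penalty, which is what drives the learned Norwegian representations toward non-redundant features.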
Pages: 13