Self-supervised hate speech detection in Norwegian texts with lexical and semantic augmentations

Cited by: 1
Authors
Hashmi, Ehtesham [1]
Yayilgan, Sule Yildirim [1]
Yamin, Muhammad Mudassar [1]
Abomhara, Mohamed [1]
Ullah, Mohib [2]
Affiliations
[1] Norwegian Univ Sci & Technol NTNU, Dept Informat Secur & Commun Technol IIK, Teknologivegen 22, N-2815 Gjovik, Innlandet, Norway
[2] Norwegian Univ Sci & Technol NTNU, Dept Comp Sci IDI, Intelligent Syst & Analyt ISA Res Grp, Teknologivegen 22, N-2815 Gjovik, Innlandet, Norway
Keywords
Hate speech; Natural language processing; Data augmentation; Self-representation learning; Transformers;
DOI
10.1016/j.eswa.2024.125843
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The proliferation of social media platforms has significantly contributed to the spread of hate speech, targeting individuals based on race, gender, impaired functioning, religion, or sexual orientation. Online hate speech not only provokes prejudice and violence in cyberspace but also has profound impacts on real-world communities, eroding social harmony and increasing the risk of physical harm. This creates an urgent need for effective hate speech detection systems, especially in low-resource languages such as Norwegian, where limited data availability presents additional challenges. This study uses the Barlow Twins methodology, a self-supervised learning framework, to first develop robust language representations for Norwegian, a language that is typically underrepresented in NLP research. These learned representations are then used in a semi-supervised classification task to detect hate speech. By combining text augmentation techniques at both the word and sentence level with self-training strategies, our approach demonstrates the potential to efficiently learn meaningful representations from a minimal amount of annotated data. Experimental results show that the Nor-BERT model is well suited to detecting hate speech within the limited Norwegian data available, consistently outperforming other models. Additionally, Nor-BERT surpassed all deep learning-based models in terms of F1-score.
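For readers unfamiliar with the Barlow Twins objective named in the abstract, the following is a minimal sketch of the loss as it is generally defined: the embeddings of two augmented views of the same batch are standardized per dimension, their cross-correlation matrix is computed, and the loss pushes that matrix toward the identity. This is not the authors' implementation. The function names `standardize` and `barlow_twins_loss`, the toy inputs, and the off-diagonal weight `lambd=5e-3` (the default from the original Barlow Twins paper, not necessarily the value used in this study) are assumptions for illustration. Plain Python lists are used only to keep the sketch dependency-free.

```python
import math


def standardize(batch):
    """Scale each embedding dimension to zero mean and unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        std = std if std > 0 else 1.0  # guard against constant dimensions
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / std
    return out


def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """Barlow Twins loss for two views' embeddings, each of shape (batch, dim).

    lambd weights the redundancy-reduction (off-diagonal) term; 5e-3 is an
    assumed default, not a value reported in the abstract.
    """
    n, d = len(z_a), len(z_a[0])
    a, b = standardize(z_a), standardize(z_b)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            # Cross-correlation between dimension i of view A and j of view B.
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2   # invariance term: diagonal -> 1
            else:
                loss += lambd * c_ij ** 2   # redundancy term: off-diagonal -> 0
    return loss


# Toy embeddings standing in for encoder outputs of two augmented views
# (e.g. a word-level and a sentence-level augmentation of the same texts).
view_a = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
view_b = [[0.9, 1.1], [1.2, -0.8], [-1.0, 0.9], [-1.1, -1.2]]
print(barlow_twins_loss(view_a, view_b))
```

In a setup like the one the abstract describes, the two views would come from augmented versions of the same Norwegian sentence passed through a shared encoder, and the resulting representations would later feed the semi-supervised classifier.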
Pages: 13