Self-supervised hate speech detection in Norwegian texts with lexical and semantic augmentations

Cited by: 1
Authors
Hashmi, Ehtesham [1]
Yayilgan, Sule Yildirim [1]
Yamin, Muhammad Mudassar [1]
Abomhara, Mohamed [1]
Ullah, Mohib [2]
Affiliations
[1] Norwegian Univ Sci & Technol NTNU, Dept Informat Secur & Commun Technol IIK, Teknologivegen 22, N-2815 Gjovik, Innlandet, Norway
[2] Norwegian Univ Sci & Technol NTNU, Dept Comp Sci IDI, Intelligent Syst & Analyt ISA Res Grp, Teknologivegen 22, N-2815 Gjovik, Innlandet, Norway
Keywords
Hate speech; Natural language processing; Data augmentation; Self-representation learning; Transformers
DOI
10.1016/j.eswa.2024.125843
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The proliferation of social media platforms has significantly contributed to the spread of hate speech that targets individuals based on race, gender, impaired functioning, religion, or sexual orientation. Online hate speech not only provokes prejudice and violence in cyberspace but also has profound impacts on real-world communities, eroding social harmony and increasing the risk of physical harm. This creates an urgent need for effective hate speech detection systems, especially in low-resource languages such as Norwegian, where limited data availability presents additional challenges. This study applies the Barlow Twins methodology, a self-supervised learning framework, to first develop robust language representations for Norwegian, a language that is typically underrepresented in NLP research. These learned representations are then used in a semi-supervised classification task to detect hate speech. By combining text augmentation techniques at both the word and sentence level with self-training strategies, our approach demonstrates the potential to efficiently learn meaningful representations from a minimal amount of annotated data. Experimental results show that the Nor-BERT model is well suited to detecting hate speech within the limited Norwegian data available, consistently outperforming other models. Additionally, Nor-BERT surpassed all deep learning-based models in terms of F1-score.
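
The abstract names the Barlow Twins objective without stating it, so the following is a minimal sketch of that loss in plain Python, not the authors' implementation. It assumes that two augmented views of the same batch of texts have already been encoded into embeddings `z_a` and `z_b` (lists of equal-length rows). The function names, the `eps` constant, and the trade-off weight `lambd = 5e-3` are illustrative assumptions; the abstract does not report the value used in the study.

```python
def standardize(batch, eps=1e-9):
    """Scale each embedding dimension to zero mean and unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        for i in range(n):
            out[i][j] = (col[i] - mean) / (std + eps)
    return out


def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """Barlow Twins loss for two batches of embeddings (assumed weight lambd)."""
    n, d = len(z_a), len(z_a[0])
    a, b = standardize(z_a), standardize(z_b)
    # Cross-correlation matrix between the two views, averaged over the batch.
    c = [
        [sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
        for i in range(d)
    ]
    # Invariance term: matching dimensions of the two views should correlate (c_ii -> 1).
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    # Redundancy-reduction term: different dimensions should decorrelate (c_ij -> 0).
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lambd * off_diag


# Toy check: two identical views with uncorrelated dimensions give a loss near zero.
views = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
loss_identical = barlow_twins_loss(views, views)
```

In a real pipeline the embeddings would come from a trainable encoder applied to two differently augmented copies of each sentence, and the loss would be minimized by gradient descent in an automatic-differentiation framework.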
Pages: 13