Self-supervised hate speech detection in Norwegian texts with lexical and semantic augmentations

Cited by: 1
Authors
Hashmi, Ehtesham [1 ]
Yayilgan, Sule Yildirim [1 ]
Yamin, Muhammad Mudassar [1 ]
Abomhara, Mohamed [1 ]
Ullah, Mohib [2 ]
Affiliations
[1] Norwegian Univ Sci & Technol NTNU, Dept Informat Secur & Commun Technol IIK, Teknologivegen 22, N-2815 Gjovik, Innlandet, Norway
[2] Norwegian Univ Sci & Technol NTNU, Dept Comp Sci IDI, Intelligent Syst & Analyt ISA Res Grp, Teknologivegen 22, N-2815 Gjovik, Innlandet, Norway
Keywords
Hate speech; Natural language processing; Data augmentation; Self-representation learning; Transformers;
DOI
10.1016/j.eswa.2024.125843
CLC classification
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The proliferation of social media platforms has significantly contributed to the spread of hate speech targeting individuals based on race, gender, disability, religion, or sexual orientation. Online hate speech not only provokes prejudice and violence in cyberspace but also has profound impacts on real-world communities, eroding social harmony and increasing the risk of physical harm. This underscores the urgent need for effective hate speech detection systems, especially in low-resource languages such as Norwegian, where limited data availability poses additional challenges. This study applies the Barlow Twins methodology, a self-supervised learning framework, to first develop robust language representations for Norwegian, a language typically underrepresented in NLP research. These learned representations are then used in a semi-supervised classification task to detect hate speech. By combining text augmentation techniques at both the word and sentence level with self-training strategies, our approach demonstrates that meaningful representations can be learned efficiently from a minimal amount of annotated data. Experimental results show that the Nor-BERT model is well suited to detecting hate speech in the limited Norwegian data available, consistently outperforming other models; it also surpassed all deep learning-based models in terms of F1-score.
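The self-supervised stage described above rests on the Barlow Twins objective: embeddings of two augmented views of the same batch are pushed toward an identity cross-correlation matrix, so each dimension is invariant across views (diagonal near 1) while dimensions stay decorrelated (off-diagonal near 0). A minimal NumPy sketch of that loss, not the paper's implementation (function name and `lam` default are illustrative assumptions):

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins objective on two (batch, dim) embedding matrices,
    one per augmented view of the same batch of sentences.

    lam weights the redundancy-reduction (off-diagonal) term; 5e-3 is an
    illustrative default, not the paper's setting.
    """
    n, _ = z_a.shape
    # Standardize each embedding dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / z_a.std(axis=0)
    z_b = (z_b - z_b.mean(axis=0)) / z_b.std(axis=0)
    # Cross-correlation matrix between the two views.
    c = (z_a.T @ z_b) / n
    # Invariance term: diagonal entries should be 1 (views agree per dimension).
    on_diag = np.sum((np.diagonal(c) - 1.0) ** 2)
    # Redundancy-reduction term: off-diagonal entries should be 0.
    off_diag = np.sum(c ** 2) - np.sum(np.diagonal(c) ** 2)
    return on_diag + lam * off_diag
```

Feeding a batch with perfectly decorrelated, view-consistent dimensions yields a loss of zero, while redundant (identical) dimensions incur the off-diagonal penalty, which is what drives the learned Norwegian representations toward non-redundant features.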
Pages: 13