Self-supervised hate speech detection in Norwegian texts with lexical and semantic augmentations

Cited by: 1
Authors
Hashmi, Ehtesham [1]
Yayilgan, Sule Yildirim [1]
Yamin, Muhammad Mudassar [1]
Abomhara, Mohamed [1]
Ullah, Mohib [2]
Affiliations
[1] Norwegian Univ Sci & Technol NTNU, Dept Informat Secur & Commun Technol IIK, Teknologivegen 22, N-2815 Gjovik, Innlandet, Norway
[2] Norwegian Univ Sci & Technol NTNU, Dept Comp Sci IDI, Intelligent Syst & Analyt ISA Res Grp, Teknologivegen 22, N-2815 Gjovik, Innlandet, Norway
Keywords
Hate speech; Natural language processing; Data augmentation; Self-representation learning; Transformers;
DOI
10.1016/j.eswa.2024.125843
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The proliferation of social media platforms has contributed significantly to the spread of hate speech that targets individuals based on race, gender, disability, religion, or sexual orientation. Online hate speech not only provokes prejudice and violence in cyberspace but also has profound effects on real-world communities, eroding social harmony and increasing the risk of physical harm. Effective hate speech detection systems are therefore urgently needed, especially for low-resource languages such as Norwegian, where limited data availability presents additional challenges. This study applies the Barlow Twins method, a self-supervised learning framework, to first learn robust language representations for Norwegian, a language that is typically underrepresented in NLP research. The learned representations are then used in a semi-supervised classification task to detect hate speech. By combining text augmentation at both the word and sentence level with self-training strategies, our approach shows that meaningful representations can be learned efficiently from a minimal amount of annotated data. Experimental results show that the Nor-BERT model is well suited to detecting hate speech within the limited Norwegian data available, consistently outperforming the other models. Nor-BERT also surpassed all deep learning-based models in terms of F1-score.
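The abstract names the Barlow Twins objective but this record gives no implementation details. As a rough illustration only, the sketch below computes the generic Barlow Twins loss on two batches of embeddings, such as the encodings of two augmented views of the same sentences. It is not the authors' code: the function names, the `off_diag_weight` value, and the random toy embeddings are all assumptions made for this example.

```python
import math
import random


def standardize(z, eps=1e-12):
    """Normalize each feature to zero mean and unit variance across the batch."""
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in z) / n) + eps
        for j in range(d)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in z]


def cross_correlation(z_a, z_b):
    """Return the D x D cross-correlation matrix between two N x D batches."""
    a, b = standardize(z_a), standardize(z_b)
    n, d = len(a), len(a[0])
    return [
        [sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
        for i in range(d)
    ]


def barlow_twins_loss(z_a, z_b, off_diag_weight=5e-3):
    """Pull the diagonal of the cross-correlation matrix toward 1 (invariance
    across views) and the off-diagonal toward 0 (redundancy reduction).
    off_diag_weight is an illustrative value, not one taken from the paper."""
    c = cross_correlation(z_a, z_b)
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + off_diag_weight * off_diag


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical embeddings: 32 sentences, 4 features each.
    view_a = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(32)]
    # A lightly perturbed copy stands in for an augmented view of the same text.
    view_b = [[x + rng.gauss(0, 0.1) for x in row] for row in view_a]
    # Unrelated embeddings stand in for a view that shares no content.
    unrelated = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(32)]
    print("loss, matching views: ", barlow_twins_loss(view_a, view_b))
    print("loss, unrelated views:", barlow_twins_loss(view_a, unrelated))
```

In a real pipeline the two batches would come from an encoder applied to word-level and sentence-level augmentations of the same texts, and the loss would be minimized by gradient descent rather than merely evaluated.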
Pages: 13