Challenges of Hate Speech Detection in Social Media: Data Scarcity, and Leveraging External Resources

Cited by: 0
Authors
Kovács G. [1]
Alonso P. [1]
Saini R. [1]
Affiliations
[1] Luleå University of Technology, Aurorum 1, Luleå
Keywords
BERT; Deep language processing; Hate speech; Transfer learning; Vocabulary augmentation
DOI
10.1007/s42979-021-00457-3
Abstract
The detection of hate speech in social media is a crucial task. The uncontrolled spread of hate has the potential to gravely damage our society and to severely harm marginalized people or groups. Social media is a major arena for spreading hate speech online, which contributes significantly to the difficulty of automatic detection: social media posts include paralinguistic signals (e.g. emoticons and hashtags), and their linguistic content contains plenty of poorly written text. Another difficulty is the context-dependent nature of the task and the lack of consensus on what constitutes hate speech, which makes the task difficult even for humans. As a result, creating large labeled corpora is difficult and resource-consuming. The problem posed by ungrammatical text has been largely mitigated by the recent emergence of deep neural network (DNN) architectures that have the capacity to efficiently learn various features. For this reason, we proposed a deep natural language processing (NLP) model, combining convolutional and recurrent layers, for the automatic detection of hate speech in social media data. We applied our model to the HASOC2019 corpus and attained a macro F1 score of 0.63 in hate speech detection on the HASOC test set. The capacity of DNNs for efficient learning, however, also means an increased risk of overfitting, particularly when limited training data is available (as was the case for HASOC). For this reason, we investigated different methods for expanding the resources used. We explored various opportunities, such as leveraging unlabeled data and similarly labeled corpora, as well as the use of novel models. Our results showed that by doing so, it was possible to significantly increase the classification score attained. © 2021, The Author(s).
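The abstract reports its result as a macro F1 score. As a reminder of what that metric measures, here is a minimal sketch in plain Python: F1 is computed per class and the per-class scores are averaged without weighting, so a rare class counts as much as a frequent one. The function name `macro_f1` and the labels `HATE` and `NONE` are illustrative assumptions, not taken from the paper or from the HASOC label set.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not data from the paper
y_true = ["HATE", "HATE", "NONE", "NONE"]
y_pred = ["HATE", "NONE", "NONE", "NONE"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because each class contributes equally to the average, macro F1 is a common choice when the hateful class is much smaller than the non-hateful one.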