Challenges of Hate Speech Detection in Social Media: Data Scarcity, and Leveraging External Resources

Cited by: 0
Authors
Kovács G. [1]
Alonso P. [1]
Saini R. [1]
Affiliation
[1] Luleå University of Technology, Aurorum 1, Luleå
Keywords
BERT; Deep language processing; Hate speech; Transfer learning; Vocabulary augmentation
DOI
10.1007/s42979-021-00457-3
Abstract
The detection of hate speech in social media is a crucial task. The uncontrolled spread of hate has the potential to gravely damage our society and to severely harm marginalized people or groups. Social media is a major arena for spreading hate speech online, and this setting adds significantly to the difficulty of automatic detection: posts include paralinguistic signals (e.g., emoticons and hashtags), and their linguistic content contains plenty of poorly written text. A further difficulty is the context-dependent nature of the task and the lack of consensus on what constitutes hate speech, which makes the task difficult even for humans. This, in turn, makes creating large labeled corpora difficult and resource-intensive. The problem posed by ungrammatical text has been largely mitigated by the recent emergence of deep neural network (DNN) architectures that can efficiently learn various features. For this reason, we propose a deep natural language processing (NLP) model, combining convolutional and recurrent layers, for the automatic detection of hate speech in social media data. We applied our model to the HASOC2019 corpus and attained a macro F1 score of 0.63 in hate speech detection on the HASOC test set. The capacity of DNNs for efficient learning, however, also means an increased risk of overfitting, particularly when only limited training data are available (as was the case for HASOC). For this reason, we investigated different methods for expanding the resources used. We explored various opportunities, such as leveraging unlabeled data and similarly labeled corpora, as well as using novel models. Our results showed that doing so made it possible to significantly increase the classification score attained. © 2021, The Author(s).
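The abstract reports results as a macro F1 score. Macro F1 computes F1 separately for each class and takes the unweighted mean, so a minority class (such as hateful posts) counts as much as the majority class. The following is a minimal plain-Python sketch of that metric, not the authors' evaluation code; the function name `macro_f1` and the toy labels `"HOF"`/`"NOT"` are illustrative assumptions. For the same inputs it should agree with scikit-learn's `f1_score(..., average="macro")`.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred.

    Hypothetical helper for illustration; a class with no predictions or no
    gold instances gets precision or recall 0.0 instead of raising an error.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("inputs must not be empty")

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class was wrong
            fn[gold] += 1  # gold class was missed

    f1_scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)

    # Macro averaging: every class has equal weight, regardless of its size.
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    gold = ["HOF", "HOF", "NOT", "NOT", "NOT"]
    pred = ["HOF", "NOT", "NOT", "NOT", "HOF"]
    print(macro_f1(gold, pred))
```

In this toy example, accuracy is 3/5 = 0.6, while macro F1 is (0.5 + 2/3) / 2 ≈ 0.583, because the smaller `"HOF"` class, with F1 = 0.5, pulls the average down as much as the larger class pulls it up.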