Emojis as anchors to detect Arabic offensive language and hate speech

Times cited: 6
Authors
Mubarak, Hamdy [1]
Hassan, Sabit [2]
Chowdhury, Shammur Absar [1]
Affiliations
[1] Hamad Bin Khalifa Univ, Qatar Comp Res Inst, Doha, Qatar
[2] Univ Pittsburgh, Sch Comp & Informat, Pittsburgh, PA USA
Keywords
Offensive language; Hate speech; Emojis; Text classification; Social media analysis
DOI
10.1017/S1351324923000402
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in emojis to collect a large number of offensive tweets. We apply the proposed method to Arabic tweets and compare it with English tweets, analyzing key cultural differences. We observed consistent usage of these emojis to represent offensiveness across different timespans on Twitter. We manually annotate and publicly release the largest Arabic dataset for offensive, fine-grained hate speech, vulgar, and violence content. Furthermore, we benchmark the dataset for detecting offensiveness and hate speech using different transformer architectures and perform in-depth linguistic analysis. To assess generalization capability, we evaluate our models on external datasets: a Twitter dataset collected using a completely different method, and a multi-platform dataset containing comments from Twitter, YouTube, and Facebook. Competitive results on these datasets suggest that the data collected using our method capture universal characteristics of offensive language. Our findings also highlight the common words used in offensive communications, common targets of hate speech, and specific patterns in violence tweets, and pinpoint common classification errors that can be attributed to limitations of NLP models. We observe that even state-of-the-art transformer models may fail to take into account culture, background, and context, or to understand nuances present in real-world data such as sarcasm.
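The collection idea in the abstract, filtering a tweet stream down to tweets that contain emojis which tend to co-occur with offensive content, can be sketched as follows. This is a minimal sketch under stated assumptions: `ANCHOR_EMOJIS`, `contains_anchor`, and `collect_candidates` are hypothetical names, and the three emojis are placeholders, not the anchor set the authors actually used. The result is a candidate pool for manual annotation, not a set of confirmed offensive tweets.

```python
from typing import Iterable, List, Set

# Hypothetical placeholder anchors (dog, pig, pile of poo).
# NOT the emoji list used in the paper; substitute an empirically chosen set.
ANCHOR_EMOJIS: Set[str] = {"\U0001F415", "\U0001F416", "\U0001F4A9"}


def contains_anchor(text: str, anchors: Set[str] = ANCHOR_EMOJIS) -> bool:
    """Return True if any anchor emoji occurs anywhere in the text."""
    return any(emoji in text for emoji in anchors)


def collect_candidates(tweets: Iterable[str],
                       anchors: Set[str] = ANCHOR_EMOJIS) -> List[str]:
    """Keep tweets containing at least one anchor emoji, preserving order.

    The kept tweets are only candidates; each still needs manual labeling
    (e.g. offensive, hate speech, vulgar, violence).
    """
    return [t for t in tweets if contains_anchor(t, anchors)]


if __name__ == "__main__":
    stream = ["good morning", "you \U0001F415", "ok \U0001F4A9 bye"]
    print(collect_candidates(stream))
```

Plain substring matching is used here, so an anchor is also found when it appears inside a longer emoji sequence (for example, a ZWJ sequence that starts with the dog emoji); whether that is desirable depends on the anchor set.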
Pages: 1436-1457
Number of pages: 22
Related papers
50 in total
  • [31] Online Hate A Study on the Feasibility to Detect Hate Speech in Swedish
    Fernquist, Johan
    Lindholm, Oskar
    Kaati, Lisa
    Akrami, Nazar
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 4724 - 4729
  • [32] Hate Speech as Protected Conduct: Reworking the Approach to Offensive Speech under the NLRA
    Thelen, Carly
    IOWA LAW REVIEW, 2019, 104 (02) : 985 - 1015
  • [33] A Large Language Model Approach to Detect Hate Speech in Political Discourse Using Multiple Language Corpora
    de Oliveira, Aillkeen Bezerra
    Baptista, Claudio de Souza
    Firmino, Anderson Almeida
    de Paiva, Anselmo Cardoso
    39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024, : 1461 - 1468
  • [34] Interpretable and High-Performance Hate and Offensive Speech Detection
    Babaeianjelodar, Marzieh
    Prudhvi, Gurram Poorna
    Lorenz, Stephen
    Chen, Keyu
    Mondal, Sumona
    Dey, Soumyabrata
    Kumar, Navin
    HCI INTERNATIONAL 2022 - LATE BREAKING PAPERS: INTERACTING WITH EXTENDED REALITY AND ARTIFICIAL INTELLIGENCE, 2022, 13518 : 233 - 244
  • [35] How well do hate speech, toxicity, abusive and offensive language classification models generalize across datasets?
    Fortuna, Paula
    Soler-Company, Juan
    Wanner, Leo
    INFORMATION PROCESSING & MANAGEMENT, 2021, 58 (03)
  • [36] Hate Speech and Offensive Language Detection: A New Feature Set with Filter-Embedded Combining Feature Selection
    Aziz, Noor Azeera Abdul
    Maarof, Mohd Aizaini
    Zainal, Anazida
    2021 3RD INTERNATIONAL CYBER RESILIENCE CONFERENCE (CRC), 2021, : 78 - 83
  • [37] Cross-Lingual Few-Shot Hate Speech and Offensive Language Detection Using Meta Learning
    Mozafari, Marzieh
    Farahbakhsh, Reza
    Crespi, Noel
    IEEE ACCESS, 2022, 10 : 14880 - 14896
  • [38] Multilingual Hate Speech Detection: Innovations in Optimized Deep Learning for English and Arabic Hate Speech Detection
    AL-Sukhani, Hassan
    Bsoul, Qusay
    Elhawary, Abdelrahman H.
    Nasr, Ziad M.
    Mansour, Ahmed E.
    Batyha, Radwan M.
    Alqadi, Basma S.
    Alqurni, Jehad Saad
    Alfagham, Hayat
    Madbouly, Magda M.
    SN COMPUTER SCIENCE, 6 (03)
  • [39] Arabic hate speech detection system based on AraBERT
    Authors unknown (affiliation: Higher Institute of Computer Science and Multimedia of Sfax, Sfax, Tunisia)
    Proc. IEEE Int. Conf. Cogn. Informatics Cogn. Comput. (ICCI*CC), 2022, : 208 - 213