Hate and offensive speech detection on Arabic social media

Cited by: 41
Authors
Alsafari S. [1, 2]
Sadaoui S. [1]
Mouhoub M. [1]
Affiliations
[1] University of Regina, Regina
[2] University of Jeddah, Jeddah
Keywords
Arabic corpus; Data annotation; Data extraction; Deep learning; Feature extraction; Hate speech; Multi-class classification; Social media
DOI
10.1016/j.osnem.2020.100096
Abstract
Hate speech that targets individuals for their protected characteristics is proliferating on social media. Our study aims to devise an effective Arabic hate and offensive speech detection framework to address this serious issue. First, we build a reliable Arabic textual corpus by crawling data from Twitter using four robust extraction strategies, implemented around four types of hate: religion, ethnicity, nationality, and gender. Next, we label the corpus with a three-level hierarchical annotation scheme and verify inter-annotator agreement at each level to support the ground truth. Using machine learning and deep learning techniques, we develop numerous two-class, three-class, and six-class classification models and combine them with a variety of feature extraction techniques, such as contextual word embeddings. Finally, we conduct extensive experiments to assess the performance of the learned models and to examine the misclassification errors. The results are very encouraging compared with prior hate and offensive speech studies on Arabic and other languages. © 2020 Elsevier B.V.
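The per-level agreement check mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the agreement metric or the exact label set, so everything here is an assumption: Cohen's kappa stands in for the metric, and the six labels (clean, offensive, and the four hate types) with their mappings to three and two classes are a hypothetical reading of the two-, three-, and six-class setup, not the authors' scheme.

```python
from collections import Counter

# Hypothetical label hierarchy (an assumption, not taken from the paper):
# six fine-grained labels collapse to three classes, then to two.
SIX_TO_THREE = {
    "clean": "clean",
    "offensive": "offensive",
    "religion": "hate",
    "ethnicity": "hate",
    "nationality": "hate",
    "gender": "hate",
}
THREE_TO_TWO = {
    "clean": "clean",
    "offensive": "offensive_or_hate",
    "hate": "offensive_or_hate",
}


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def kappa_per_level(six_a, six_b):
    """Return kappa at the six-, three-, and two-class levels."""
    three_a = [SIX_TO_THREE[x] for x in six_a]
    three_b = [SIX_TO_THREE[x] for x in six_b]
    two_a = [THREE_TO_TWO[x] for x in three_a]
    two_b = [THREE_TO_TWO[x] for x in three_b]
    return {
        "six": cohen_kappa(six_a, six_b),
        "three": cohen_kappa(three_a, three_b),
        "two": cohen_kappa(two_a, two_b),
    }


# Toy annotations, invented purely for illustration.
annotator_a = ["clean", "offensive", "religion", "gender", "clean", "ethnicity"]
annotator_b = ["clean", "offensive", "nationality", "gender", "offensive", "ethnicity"]
scores = kappa_per_level(annotator_a, annotator_b)
print(scores)
```

In this toy data the annotators disagree on which hate type applies to one post but agree that it is hate, so agreement is higher at the three-class level than at the six-class level. That is the kind of effect a level-by-level check is meant to expose.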