Hate and offensive speech detection on Arabic social media

Cited by: 41
Authors
Alsafari S. [1, 2]
Sadaoui S. [1]
Mouhoub M. [1]
Affiliations
[1] University of Regina, Regina
[2] University of Jeddah, Jeddah
Keywords
Arabic corpus; Data annotation; Data extraction; Deep learning; Feature extraction; Hate speech; Multi-class classification; Social media
DOI
10.1016/j.osnem.2020.100096
Abstract
We are witnessing an increasing proliferation of hate speech on social media targeting individuals for their protected characteristics. Our study aims to devise an effective Arabic hate and offensive speech detection framework to address this serious issue. First, we build a reliable Arabic textual corpus by crawling data from Twitter using four robust extraction strategies that we implement based on four types of hate: religion, ethnicity, nationality, and gender. Next, we label the corpus based on a three-level hierarchical annotation scheme in which we verify the inter-annotator agreement to ensure ground truth at each level. Based on machine and deep learning techniques, we develop numerous two-class, three-class, and six-class classification models that we combine with a variety of feature extraction techniques, such as contextual word embeddings. Finally, we conduct an intensive experiment to assess the performance of the different learned models and to examine the misclassification errors. The performance results are very encouraging compared to prior hate and offensive speech studies carried out on Arabic and other languages. © 2020 Elsevier B.V.
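The abstract mentions verifying agreement between annotators at each level of the labelling scheme but does not say which statistic was used. As an illustration only, the sketch below assumes Cohen's kappa for two annotators; the function name `cohen_kappa`, the label names, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions,
    # summed over every label either annotator used.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    classes = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in classes) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations (assumed three-class labels, not real data).
annotator_1 = ["clean", "offensive", "hate", "clean", "hate", "clean"]
annotator_2 = ["clean", "offensive", "hate", "offensive", "hate", "clean"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

With more than two annotators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual alternative.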
Related papers
50 in total
  • [21] Semi-Supervised Self-Training of Hate and Offensive Speech from Social Media
    Alsafari, Safa
    Sadaoui, Samira
    APPLIED ARTIFICIAL INTELLIGENCE, 2021, 35 (15) : 1621 - 1645
  • [22] A curated dataset for hate speech detection on social media text
    Mody, Devansh
    Huang, YiDong
    de Oliveira, Thiago Eustaquio Alves
    DATA IN BRIEF, 2023, 46
  • [23] Automatic Hate Speech Detection on Social Media: A Brief Survey
    Alrehili, Ahlam
    2019 IEEE/ACS 16TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA 2019), 2019
  • [24] Afaan Oromo Hate Speech Detection and Classification on Social Media
    Ababu, Teshome Mulugeta
    Woldeyohannis, Michael Melese
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6612 - 6619
  • [25] Intelligent detection of hate speech in Arabic social network: A machine learning approach
    Aljarah, Ibrahim
    Habib, Maria
    Hijazi, Neveen
    Faris, Hossam
    Qaddoura, Raneem
    Hammo, Bassam
    Abushariah, Mohammad
    Alfawareh, Mohammad
    JOURNAL OF INFORMATION SCIENCE, 2021, 47 (04) : 483 - 501
  • [26] Detection of Arabic offensive language in social media using machine learning models
    Mousa, Aya
    Shahin, Ismail
    Nassif, Ali Bou
    Elnagar, Ashraf
    INTELLIGENT SYSTEMS WITH APPLICATIONS, 2024, 22
  • [27] Arabic Offensive and Hate Speech Detection Using a Cross-Corpora Multi-Task Learning Model
    Aldjanabi, Wassen
    Dahou, Abdelghani
    Al-qaness, Mohammed A. A.
    Abd Elaziz, Mohamed
    Helmi, Ahmed Mohamed
    Damasevicius, Robertas
    INFORMATICS-BASEL, 2021, 8 (04)
  • [28] Hate speech detection with ADHAR: a multi-dialectal hate speech corpus in Arabic
    Charfi, Anis
    Besghaier, Mabrouka
    Akasheh, Raghda
    Atalla, Andria
    Zaghouani, Wajdi
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2024, 7
  • [29] Multilingual Hate Speech Detection: Innovations in Optimized Deep Learning for English and Arabic Hate Speech Detection
    Hassan AL-Sukhani
    Qusay Bsoul
    Abdelrahman H. Elhawary
    Ziad M. Nasr
    Ahmed E. Mansour
    Radwan M. Batyha
    Basma S. Alqadi
    Jehad Saad Alqurni
    Hayat Alfagham
    Magda M. Madbouly
    SN Computer Science, 6 (3)
  • [30] Hate Speech on Twitter: A Pragmatic Approach to Collect Hateful and Offensive Expressions and Perform Hate Speech Detection
    Watanabe, Hajime
    Bouazizi, Mondher
    Ohtsuki, Tomoaki
    IEEE ACCESS, 2018, 6 : 13825 - 13835