Hate speech detection in the Arabic language: corpus design, construction, and evaluation

Cited by: 1
Authors
Ahmad, Ashraf [1 ]
Azzeh, Mohammad [2 ]
Alnagi, Eman [1 ]
Abu Al-Haija, Qasem [3 ]
Halabi, Dana [4 ]
Aref, Abdullah [1 ]
AbuHour, Yousef [5 ]
Affiliations
[1] Princess Sumaya Univ Technol PSUT, Dept Comp Sci, Amman, Jordan
[2] Princess Sumaya Univ Technol PSUT, Dept Data Sci, Amman, Jordan
[3] Jordan Univ Sci & Technol, Fac Comp & Informat Technol, Dept Cybersecur, Irbid, Jordan
[4] Luminus Tech Univ Coll LTUC, SAE Inst, Amman, Jordan
[5] Princess Sumaya Univ Technol PSUT, Dept Basic Sci, Amman, Jordan
Keywords
Arabic hate speech; natural language processing (NLP); machine learning; Arabic hate speech detection; Arabic hate speech corpus; social media
DOI
10.3389/frai.2024.1345445
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hate speech detection in Arabic is a multifaceted challenge because of the language's broad and diverse linguistic terrain. With its multiple dialects and rich cultural subtleties, Arabic requires tailored measures to address online hate speech successfully. To this end, academics and developers have used natural language processing (NLP) methods and machine learning algorithms adapted to the complexities of Arabic text. However, many proposed methods have been hampered by the lack of a comprehensive dataset/corpus of Arabic hate speech. In this research, we propose a novel multi-class public Arabic dataset comprising 403,688 annotated tweets categorized as extremely positive, positive, neutral, or negative based on the presence of hate speech. Using this dataset, we also characterize the performance of multiple machine learning models for hate speech identification in Arabic Jordanian dialect tweets. Specifically, the Word2Vec, TF-IDF, and AraBERT text representation models were applied to produce the vectors that represent each text as input to the classification models. Seven machine learning classifiers were then evaluated: Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), AdaBoost (Ada), XGBoost (XGB), and CatBoost (CatB). The experimental evaluation showed that, in this challenging and unstructured setting, our gathered and annotated dataset was effective and produced encouraging results. This will enable academics to delve further into this crucial field of study.
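Note: the two-stage approach described in the abstract (turn each text into a vector, then pass the vector to a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a common smoothed TF-IDF weighting, uses a simple nearest-centroid classifier as a stand-in for the seven classifiers named in the abstract, and runs on hypothetical English placeholder texts rather than the Arabic corpus. All function names are illustrative.

```python
import math
from collections import Counter

def tokenize(text):
    # Whitespace tokenization only; real Arabic text would need normalization.
    return text.split()

def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(doc, idf):
    # L2-normalized TF-IDF vector as a sparse dict; unseen tokens are dropped.
    tf = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}

def fit_centroids(docs, labels, idf):
    # Average the TF-IDF vectors of each class (stand-in classifier).
    sums, counts = {}, Counter(labels)
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, Counter())
        for tok, v in tfidf(doc, idf).items():
            acc[tok] += v
    return {lab: {t: v / counts[lab] for t, v in acc.items()}
            for lab, acc in sums.items()}

def predict(doc, idf, centroids):
    # Pick the class whose centroid has the largest dot product with the text.
    vec = tfidf(doc, idf)
    def score(centroid):
        return sum(v * centroid.get(t, 0.0) for t, v in vec.items())
    return max(centroids, key=lambda lab: score(centroids[lab]))

# Hypothetical toy data; the labels reuse three of the four classes in the abstract.
train_docs = [
    "you are awful and stupid",
    "awful stupid people",
    "the meeting is on monday",
    "the report is due on monday",
    "great kind people",
    "you are kind and great",
]
train_labels = ["negative", "negative", "neutral", "neutral", "positive", "positive"]

idf = fit_idf(train_docs)
centroids = fit_centroids(train_docs, train_labels, idf)
print(predict("stupid awful comment", idf, centroids))
```

In practice the representation step and the classifier are independent choices, which is why the abstract can pair three representations with seven classifiers.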
Pages: 19
Related papers
50 in total
  • [21] Offensive Language and Hate Speech Detection for Danish
    Sigurbergsson, Gudbjartur Ingi
    Derczynski, Leon
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 3498 - 3508
  • [22] Detection of hate speech in Arabic tweets using deep learning
    Al-Hassan, Areej
    Al-Dossari, Hmood
    MULTIMEDIA SYSTEMS, 2022, 28 (06) : 1963 - 1974
  • [23] Detection of hate speech in Arabic tweets using deep learning
    Areej Al-Hassan
    Hmood Al-Dossari
    Multimedia Systems, 2022, 28 : 1963 - 1974
  • [24] A comprehensive review on Arabic offensive language and hate speech detection on social media: methods, challenges and solutions
    Abdelsamie, Mahmoud Mohamed
    Azab, Shahira Shaaban
    Hefny, Hesham A.
    SOCIAL NETWORK ANALYSIS AND MINING, 2024, 14 (01)
  • [25] HateBR: A Large Expert Annotated Corpus of Brazilian Instagram Comments for Offensive Language and Hate Speech Detection
    Vargas, Francielle
    Carvalho, Isabelle
    Goes, Fabiana
    Pardo, Thiago A. S.
    Benevenuto, Fabricio
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 7174 - 7183
  • [26] ARHNet - Leveraging Community Interaction For Detection Of Religious Hate Speech In Arabic
    Chowdhury, Arijit Ghosh
    Didolkar, Aniket
    Sawhney, Ramit
    Shah, Rajiv Ratn
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019:): STUDENT RESEARCH WORKSHOP, 2019, : 273 - 280
  • [27] arHateDetector: detection of hate speech from standard and dialectal Arabic Tweets
    Khezzar R.
    Moursi A.
    Al Aghbari Z.
    Discover Internet of Things, 2023, 3 (01):
  • [28] Are They Our Brothers? Analysis and Detection of Religious Hate Speech in the Arabic Twittersphere
    Albadi, Nuha
    Kurdi, Maram
    Mishra, Shivakant
    2018 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM), 2018, : 69 - 76
  • [29] Arabic Hate Speech Detection Using Deep Recurrent Neural Networks
    Al Anezi, Faisal Yousif
    APPLIED SCIENCES-BASEL, 2022, 12 (12):
  • [30] Hate Speech Detection in Social Media for the Kurdish Language
    Saeed, Ari M.
    Ismael, Aso N.
    Rasul, Danya L.
    Majeed, Rayan S.
    Rashid, Tarik A.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INNOVATIONS IN COMPUTING RESEARCH (ICR'22), 2022, 1431 : 253 - 260