Hate speech detection in the Arabic language: corpus design, construction, and evaluation

Cited by: 1

Authors:

Ahmad, Ashraf [1]

Azzeh, Mohammad [2]

Alnagi, Eman [1]

Abu Al-Haija, Qasem [3]

Halabi, Dana [4]

Aref, Abdullah [1]

AbuHour, Yousef [5]

Affiliations:

[1] Princess Sumaya Univ Technol PSUT, Dept Comp Sci, Amman, Jordan

[2] Princess Sumaya Univ Technol PSUT, Dept Data Sci, Amman, Jordan

[3] Jordan Univ Sci & Technol, Fac Comp & Informat Technol, Dept Cybersecur, Irbid, Jordan

[4] Luminus Tech Univ Coll LTUC, SAE Inst, Amman, Jordan

[5] Princess Sumaya Univ Technol PSUT, Dept Basic Sci, Amman, Jordan

Source:

FRONTIERS IN ARTIFICIAL INTELLIGENCE | 2024 / Vol. 7

Keywords:

Arabic hate speech; natural language processing (NLP); machine learning; Arabic hate speech detection; Arabic hate speech corpus; social media

DOI:

10.3389/frai.2024.1345445

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Hate speech detection in Arabic is a multifaceted challenge because of the language's broad and diverse linguistic terrain. With its multiple dialects and rich cultural subtleties, Arabic requires particular measures to address online hate speech successfully. To address this issue, academics and developers have used natural language processing (NLP) methods and machine learning algorithms adapted to the complexities of Arabic text. However, many proposed methods were hampered by the lack of a comprehensive dataset/corpus of Arabic hate speech. In this research, we propose a novel multi-class public Arabic dataset comprising 403,688 annotated tweets categorized as extremely positive, positive, neutral, or negative based on the presence of hate speech. Using this dataset, we additionally characterize the performance of multiple machine learning models for hate speech identification in Arabic Jordanian dialect tweets. Specifically, the Word2Vec, TF-IDF, and AraBERT text representation models were applied to produce word vectors, which provide the classification models with vector representations of the text. Seven machine learning classifiers were then evaluated: Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), AdaBoost (Ada), XGBoost (XGB), and CatBoost (CatB). The experimental evaluation revealed that, in this challenging and unstructured setting, our gathered and annotated datasets were rather efficient and generated encouraging assessment outcomes. This will enable academics to delve further into this crucial field of study.
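
As a rough illustration of the "text to vectors" step that the abstract mentions, the sketch below computes TF-IDF vectors in plain Python. It is not the authors' implementation: the function name `tfidf_vectors`, the whitespace tokenization, the length-normalized term frequency, the smoothed IDF formula, and the English toy sentences are all assumptions made for this example. Real Arabic tweets would need dialect-aware preprocessing that the abstract does not describe.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of whitespace-tokenized texts.

    Hypothetical helper for illustration only; tokenization and weighting
    choices are assumptions, not the method used in the paper.
    """
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # One common smoothed IDF variant (assumed here): ln((1+N)/(1+df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        # Term frequency normalized by document length, scaled by IDF.
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


# Toy placeholder texts (not taken from the corpus).
toy_docs = ["good day", "bad day", "bad bad words"]
vocab, vectors = tfidf_vectors(toy_docs)
```

Each row of `vectors` is a fixed-length numeric representation of one text, which is the kind of input that classifiers such as those listed in the abstract consume. Tokens that appear in fewer documents receive a larger weight than tokens shared across many documents.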


Pages: 19
