Code-mixing unveiled: Enhancing the hate speech detection in Arabic dialect tweets using machine learning models

Times cited: 0
Authors
Alhazmi, Ali [1, 2]
Mahmud, Rohana [1]
Idris, Norisma [1]
Abo, Mohamed Elhag Mohamed [1]
Eke, Christopher Ifeanyi [3]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur, Malaysia
[2] Jazan Univ, Coll Engn & Comp Sci, Dept Comp Sci, Jazan, Saudi Arabia
[3] Fed Univ Lafia, Fac Comp, Dept Comp Sci, Lafia, Nasarawa State, Nigeria
Source
PLOS ONE | 2024 / Vol. 19 / No. 7
Keywords
LANGUAGE;
DOI
10.1371/journal.pone.0305657
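Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09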
Abstract
Technological developments over the past few decades have changed the way people communicate, with social media and blogs becoming vital channels for international conversation. Although social media platforms actively suppress hate speech, it remains a concern that must be continually identified and monitored. Despite considerable efforts to detect hate speech in English-language social media content, Arabic poses particular difficulties: its many dialects and linguistic nuances call for dedicated treatment. A further complication is the widespread practice of code-mixing, in which users seamlessly blend multiple languages in the same text. To address this research gap, this study examines how well machine learning models using various features can detect hate speech, particularly in Arabic tweets featuring code-mixing. Specifically, the study assesses and compares the effectiveness of different features and machine learning models for hate speech detection on Arabic hate speech and code-mixing hate speech datasets. The methodology comprises data collection, data pre-processing, feature extraction, construction of classification models, and evaluation of those models. The analysis revealed that TF-IDF features combined with the stochastic gradient descent (SGD) model achieved the highest accuracy, 98.21%. These results were then compared with those of three existing studies, and the proposed method outperformed all three. The study therefore has practical implications and provides a foundation for further work on automated hate speech detection in text.
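The best-performing combination named in the abstract, TF-IDF features fed to a linear classifier trained with stochastic gradient descent, can be illustrated with the minimal sketch below. It is not the authors' implementation. It uses only the Python standard library and assumes: whitespace tokenization with no Arabic or dialect normalization; the smoothed IDF formula that scikit-learn's `TfidfVectorizer` uses by default; the hinge loss with L2 regularization (the default loss of scikit-learn's `SGDClassifier`); and a hypothetical six-document toy dataset in which `slur_a`/`slur_b` are placeholder tokens. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Lowercase + whitespace split. Real Arabic/code-mixed tweets would need
    # dialect-aware normalization; that step is omitted here (assumption).
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalized TF-IDF vector; terms unseen at fit time are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def score(w, b, x):
    # Linear decision function w . x + b over sparse dict vectors.
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b


def train_sgd(X, y, epochs=200, lr=0.5, alpha=1e-4, seed=0):
    """Linear classifier fit by SGD on the L2-regularized hinge loss; y in {+1, -1}."""
    w, b = {}, 0.0
    rng = random.Random(seed)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a fresh random order each epoch
        for i in order:
            margin = y[i] * score(w, b, X[i])
            for t in w:  # gradient of the L2 penalty shrinks every weight
                w[t] *= 1 - lr * alpha
            if margin < 1:  # hinge loss is active: step toward the example
                for t, v in X[i].items():
                    w[t] = w.get(t, 0.0) + lr * y[i] * v
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    return 1 if score(w, b, x) >= 0 else -1


def accuracy(y_true, y_pred):
    return sum(a == p for a, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: "slur_a"/"slur_b" stand in for offensive terms.
train_docs = [
    "slur_a people should get out",
    "get out slur_b you are all slur_a",
    "those slur_b ruin everything",
    "great match today well played",
    "thanks for sharing the recipe",
    "congratulations to the team great news",
]
train_labels = [1, 1, 1, -1, -1, -1]  # +1 = hate speech, -1 = not hate speech

idf = fit_idf(train_docs)
X = [tfidf(d, idf) for d in train_docs]
w, b = train_sgd(X, train_labels)
train_acc = accuracy(train_labels, [predict(w, b, x) for x in X])
new_docs = ["slur_a people should get out now", "thanks for sharing the recipe friend"]
new_preds = [predict(w, b, tfidf(d, idf)) for d in new_docs]
print(train_acc, new_preds)
```

Sparse dictionaries keep the sketch dependency-free and make explicit that words unseen during fitting (here "now" and "friend") simply drop out of the feature vector. Accuracy on this toy set says nothing about the 98.21% reported in the study, which was measured on the authors' own datasets.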
Pages: 24