Code-mixing unveiled: Enhancing the hate speech detection in Arabic dialect tweets using machine learning models

Authors
Alhazmi, Ali [1, 2]
Mahmud, Rohana [1]
Idris, Norisma [1]
Abo, Mohamed Elhag Mohamed [1]
Eke, Christopher Ifeanyi [3]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur, Malaysia
[2] Jazan Univ, Coll Engn & Comp Sci, Dept Comp Sci, Jazan, Saudi Arabia
[3] Fed Univ Lafia, Fac Comp, Dept Comp Sci, Lafia, Nasarawa State, Nigeria
Source
PLOS ONE | 2024 / Volume 19 / Issue 07
Keywords
LANGUAGE;
DOI
10.1371/journal.pone.0305657
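Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code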
07 ; 0710 ; 09 ;
Abstract
Technological developments over the past few decades have changed the way people communicate, with platforms like social media and blogs becoming vital channels for international conversation. Even though hate speech is vigorously suppressed on social media, it is still a concern that needs to be constantly recognized and observed. The Arabic language poses particular difficulties in the detection of hate speech, despite the considerable efforts made in this area for English-language social media content. Arabic calls for particular consideration when it comes to hate speech detection because of its many dialects and linguistic nuances. Another degree of complication is added by the widespread practice of "code-mixing," in which users merge various languages smoothly. Recognizing this research vacuum, the study aims to close it by examining how well machine learning models containing variation features can detect hate speech, especially when it comes to Arabic tweets featuring code-mixing. Therefore, the objective of this study is to assess and compare the effectiveness of different features and machine learning models for hate speech detection on Arabic hate speech and code-mixing hate speech datasets. To achieve the objectives, the methodology used includes data collection, data pre-processing, feature extraction, the construction of classification models, and the evaluation of the constructed classification models. The findings from the analysis revealed that the TF-IDF feature, when employed with the SGD model, attained the highest accuracy, reaching 98.21%. Subsequently, these results were contrasted with outcomes from three existing studies, and the proposed method outperformed them, underscoring the significance of the proposed method. Consequently, our study carries practical implications and serves as a foundational exploration in the realm of automated hate speech detection in text.
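The abstract reports that TF-IDF features combined with an SGD model gave the highest accuracy. The following is a minimal from-scratch sketch of that general idea, not the authors' implementation: it computes smoothed, L2-normalized TF-IDF vectors and trains a logistic-regression classifier with stochastic gradient descent. The function names, hyperparameters, whitespace tokenizer, and the tiny English toy corpus are all assumptions made for illustration; the study itself used Arabic and code-mixed tweets with its own pre-processing.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for the
    # paper's (unspecified here) pre-processing of Arabic tweets.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency for every training token.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    # Sparse TF-IDF vector, L2-normalized; unseen tokens are ignored.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(x, w, b):
    # Linear decision value for a sparse vector x.
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())


def train_sgd(vectors, labels, epochs=50, lr=0.5, seed=0):
    # Logistic regression fitted one example at a time (plain SGD).
    rng = random.Random(seed)
    w, b = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            p = 1.0 / (1.0 + math.exp(-score(x, w, b)))
            g = p - y  # gradient of the log loss w.r.t. the score
            for t, v in x.items():
                w[t] = w.get(t, 0.0) - lr * g * v
            b -= lr * g
    return w, b


def predict(doc, idf, w, b):
    # Returns 1 (hateful) or 0 (not hateful).
    return 1 if score(tfidf(doc, idf), w, b) > 0 else 0


# Hypothetical toy data: 1 = hateful, 0 = not hateful.
train_texts = [
    "i hate you people",
    "you people are awful",
    "have a nice day",
    "what a lovely morning",
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_texts)
w, b = train_sgd([tfidf(d, idf) for d in train_texts], train_labels)
```

Calling `predict("i hate awful people", idf, w, b)` classifies a new string with the fitted weights. On real data, a held-out test split and metrics such as accuracy and F1 would be needed before comparing feature sets or models, as the study does.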
Pages: 24