Detection of Arabic offensive language in social media using machine learning models

Cited by: 2
Authors
Mousa, Aya [1]
Shahin, Ismail [1]
Nassif, Ali Bou [2]
Elnagar, Ashraf [3]
Affiliations
[1] Univ Sharjah, Dept Elect Engn, Sharjah, U Arab Emirates
[2] Univ Sharjah, Dept Comp Engn, Sharjah, U Arab Emirates
[3] Univ Sharjah, Dept Comp Sci, Sharjah, U Arab Emirates
Keywords
Arabic text classification; Cascaded model; Machine learning; Multiclass detection; Offensive language
DOI
10.1016/j.iswa.2024.200376
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This research aims to detect different types of Arabic offensive language on Twitter. It uses a multiclass classification system in which each tweet is assigned to one or more offensive language types based on the words it contains. Five classes are considered: bullying, insult, racism, obscene, and non-offensive. To classify the abusive language, this work presents a cascaded model consisting of Bidirectional Encoder Representations from Transformers (BERT) models (AraBERT, ArabicBERT, XLM-RoBERTa, GigaBERT, mBERT, and QARiB), deep learning models (1D-CNN, BiLSTM), and a Radial Basis Function (RBF) classifier. In addition, various traditional machine learning models are evaluated. The dataset is collected from Twitter and is balanced, so each class has the same number of tweets. Each tweet is assigned to one or more of the selected offensive language types to build multiclass and multilabel systems. A binary dataset is also constructed by assigning each tweet to either the offensive or the non-offensive class. The best results are obtained by the cascaded model that starts with ArabicBERT, followed by BiLSTM and RBF, with an accuracy, precision, recall, and F1-score of 98.4%, 98.2%, 92.8%, and 98.4%, respectively. Among the traditional classifiers, RBF records the highest results, with accuracy, precision, recall, and F1-score each equal to 60%, while KNN records the lowest, obtaining 45%, 46%, 45%, and 43% for accuracy, precision, recall, and F1-score, respectively.
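The abstract describes tweets that carry one or more of the five labels, and a binary dataset derived from them. The sketch below illustrates one plausible way to encode such annotations and collapse them to offensive versus non-offensive. It is not the authors' code: the names `CLASSES`, `to_multilabel_vector`, and `to_binary_label` are hypothetical, the toy annotations are placeholders rather than data from the paper, and the rule "any offensive label makes the tweet offensive" is an assumption.

```python
# Label set taken from the five classes named in the abstract.
CLASSES = ["bullying", "insult", "racism", "obscene", "non-offensive"]
OFFENSIVE = set(CLASSES) - {"non-offensive"}


def to_multilabel_vector(labels):
    """Encode a tweet's label set as a 0/1 vector ordered like CLASSES."""
    unknown = set(labels) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if c in labels else 0 for c in CLASSES]


def to_binary_label(labels):
    """Collapse a label set to 'offensive' or 'non-offensive'.

    Assumption: a tweet counts as offensive if it has at least one
    offensive label.
    """
    return "offensive" if OFFENSIVE & set(labels) else "non-offensive"


# Toy annotations (placeholders, not data from the paper).
annotations = [
    {"insult", "obscene"},
    {"non-offensive"},
    {"racism"},
]

multilabel = [to_multilabel_vector(a) for a in annotations]
binary = [to_binary_label(a) for a in annotations]
```

Under these assumptions, the multilabel vectors would feed a multilabel classifier, while the collapsed labels would serve the binary offensive versus non-offensive task.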
Pages: 13