Addressing cyberbullying in Urdu tweets: a comprehensive dataset and detection system

Cited by: 0
Authors
Adeeba F. [1]
Yousuf M.I. [1]
Anwer I. [2]
Tariq S.U. [1]
Ashfaq A. [1]
Naqeeb M. [1]
Affiliations
[1] Department of Computer Science, University of Engineering and Technology Lahore, Punjab, Lahore
[2] Department of Transportation Engineering and Management, University of Engineering and Technology Lahore, Punjab, Lahore
Keywords
Artificial Intelligence; Cyberbullying annotation guidelines; Natural Language and Speech; Network Science and Online Social Networks; Sentiment Analysis; Text Mining; Urdu cyberbullying detection; Urdu sentiment analysis; Urdu tweets dataset
DOI
10.7717/PEERJ-CS.1963
Abstract
Cyberbullying has reached an alarming prevalence: approximately 54% of teenagers experience some form of it, including offensive hate speech, threats, and racism. This research introduces a comprehensive dataset and system for cyberbullying detection in Urdu tweets, leveraging a spectrum of machine learning approaches that includes traditional models and advanced deep learning techniques. The objectives of this study are threefold. First, a dataset of 12,500 annotated Urdu tweets is created and made publicly available to the research community. Second, annotation guidelines for Urdu text, with appropriate labels for cyberbullying detection, are developed. Finally, a series of experiments is conducted to assess the performance of machine learning and deep learning techniques in detecting cyberbullying. The results indicate that fastText deep learning models outperform the other models in cyberbullying detection. The proposed system proves effective at detecting and classifying cyberbullying incidents in Urdu tweets, contributing to the broader effort of creating a safer digital environment. © 2024 Adeeba et al. Distributed under Creative Commons CC-BY 4.0. All Rights Reserved.
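As an illustration of the kind of input a fastText classifier typically expects, the sketch below converts labeled tweets into fastText's supervised-training line format (`__label__<label> <text>`). It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not give the label set or preprocessing steps, so the helper name `to_fasttext_line`, the labels, and the sample texts are all hypothetical.

```python
import re


def to_fasttext_line(label: str, text: str) -> str:
    """Return one line in fastText's `__label__<label> <text>` format.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # fastText reads one example per line, so collapse newlines
    # and repeated whitespace into single spaces.
    cleaned = re.sub(r"\s+", " ", text).strip()
    # A label must be a single token, so replace inner whitespace.
    safe_label = re.sub(r"\s+", "_", label.strip())
    return f"__label__{safe_label} {cleaned}"


# Placeholder examples: labels and texts are illustrative only and
# do not come from the published dataset.
samples = [
    ("bullying", "یہ ایک  مثال\nہے"),
    ("not bullying", "یہ دوسری مثال ہے"),
]

lines = [to_fasttext_line(label, text) for label, text in samples]

for line in lines:
    print(line)
```

Lines in this format, written to a UTF-8 text file, could then be passed to a trainer such as `fasttext.train_supervised(input=path)` from the `fasttext` Python package; the hyperparameters used in the study are not stated in the abstract.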
Related papers
50 in total
  • [1] Addressing cyberbullying in Urdu tweets: a comprehensive dataset and detection system
    Adeeba, Farah
    Yousuf, Muhammad Irfan
    Anwer, Izza
    Tariq, Sardar Umair
    Ashfaq, Abdullah
    Naqeeb, Malik
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [2] Automatic Abusive Language Detection in Urdu Tweets
    Amjad, Maaz
    Ashraf, Noman
    Sidorov, Grigori
    Zhila, Alisa
    Chanona-Hernandez, Liliana
    Gelbukh, Alexander
    ACTA POLYTECHNICA HUNGARICA, 2022, 19 (10) : 143 - 163
  • [3] Threatening Language Detection and Target Identification in Urdu Tweets
    Amjad, Maaz
    Ashraf, Noman
    Zhila, Alisa
    Sidorov, Grigori
    Zubiaga, Arkaitz
    Gelbukh, Alexander
    IEEE ACCESS, 2021, 9 (09) : 128302 - 128313
  • [4] Cyberbullying Detection for Urdu Language Using Machine Learning
    Mustafa, Hamza
    Zafar, Kashif
    FORTHCOMING NETWORKS AND SUSTAINABILITY IN THE AIOT ERA, VOL 1, FONES-AIOT 2024, 2024, 1035 : 244 - 257
  • [5] Multilingual Detection of Cyberbullying in Mixed Urdu, Roman Urdu, and English Social Media Conversations
    Razi, Fakhra
    Ejaz, Naveed
    IEEE ACCESS, 2024, 12 : 105201 - 105210
  • [6] Cyberbullying detection from tweets using deep learning
    Bharti, Shubham
    Yadav, Arun Kumar
    Kumar, Mohit
    Yadav, Divakar
    KYBERNETES, 2022, 51 (09) : 2695 - 2711
  • [7] A Machine Learning Approach to Cyberbullying Detection in Arabic Tweets
    Musleh, Dhiaa
    Rahman, Atta
    Alkherallah, Mohammed Abbas
    Al-Bohassan, Menhal Kamel
    Alawami, Mustafa Mohammed
    Alsebaa, Hayder Ali
    Alnemer, Jawad Ali
    Al-Mutairi, Ghazi Fayez
    Aldossary, May Issa
    Aldowaihi, Dalal A.
    Alhaidari, Fahd
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 80 (01) : 1033 - 1054
  • [8] Assessing Urdu Language Processing Tools via Statistical and Outlier Detection Methods on Urdu Tweets
    Zoya
    Latif, Seemab
    Latif, Rabia
    Majeed, Hammad
    Jamail, Nor Shahida Mohd
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (10)
  • [9] Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection
    Harris, Sheetal
    Liu, Jinshuo
    Hadi, Hassan Jalil
    Cao, Yue
    2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023, 2024 : 2440 - 2447
  • [10] Ax-to-Grind Urdu: Benchmark Dataset for Urdu Fake News Detection
    Harris, Sheetal
    Liu, Jinshuo
    Hadi, Hassan Jalil
    Cao, Yue
    arXiv