Addressing cyberbullying in Urdu tweets: a comprehensive dataset and detection system

Cited by: 0
Authors
Adeeba F. [1]
Yousuf M.I. [1]
Anwer I. [2]
Tariq S.U. [1]
Ashfaq A. [1]
Naqeeb M. [1]
Affiliations
[1] Department of Computer Science, University of Engineering and Technology Lahore, Punjab, Lahore
[2] Department of Transportation Engineering and Management, University of Engineering and Technology Lahore, Punjab, Lahore
Keywords
Artificial Intelligence; Cyberbullying annotation guidelines; Natural Language and Speech; Network Science and Online Social Networks; Sentiment Analysis; Text Mining; Urdu cyberbullying detection; Urdu sentiment analysis; Urdu tweets dataset
DOI
10.7717/PEERJ-CS.1963
Abstract
The prevalence of cyberbullying has reached an alarming level: approximately 54% of teenagers experience some form of it, including offensive hate speech, threats, and racism. This research introduces a comprehensive dataset and a system for cyberbullying detection in Urdu tweets, using a range of machine learning approaches that spans traditional models and advanced deep learning techniques. The study has three objectives. First, a dataset of 12,500 annotated Urdu tweets is created and made publicly available to the research community. Second, annotation guidelines for Urdu text, with appropriate labels for cyberbullying detection, are developed. Finally, a series of experiments is conducted to assess the performance of machine learning and deep learning techniques in detecting cyberbullying. The results indicate that fastText deep learning models outperform the other models in cyberbullying detection. The study demonstrates the system's efficacy in detecting and classifying cyberbullying incidents in Urdu tweets, contributing to the broader effort of creating a safer digital environment. © 2024 Adeeba et al. Distributed under Creative Commons CC-BY 4.0. All Rights Reserved.
Related papers
50 in total
  • [11] Towards comprehensive cyberbullying detection: A dataset incorporating aggressive texts, repetition, peerness, and intent to harm
    Ejaz, Naveed
    Razi, Fakhra
    Choudhury, Salimur
    COMPUTERS IN HUMAN BEHAVIOR, 2024, 153
  • [12] Annotation System to Build Cyberbullying and Hate Speech Detection Model Training Dataset
    Febriana, Trisna
    Budiarto, Arif
    CHIUXID 2020: 6TH INTERNATIONAL ACM IN-COOPERATION HCI AND UX CONFERENCE, 2020 : 29 - 30
  • [13] Improving Hate Speech Detection of Urdu Tweets Using Sentiment Analysis
    Ali, Muhammad Z.
    Ehsan-Ul-Haq
    Rauf, Sahar
    Javed, Kashif
    Hussain, Sarmad
    IEEE ACCESS, 2021, 9 : 84296 - 84305
  • [14] Cyberbullying Detection and Abuser Profile Identification on Social Media for Roman Urdu
    Atif, Ayesha
    Zafar, Amna
    Wasim, Muhammad
    Waheed, Talha
    Ali, Amjad
    Ali, Hazrat
    Shah, Zubair
    IEEE ACCESS, 2024, 12 : 123339 - 123351
  • [15] Threatening URDU Language Detection from Tweets Using Machine Learning
    Mehmood, Aneela
    Farooq, Muhammad Shoaib
    Naseem, Ansar
    Rustam, Furqan
    Gracia Villar, Monica
    Lili Rodriguez, Carmen
    Ashraf, Imran
    APPLIED SCIENCES-BASEL, 2022, 12 (20)
  • [16] Policy-Based Spam Detection of Tweets Dataset
    Dar, Momna
    Iqbal, Faiza
    Latif, Rabia
    Altaf, Ayesha
    Jamail, Nor Shahida Mohd
    ELECTRONICS, 2023, 12 (12)
  • [17] Detection of violence incitation expressions in Urdu tweets using convolutional neural network
    Khan, Muhammad Shahid
    Malik, Muhammad Shahid Iqbal
    Nadeem, Aamer
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 245
  • [18] Automatic detection of cyberbullying and threatening in Saudi tweets using machine learning
    Alghamdi, Deema
    Al-Motery, Rahaf
    Alma'abdi, Reem
    Alzamzami, Ohoud
    Babour, Amal
    INTERNATIONAL JOURNAL OF ADVANCED AND APPLIED SCIENCES, 2021, 8 (10) : 17 - 25
  • [19] Harnessing English Sentiment Lexicons for Polarity Detection in Urdu Tweets: A Baseline Approach
    Khan, Muhammad Yaseen
    Emaduddin, Shah Muhammad
    Junejo, Khurum Nazir
    2017 11TH IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2017 : 242 - 249
  • [20] UAlpha40: A comprehensive dataset of Urdu alphabet for Pakistan sign language
    Javaid, Sameena
    Sajid, Shahood
    Baloch, Yusra Khan
    DATA IN BRIEF, 2025, 59