Addressing cyberbullying in Urdu tweets: a comprehensive dataset and detection system

Authors
Adeeba F. [1]
Yousuf M.I. [1]
Anwer I. [2]
Tariq S.U. [1]
Ashfaq A. [1]
Naqeeb M. [1]
Affiliations
[1] Department of Computer Science, University of Engineering and Technology Lahore, Punjab, Lahore
[2] Department of Transportation Engineering and Management, University of Engineering and Technology Lahore, Punjab, Lahore
Keywords
Artificial Intelligence; Cyberbullying annotation guidelines; Natural Language and Speech; Network Science and Online Social Networks; Sentiment Analysis; Text Mining; Urdu cyberbullying detection; Urdu sentiment analysis; Urdu tweets dataset
DOI
10.7717/PEERJ-CS.1963
Abstract
Cyberbullying has reached an alarming prevalence: approximately 54% of teenagers experience some form of it, including offensive hate speech, threats, and racism. This research introduces a comprehensive dataset and a detection system for cyberbullying in Urdu tweets, drawing on a range of machine learning approaches from traditional models to deep learning techniques. The study has three objectives. First, a dataset of 12,500 annotated Urdu tweets is created and made publicly available to the research community. Second, annotation guidelines for Urdu text, with appropriate labels for cyberbullying detection, are developed. Third, a series of experiments assesses the performance of machine learning and deep learning techniques in detecting cyberbullying. The results indicate that fastText deep learning models outperform the other models evaluated. The study demonstrates the approach's efficacy in detecting and classifying cyberbullying incidents in Urdu tweets, contributing to the broader effort to create a safer digital environment. © 2024 Adeeba et al. Distributed under Creative Commons CC-BY 4.0. All Rights Reserved.
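As a rough illustration of the kind of traditional baseline the abstract alludes to, the sketch below trains a multinomial Naive Bayes classifier on character n-grams. It is not the authors' pipeline and does not use fastText; the word-boundary padding with "<" and ">" merely echoes the subword idea. The label names, the four training sentences, and all function and class names are invented for illustration and are not drawn from the published dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=3):
    """Return character n-grams of each whitespace token, padded with < and >."""
    grams = []
    for token in text.split():
        padded = f"<{token}>"
        for n in range(n_min, n_max + 1):
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.gram_counts = defaultdict(Counter)  # label -> n-gram frequencies
        self.label_counts = Counter(labels)      # label -> number of documents
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.gram_counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.gram_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for gram in char_ngrams(text):
                if gram in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy examples (mild, invented sentences; NOT from the dataset).
train_texts = [
    "تم بہت اچھے ہو",   # "you are very nice"
    "آج موسم اچھا ہے",  # "the weather is nice today"
    "تم بیوقوف ہو",     # "you are stupid"
    "تم بہت برے ہو",    # "you are very bad"
]
train_labels = ["neutral", "neutral", "bullying", "bullying"]

model = NgramNaiveBayes().fit(train_texts, train_labels)
print(model.predict("تم برے ہو"))
print(model.predict("موسم اچھا ہے"))
```

Character n-grams are used here because they tolerate the spelling variation common in informal tweets; a realistic experiment would need the full annotated dataset, a held-out test split, and proper evaluation metrics.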