Addressing cyberbullying in Urdu tweets: a comprehensive dataset and detection system

Authors
Adeeba F. [1]
Yousuf M.I. [1]
Anwer I. [2]
Tariq S.U. [1]
Ashfaq A. [1]
Naqeeb M. [1]
Affiliations
[1] Department of Computer Science, University of Engineering and Technology Lahore, Punjab, Lahore
[2] Department of Transportation Engineering and Management, University of Engineering and Technology Lahore, Punjab, Lahore
Keywords
Artificial Intelligence; Cyberbullying annotation guidelines; Natural Language and Speech; Network Science and Online Social Networks; Sentiment Analysis; Text Mining; Urdu cyberbullying detection; Urdu sentiment analysis; Urdu tweets dataset
DOI
10.7717/PEERJ-CS.1963
Abstract
Cyberbullying has reached an alarming prevalence: approximately 54% of teenagers experience some form of it, including offensive hate speech, threats, and racism. This research introduces a comprehensive dataset and system for cyberbullying detection in Urdu tweets, leveraging a spectrum of machine learning approaches that includes traditional models and advanced deep learning techniques. The objectives of this study are threefold. First, a dataset of 12,500 annotated Urdu tweets is created and made publicly available to the research community. Second, annotation guidelines for Urdu text, with appropriate labels for cyberbullying detection, are developed. Finally, a series of experiments is conducted to assess the performance of machine learning and deep learning techniques in detecting cyberbullying. The results indicate that fastText deep learning models outperform the other models in cyberbullying detection. The study demonstrates the system's efficacy in detecting and classifying cyberbullying incidents in Urdu tweets, contributing to the broader effort of creating a safer digital environment. © 2024 Adeeba et al. Distributed under Creative Commons CC-BY 4.0. All Rights Reserved.
Related papers
50 in total
  • [21] Cyberbullying Detection in Code-Mixed Languages: Dataset and Techniques
    Maity, Krishanu
    Saha, Sriparna
    Bhattacharyya, Pushpak
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 1692 - 1698
  • [22] Cyberbullying detection: advanced preprocessing techniques & deep learning architecture for Roman Urdu data
    Amirita Dewani
    Mohsin Ali Memon
    Sania Bhatti
    Journal of Big Data, 8
  • [23] Cyberbullying detection: advanced preprocessing techniques & deep learning architecture for Roman Urdu data
    Dewani, Amirita
    Memon, Mohsin Ali
    Bhatti, Sania
    JOURNAL OF BIG DATA, 2021, 8 (01)
  • [24] A Dataset for Investigating the Impact of Context for Offensive Language Detection in Tweets
    Ihtiyar, Musa Nuri
    Ozdemir, Omer
    Erengul, Mustafa Emre
    Ozgur, Arzucan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 1543 - 1549
  • [25] Multilingual Cyberbullying Detection System
    Pawar, Rohit
    Raje, Rajeev R.
    2019 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2019, : 40 - 44
  • [26] Cyberbullying System Detection and Analysis
    Foong, Yee Jang
    Oussalah, Mourad
    2017 EUROPEAN INTELLIGENCE AND SECURITY INFORMATICS CONFERENCE (EISIC), 2017, : 40 - 46
  • [27] Detection of Sarcasm in Urdu Tweets Using Deep Learning and Transformer Based Hybrid Approaches
    Hassan, Muhammad Ehtisham
    Hussain, Masroor
    Maab, Iffat
    Habib, Usman
    Khan, Muhammad Attique
    Masood, Anum
    IEEE ACCESS, 2024, 12 : 61542 - 61555
  • [28] Multilingual Cyberbullying Detection System Detecting Cyberbullying in Arabic Content
    Haidar, Batoul
    Chamoun, Maroun
    Serhrouchni, Ahmed
    2017 1ST CYBER SECURITY IN NETWORKING CONFERENCE (CSNET), 2017,
  • [29] Cyberbullying Detection by Sentiment Analysis of Tweets' Contents Written in Arabic in Saudi Arabia Society
    Almutairi, Amjad Rasmi
    Al-Hagery, Muhammad Abdullah
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2021, 21 (03): : 112 - 119
  • [30] DEVELOPMENT OF COMPUTATIONAL LINGUISTIC RESOURCES FOR AUTOMATED DETECTION OF TEXTUAL CYBERBULLYING THREATS IN ROMAN URDU LANGUAGE
    Dewani, Amirita
    Memon, Mohsin Ali
    Bhatti, Sania
    3C TIC, 2021, 10 (02): : 101 - 121