Addressing cyberbullying in Urdu tweets: a comprehensive dataset and detection system

Cited by: 0
Authors
Adeeba F. [1]
Yousuf M.I. [1]
Anwer I. [2]
Tariq S.U. [1]
Ashfaq A. [1]
Naqeeb M. [1]
Affiliations
[1] Department of Computer Science, University of Engineering and Technology Lahore, Punjab, Lahore
[2] Department of Transportation Engineering and Management, University of Engineering and Technology Lahore, Punjab, Lahore
Keywords
Artificial Intelligence; Cyberbullying annotation guidelines; Natural Language and Speech; Network Science and Online Social Networks; Sentiment Analysis; Text Mining; Urdu cyberbullying detection; Urdu sentiment analysis; Urdu tweets dataset
DOI
10.7717/PEERJ-CS.1963
Abstract
The prevalence of cyberbullying has reached an alarming rate: approximately 54% of teenagers experience some form of it, including offensive hate speech, threats, and racism. This research introduces a comprehensive dataset and system for cyberbullying detection in Urdu tweets, leveraging a spectrum of machine learning approaches that includes traditional models and advanced deep learning techniques. The objectives of this study are threefold. First, a dataset of 12,500 annotated Urdu tweets is created and made publicly available to the research community. Second, annotation guidelines for Urdu text, with appropriate labels for cyberbullying detection, are developed. Finally, a series of experiments is conducted to assess the performance of machine learning and deep learning techniques in detecting cyberbullying. The results indicate that fastText deep learning models outperform the other models in cyberbullying detection. The study demonstrates the efficacy of the proposed system in detecting and classifying cyberbullying incidents in Urdu tweets, contributing to the broader effort of creating a safer digital environment. © 2024 Adeeba et al. Distributed under Creative Commons CC-BY 4.0. All Rights Reserved.
Related papers
50 in total
  • [31] Emotion Detection from Tweets using AIT-2018 Dataset
    Shah, Faisal Muhammad
    Reyadh, Abdus Sayef
    Shaafi, Asif Imtiaz
    Ahmed, Sifat
    Sithil, Fatima Tabsun
    2019 5TH INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRICAL ENGINEERING (ICAEE), 2019, : 575 - 580
  • [32] Instagram-Based Benchmark Dataset for Cyberbullying Detection in Arabic Text
    ALBayari, Reem
    Abdallah, Sherief
    DATA, 2022, 7 (07)
  • [33] Urdu text in natural scene images: a new dataset and preliminary text detection
    Ali H.
    Iqbal K.
    Mujtaba G.
    Fayyaz A.
    Bulbul M.F.
    Karam F.W.
    Zahir A.
    PeerJ Computer Science, 2021, 7 : 1 - 17
  • [34] Urdu text in natural scene images: a new dataset and preliminary text detection
    Ali, Hazrat
    Iqbal, Khalid
    Mujtaba, Ghulam
    Fayyaz, Ahmad
    Bulbul, Mohammad Farhad
    Karam, Fazal Wahab
    Zahir, Ali
    PEERJ COMPUTER SCIENCE, 2021, 7
  • [35] PaveDistress: A comprehensive dataset of pavement distresses detection
    Liu, Zhen
    Wu, Wenxiu
    Gu, Xingyu
    Cui, Bingyan
    DATA IN BRIEF, 2024, 57
  • [36] UHCTD: A Comprehensive Dataset for Camera Tampering Detection
    Mantini, Pranav
    Shah, Shishir K.
    2019 16TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS), 2019
  • [37] A versatile dataset for intrinsic plagiarism detection, text reuse analysis, and author clustering in Urdu
    Haseeb, Muhammad
    Manzoor, Muhammad Faraz
    Farooq, Muhammad Shoaib
    Farooq, Uzma
    Abid, Adnan
    DATA IN BRIEF, 2024, 52
  • [38] "Bend the truth": Benchmark dataset for fake news detection in Urdu language and its evaluation
    Amjad, Maaz
    Sidorov, Grigori
    Zhila, Alisa
    Gomez-Adorno, Helena
    Voronkov, Ilia
    Gelbukh, Alexander
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (02) : 2457 - 2469
  • [39] Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering
    Maryam, Hiba
    Fu, Ling
    Song, Jiajun
    Shafayet, Tajrian A. B. M.
    Luo, Qidi
    Bai, Xiang
    Liu, Yuliang
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT V, 2024, 14808 : 279 - 292
  • [40] The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection
    Bergmann, Paul
    Batzner, Kilian
    Fauser, Michael
    Sattlegger, David
    Steger, Carsten
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2021, 129 (04) : 1038 - 1059