Addressing cyberbullying in Urdu tweets: a comprehensive dataset and detection system

Cited by: 0

Authors:

Adeeba F. [1]

Yousuf M.I. [1]

Anwer I. [2]

Tariq S.U. [1]

Ashfaq A. [1]

Naqeeb M. [1]

Affiliations:

[1] Department of Computer Science, University of Engineering and Technology Lahore, Punjab, Lahore

[2] Department of Transportation Engineering and Management, University of Engineering and Technology Lahore, Punjab, Lahore

Source:

PeerJ Computer Science | 2024 / Volume 10

Keywords:

Artificial Intelligence; Cyberbullying annotation guidelines; Natural Language and Speech; Network Science and Online Social Networks; Sentiment Analysis; Text Mining; Urdu cyberbullying detection; Urdu sentiment analysis; Urdu tweets dataset

DOI:

10.7717/PEERJ-CS.1963

Abstract:

Cyberbullying has reached an alarming prevalence: approximately 54% of teenagers experience some form of it, including offensive hate speech, threats, and racism. This research introduces a comprehensive dataset and a system for cyberbullying detection in Urdu tweets, drawing on a spectrum of machine learning approaches that spans traditional models and advanced deep learning techniques. The study has three objectives. First, a dataset of 12,500 annotated Urdu tweets is created and made publicly available to the research community. Second, annotation guidelines for Urdu text, with appropriate labels for cyberbullying detection, are developed. Finally, a series of experiments is conducted to assess how well machine learning and deep learning techniques detect cyberbullying. The results indicate that fastText deep learning models outperform the other models evaluated. The proposed system proves effective at detecting and classifying cyberbullying incidents in Urdu tweets, contributing to the broader effort of creating a safer digital environment. © 2024 Adeeba et al. Distributed under Creative Commons CC-BY 4.0. All Rights Reserved.
