Cyberbullying Detection using BERT for Telugu Language

Cited by: 0
Authors
Talasila, Sri Lakshmi [1]
Kothuri, Dharani Priya [1]
Manchiraju, Savithri Jahnavi [1]
Mallavalli, Mutyala Sai Sasank [1]
Dande, Lourdu Gnana Harshith [1]
Affiliation
[1] Prasad V. Potluri Siddhartha Institute of Technology, Computer Science & Engineering, Vijayawada, India
Keywords
Cyberbullying; Telugu; Bidirectional Encoder Representations from Transformers (BERT); Bullying Preprocessing; Harassment; Language; Social Media
DOI
10.1109/ICPCSN62568.2024.00077
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The rapid proliferation of online communication has made cyberbullying a significant threat to individuals' well-being. Existing research detects cyberbullying in mixed-language and bilingual text using techniques such as TF-IDF features, XLM-RoBERTa, and machine learning algorithms including Logistic Regression, Random Forest, and Naive Bayes. However, these approaches often lack accuracy and fail to recognize cyberbullying reliably because they misread linguistic nuance and context. The key limitations of previous systems are narrow linguistic coverage, weak contextual understanding, and poor interpretation of subtle cyberbullying. To address these challenges, this work adopts the BERT (Bidirectional Encoder Representations from Transformers) architecture. BERT's bidirectional context modeling captures subtle linguistic nuances and contextual cues, which improves both accuracy and contextual understanding. The proposed model goes further by integrating specialized models such as IndicBERT, which is tailored to Indian languages including Telugu. By focusing on contextual nuances, our model aims to improve the precision and accuracy of cyberbullying detection for Telugu, a local language. This study developed a Telugu dataset of 27,000 sentences and achieved an accuracy of 90%, highlighting the efficacy of our approach in overcoming these challenges and contributing to online safety.
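The abstract contrasts BERT with earlier pipelines built on TF-IDF features and classical classifiers such as Naive Bayes. For reference, below is a minimal, dependency-free sketch of one such baseline: multinomial Naive Bayes over TF-IDF-weighted term counts. It is not the authors' code. Several details are illustrative assumptions:

- whitespace tokenization;
- the smoothed IDF formula;
- the `alpha` smoothing value;
- the hypothetical names `TfidfNaiveBayes`, `bully`, and `not_bully`;
- the English toy sentences, which stand in for the Telugu dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; a real Telugu pipeline would
    # likely need Unicode normalization and a subword tokenizer.
    return text.lower().split()


class TfidfNaiveBayes:
    """Hypothetical toy baseline: multinomial Naive Bayes over TF-IDF-weighted counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant, assumed value

    def fit(self, texts, labels):
        docs = [Counter(tokenize(t)) for t in texts]
        n_docs = len(docs)
        # Document frequency: number of sentences containing each term.
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
        self.idf = {w: math.log((1 + n_docs) / (1 + c)) + 1 for w, c in df.items()}
        self.classes = sorted(set(labels))
        counts = Counter(labels)
        self.log_prior = {c: math.log(counts[c] / n_docs) for c in self.classes}
        # Accumulate the TF-IDF mass of each term within each class.
        mass = {c: defaultdict(float) for c in self.classes}
        for doc, label in zip(docs, labels):
            for w, tf in doc.items():
                mass[label][w] += tf * self.idf[w]
        vocab_size = len(self.idf)
        self.log_lik = {}
        for c in self.classes:
            total = sum(mass[c].values())
            # Smoothed per-class term log-probabilities over the full vocabulary.
            self.log_lik[c] = {
                w: math.log((mass[c][w] + self.alpha) / (total + self.alpha * vocab_size))
                for w in self.idf
            }
        return self

    def predict(self, text):
        # Terms never seen in training are ignored.
        doc = Counter(w for w in tokenize(text) if w in self.idf)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w, tf in doc.items():
                score += tf * self.idf[w] * self.log_lik[c][w]
            scores[c] = score
        return max(scores, key=scores.get)


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the gold labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical English toy sentences standing in for the Telugu dataset.
train_texts = [
    "you are so stupid and ugly",
    "nobody likes you loser",
    "go away you stupid loser",
    "congratulations on your result",
    "have a nice day friend",
    "great work on the project friend",
]
train_labels = ["bully", "bully", "bully", "not_bully", "not_bully", "not_bully"]

model = TfidfNaiveBayes().fit(train_texts, train_labels)
print(model.predict("you stupid loser"))  # → bully
print(model.predict("have a nice day"))   # → not_bully
```

This model scores each term independently of its neighbours. That is the kind of limited contextual understanding the abstract attributes to earlier systems. A fine-tuned BERT or IndicBERT classifier instead represents each token in the context of the whole sentence.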
Pages: 454-461
Number of pages: 8