Automatic detection of hate speech in code-mixed Indian languages in twitter social media interaction using DConvBLSTM-MuRIL ensemble method

Cited by: 1
Authors
Kakati, Pallabi [1]
Dandotiya, Devendra [2,3]
Affiliations
[1] Presidency Univ, Dept Elect & Commun Engn, Bangalore 560064, Karnataka, India
[2] Presidency Univ, Dept Mech Engn, Bangalore 560064, Karnataka, India
[3] Presidency Univ, Innovat & Translat Res Hub iTRH, Bangalore 560064, Karnataka, India
Keywords
Hate speech detection; Multiclass classification; DConvBLSTM; MuRIL; Ensemble
DOI
10.1007/s13278-024-01264-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Social media platforms have gained immense popularity in recent years and are used for activities such as marketing, news sharing, and celebrating achievements. However, they are also notorious for spreading hateful and discriminatory content, which can harm individuals and communities. It is therefore crucial to detect and remove such content from social media platforms as quickly as possible. Although research on detecting hate speech and inflammatory content is growing, studies focused on code-mixed Indian languages remain limited. In this work, we conducted a comprehensive study comparing the effectiveness of various neural network and transformer-based techniques for detecting hateful and objectionable language in social media tweets in Hinglish, Tamil written in English script, and Malayalam written in English script, and we propose the best-performing ensemble model, named DConvBLSTM-MuRIL. For our experiments, we created our own datasets for the three languages under study and compared the results with those obtained on existing datasets. Our proposed weighted ensemble framework outperformed the existing models, achieving higher weighted F1-scores and higher accuracy for all three languages under consideration.
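The record states only that the DConvBLSTM-MuRIL ensemble is "weighted" and that it is evaluated with weighted F1-scores; it does not give the combination rule or the weight values. The sketch below is therefore an assumption, not the authors' method: it uses weighted soft voting (a convex combination of the two models' per-class probabilities, followed by argmax) and computes the support-weighted F1 average. The function names `weighted_soft_vote` and `weighted_f1`, the weight `w_a`, and the toy probabilities are all hypothetical.

```python
from collections import Counter


def weighted_soft_vote(prob_a, prob_b, w_a=0.5):
    """Hypothetical weighted soft voting: combine per-class probabilities from
    two models as w_a * p_a + (1 - w_a) * p_b, then take the argmax per sample."""
    if not 0.0 <= w_a <= 1.0:
        raise ValueError("w_a must lie in [0, 1]")
    preds = []
    for pa, pb in zip(prob_a, prob_b):
        combined = [w_a * a + (1.0 - w_a) * b for a, b in zip(pa, pb)]
        # Index of the class with the highest combined probability.
        preds.append(max(range(len(combined)), key=combined.__getitem__))
    return preds


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (the 'weighted' average).
    Classes that appear only in y_pred have zero support and contribute nothing."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total
    return score


if __name__ == "__main__":
    # Toy 3-class probabilities for 4 samples (hypothetical values).
    prob_dconv = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.4, 0.4, 0.2]]
    prob_muril = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.2, 0.2, 0.6], [0.2, 0.7, 0.1]]
    preds = weighted_soft_vote(prob_dconv, prob_muril, w_a=0.4)
    print(preds)  # → [0, 2, 2, 1]
    print(round(weighted_f1([0, 1, 2, 1], preds), 4))  # → 0.75
```

Weighting by class support means frequent classes dominate the score, which is why weighted F1 is usually reported alongside accuracy on imbalanced hate-speech datasets; in practice the ensemble weight would be tuned on a validation split.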
Pages: 22