Enhancing Multilingual Hate Speech Detection: From Language-Specific Insights to Cross-Linguistic Integration

Cited by: 3
Authors
Hashmi, Ehtesham [1]
Yildirim Yayilgan, Sule [1]
Hameed, Ibrahim A. [1]
Mudassar Yamin, Muhammad [1]
Ullah, Mohib [2]
Abomhara, Mohamed [1]
Affiliations
[1] Norwegian Univ Sci & Technol NTNU, Dept ICT & Nat Sci IIR, N-6009 Alesund, More og Romsdal, Norway
[2] Norwegian Univ Sci & Technol NTNU, Dept Informat Secur & Commun Technol IIK, N-2815 Gjovik, Norway
Source
IEEE Access | 2024 / Vol. 12
Keywords
Transformers; Hate speech; Artificial intelligence; Linguistics; Adaptation models; Machine learning; Deep learning; Natural language processing; Explainable AI; Social networking (online); Multilingual; Word embedding; GPT
DOI
10.1109/ACCESS.2024.3452987
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The rise of social media has enabled individuals with biased perspectives to spread hate speech, directing it toward individuals based on characteristics such as race, gender, religion, or sexual orientation. Constructive interactions in varied communities can greatly enhance self-esteem, yet adverse comments may harm individuals' social standing and emotional health. Detecting and addressing this type of content is imperative for reducing its negative effects on communities and individuals alike. Its rising occurrence highlights the urgency for enhanced methods and robust regulations on digital platforms to protect people from such prejudicial and damaging conduct. Hate speech typically appears as a deliberate hostile action aimed at a particular group, often with the intent to demean or isolate its members based on various facets of their identity. Research on hate speech predominantly targets high-resource languages like English, German, and Chinese. Conversely, resource-limited languages, including European languages such as Italian, Spanish, and Portuguese, alongside Asian languages like Roman Urdu, Korean, and Indonesian, present obstacles. These challenges arise from a lack of linguistic resources, which makes extracting information more difficult. This study focuses on improving multilingual hate speech detection across 13 different languages. To conduct a thorough analysis, we carried out a series of experiments that ranged from classical machine learning techniques and mainstream deep learning approaches to recent transformer-based methods. Through hyperparameter tuning, optimization techniques, and generative configurations, we achieved robust and generalized performance capable of effectively identifying hate speech across various dialects.
Specifically, we achieved a notable enhancement in detection performance, with precision and recall exceeding baseline models by up to 10% across several lesser-studied languages. Additionally, our work extends the capabilities of explainable AI within this context, offering deeper insights into model decisions, which is crucial for regulatory and ethical considerations in AI deployment. Comparisons against existing benchmarks show substantial performance improvements across various datasets and languages. For example, our model achieved F1-scores of 0.90 in German (GermEval-2018), up from the baseline score of 0.72, and 0.93 in German (GermEval-2021), a substantial increase from 0.58. It also scored 0.95 on Roman Urdu HS, surpassing the previous peak of 0.91. Furthermore, on the mixed-language Italian and English dataset (AMI 2018), accuracy rose from 0.59 to 0.96. These outcomes emphasize the robustness and versatility of our model, establishing a new standard for hate speech detection systems across diverse linguistic settings.
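The benchmark figures quoted in the abstract can be compared more easily as absolute and relative gains. The short sketch below only restates the numbers reported above; the names `REPORTED`, `gains`, and `summarize` are illustrative and not taken from the paper, and the AMI 2018 row is an accuracy figure rather than an F1-score.

```python
# Scores copied from the abstract: dataset -> (metric, baseline, reported).
# The dictionary and function names are hypothetical helpers for illustration.
REPORTED = {
    "German (GermEval-2018)": ("F1", 0.72, 0.90),
    "German (GermEval-2021)": ("F1", 0.58, 0.93),
    "Roman Urdu HS": ("F1", 0.91, 0.95),
    "Italian and English (AMI 2018)": ("accuracy", 0.59, 0.96),
}


def gains(baseline, score):
    """Return (absolute gain, relative gain in percent), both rounded."""
    absolute = score - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative, 1)


def summarize(reported=REPORTED):
    """Map each dataset name to its (absolute, relative) gain."""
    return {name: gains(base, new) for name, (_, base, new) in reported.items()}


if __name__ == "__main__":
    for name, (abs_gain, rel_gain) in summarize().items():
        metric = REPORTED[name][0]
        print(f"{name}: {metric} +{abs_gain:.2f} ({rel_gain:.1f}% relative)")
```

Read this way, the GermEval-2018 result is a gain of 0.18 F1 (25% relative to the baseline), while the Roman Urdu HS result is a smaller gain of 0.04 over an already strong previous score.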
Pages: 121507-121537
Page count: 31