Enhancing Multilingual Hate Speech Detection: From Language-Specific Insights to Cross-Linguistic Integration

Cited by: 3
Authors
Hashmi, Ehtesham [1]
Yildirim Yayilgan, Sule [1]
Hameed, Ibrahim A. [1]
Mudassar Yamin, Muhammad [1]
Ullah, Mohib [2]
Abomhara, Mohamed [1]
Affiliations
[1] Norwegian Univ Sci & Technol NTNU, Dept ICT & Nat Sci IIR, N-6009 Alesund, More og Romsdal, Norway
[2] Norwegian Univ Sci & Technol NTNU, Dept Informat Secur & Commun Technol IIK, N-2815 Gjovik, Norway
Source
IEEE ACCESS | 2024 / Volume 12
Keywords
Transformers; Hate speech; Artificial intelligence; Linguistics; Adaptation models; Machine learning; Deep learning; Natural language processing; Explainable AI; Social networking (online); Multilingual; Word embedding; GPT
DOI
10.1109/ACCESS.2024.3452987
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The rise of social media has enabled individuals with biased perspectives to spread hate speech, directing it toward individuals based on characteristics such as race, gender, religion, or sexual orientation. Constructive interactions in varied communities can greatly enhance self-esteem, yet adverse comments may affect individuals' social standing and emotional health. Detecting and addressing this type of content is imperative for reducing its negative effects on communities and individuals alike. Its rising occurrence highlights the urgency for enhanced methods and robust regulations on digital platforms to protect people from such prejudicial and damaging conduct. Hate speech typically appears as a deliberate hostile action aimed at a particular group, often with the intent to demean or isolate them based on various facets of their identity. Research on hate speech predominantly targets resource-rich languages like English, German, and Chinese. Conversely, resource-limited languages, including European languages such as Italian, Spanish, and Portuguese, alongside Asian languages like Roman Urdu, Korean, and Indonesian, present obstacles. These challenges arise from a lack of linguistic resources, which makes information extraction more difficult. This study focuses on improving multilingual hate speech detection across 13 languages. To conduct a thorough analysis, we carried out a series of experiments that ranged from classical machine learning techniques and mainstream deep learning approaches to recent transformer-based methods. Through hyperparameter tuning, optimization techniques, and generative configurations, we achieved robust and generalized performance capable of effectively identifying hate speech across various dialects.
Specifically, we achieved a notable enhancement in detection performance, with precision and recall metrics exceeding baseline models by up to 10% across several lesser-studied languages. Additionally, our work extends the capabilities of explainable AI within this context, offering deeper insights into model decisions, which is crucial for regulatory and ethical considerations in AI deployment. Our study presents substantial performance improvements across various datasets and languages through meticulous comparisons. For example, our model significantly outperformed existing benchmarks: it achieved F1-scores of 0.90 in German (GermEval-2018), up from the baseline score of 0.72, and 0.93 in German (GermEval-2021), a substantial increase from 0.58. Additionally, it scored 0.95 in Roman Urdu HS, surpassing the previous peak of 0.91. Furthermore, for mixed-language datasets such as Italian and English (AMI 2018), our accuracy rose dramatically from 0.59 to 0.96. These outcomes emphasize the robustness and versatility of our model, establishing a new standard for hate speech detection systems across diverse linguistic settings.
Pages: 121507-121537
Page count: 31