Improving hate speech detection using Cross-Lingual Learning

Cited by: 8
Authors
Firmino, Anderson Almeida [1]
Baptista, Claudio de Souza [1]
de Paiva, Anselmo Cardoso [2]
Affiliations
[1] Univ Fed Campina Grande, Rua Aprigio Veloso 882, Campina Grande, PB, Brazil
[2] Univ Fed Maranhao, Ave Portugueses 1966, Sao Luis, MA, Brazil
Keywords
Hate speech detection; Natural language processing; Social media; Cross-Lingual Learning; Deep learning
DOI
10.1016/j.eswa.2023.121115
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The growth of social media worldwide has brought both social benefits and challenges. One problem we highlight is the proliferation of hate speech on social media. We propose a novel method for detecting hate speech in texts using Cross-Lingual Learning. Our approach applies transfer learning from Pre-Trained Language Models (PTLMs), exploiting the large corpora available in some languages to solve the task in languages that have fewer resources for it. The proposed methodology comprises four stages: corpus acquisition, PTLM definition, training strategies, and evaluation. We carried out experiments with English, Italian, and Portuguese Pre-Trained Language Models (BERT and XLM-R) to determine which best suited the proposed method. We used the English (WH) and Italian (Evalita 2018) corpora as source-language data and the Portuguese OffComBr-2 corpus as target-language data. The experimental results show that the proposed methodology is promising: on the OffComBr-2 corpus, it achieved a new state-of-the-art result (F1-measure = 92%).
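The abstract reports its headline result as an F1-measure, the harmonic mean of precision P and recall R, i.e. F1 = 2PR / (P + R). The sketch below shows that computation under stated assumptions: labels are binary (1 = hateful/offensive, 0 = not), and because the abstract does not say whether the reported 92% is a binary, weighted, or macro-averaged F1, both a per-class and a macro-averaged variant are shown. The function names `f1_measure` and `macro_f1` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def f1_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for one class, treating `positive` as the class of interest.

    Hypothetical helper: assumes integer class labels such as
    1 = hateful/offensive and 0 = not hateful.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives and false negatives for this class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores (macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_measure(y_true, y_pred, positive=label) for label in labels) / len(labels)


if __name__ == "__main__":
    # Toy predictions, not data from the paper.
    gold = [1, 1, 1, 1, 0, 0]
    pred = [1, 1, 1, 0, 1, 0]
    print(f1_measure(gold, pred))  # F1 for the hateful class
    print(macro_f1(gold, pred))    # average over both classes
```

In this toy example the hateful class has P = R = 0.75, so its F1 is 0.75, while the non-hateful class has F1 = 0.5, giving a macro F1 of 0.625. The gap illustrates why the averaging scheme matters when comparing reported scores across papers.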
Pages: 13