Improving hate speech detection using Cross-Lingual Learning

Cited by: 8
Authors
Firmino, Anderson Almeida [1]
Baptista, Claudio de Souza [1]
de Paiva, Anselmo Cardoso [2]
Affiliations
[1] Univ Fed Campina Grande, Rua Aprigio Veloso 882, Campina Grande, PB, Brazil
[2] Univ Fed Maranhao, Ave Portugueses 1966, Sao Luis, MA, Brazil
Keywords
Hate speech detection; Natural language processing; Social media; Cross-Lingual Learning; Deep learning;
DOI
10.1016/j.eswa.2023.121115
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The worldwide growth of social media has brought both social benefits and challenges. One problem we highlight is the proliferation of hate speech on these platforms. We propose a novel method for detecting hate speech in text using Cross-Lingual Learning. Our approach applies transfer learning with Pre-Trained Language Models (PTLMs), drawing on the large corpora available in some languages to solve the same task in languages with fewer resources. The proposed methodology comprises four stages: corpora acquisition, PTLM definition, training strategies, and evaluation. We carried out experiments with Pre-Trained Language Models for English, Italian, and Portuguese (BERT and XLM-R) to determine which best suited the proposed method. We used corpora in English (WH) and Italian (Evalita 2018) as the source languages and the OffComBr-2 corpus in Portuguese as the target language. The experimental results indicate that the proposed methodology is promising: on the OffComBr-2 corpus, it achieved the best result among state-of-the-art approaches (F1-measure = 92%).
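
The F1-measure cited in the abstract is the harmonic mean of precision and recall. The abstract does not say whether the reported score is computed for a single class, macro-averaged, or weighted, so the sketch below assumes the single positive-class form. The function name `f1_measure` and the toy labels are hypothetical and are not taken from the paper.

```python
def f1_measure(gold, predicted, positive=1):
    """Return (precision, recall, F1) for one positive class."""
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels (assumption): 1 = offensive, 0 = not offensive.
gold = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0]

precision, recall, f1 = f1_measure(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

With these toy labels there are three true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 0.75.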
Pages: 13