Context-aware and expert data resources for Brazilian Portuguese hate speech detection

Cited by: 0
Authors
Vargas, Francielle [1, 2]
Carvalho, Isabelle [1]
Pardo, Thiago A. S. [1]
Benevenuto, Fabricio [2]
Affiliations
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, Brazil
[2] Univ Fed Minas Gerais, Comp Sci Dept, Belo Horizonte, Brazil
Source
NATURAL LANGUAGE PROCESSING | 2025 / Vol. 31 / Issue 02
Keywords
hate speech; Brazilian Portuguese; low-resource languages; reliability; pragmatics
DOI
10.1017/nlp.2024.18
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper provides data resources for low-resource hate speech detection. Specifically, we introduce two data resources: (i) the HateBR 2.0 corpus, composed of 7,000 comments extracted from Brazilian politicians' accounts on Instagram and manually annotated with a binary class (offensive versus non-offensive) and hate speech targets, which is an updated version of the HateBR corpus in which highly similar and one-word comments were replaced; and (ii) the multilingual offensive lexicon (MOL), which consists of 1,000 explicit and implicit terms and expressions annotated with context information. The lexicon also comprises native-speaker translations and their cultural adaptations in English, Spanish, French, German, and Turkish. Both the corpus and the lexicon were annotated by three different experts and achieved high inter-annotator agreement. Lastly, we implemented baseline experiments on the proposed data resources. The results demonstrate the reliability of the data, which outperform baseline dataset results in Portuguese, and are promising for hate speech detection in different languages.
Pages: 435-456
Number of pages: 22