Context-aware and expert data resources for Brazilian Portuguese hate speech detection

Cited by: 0
Authors
Vargas, Francielle [1, 2]
Carvalho, Isabelle [1 ]
Pardo, Thiago A. S. [1 ]
Benevenuto, Fabricio [2 ]
Affiliations
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, Brazil
[2] Univ Fed Minas Gerais, Comp Sci Dept, Belo Horizonte, Brazil
Source
NATURAL LANGUAGE PROCESSING | 2025 / Volume 31 / Issue 02
Keywords
hate speech; Brazilian Portuguese; low-resource languages; RELIABILITY; PRAGMATICS
DOI
10.1017/nlp.2024.18
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper provides data resources for low-resource hate speech detection. Specifically, we introduce two data resources: (i) the HateBR 2.0 corpus, which is composed of 7,000 comments extracted from Brazilian politicians' accounts on Instagram and manually annotated with a binary class (offensive versus non-offensive) and hate speech targets, and which is an updated version of the HateBR corpus in which highly similar and one-word comments were replaced; and (ii) the multilingual offensive lexicon (MOL), which consists of 1,000 explicit and implicit terms and expressions annotated with context information. The lexicon also comprises native-speaker translations and their cultural adaptations in English, Spanish, French, German, and Turkish. Both corpus and lexicon were annotated by three different experts and achieved high inter-annotator agreement. Lastly, we implemented baseline experiments on the proposed data resources. The results demonstrate the reliability of the data, which outperforms baseline dataset results in Portuguese and yields promising results for hate speech detection in different languages.
Pages: 435-456
Number of pages: 22
Related papers
50 in total
  • [21] CONTEXT-AWARE ATTENTION MECHANISM FOR SPEECH EMOTION RECOGNITION
    Ramet, Gaetan
    Garner, Philip N.
    Baeriswyl, Michael
    Lazaridis, Alexandros
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 126 - 131
  • [22] MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy
    Zhang, Ya-Jie
    Song, Wei
    Yue, Yanghao
    Zhang, Zhengchen
    Wu, Youzheng
    He, Xiaodong
    INTERSPEECH 2023, 2023, : 4803 - 4807
  • [23] Speech Dereverberation With Context-Aware Recurrent Neural Networks
    Santos, Joao Felipe
    Falk, Tiago H.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (07) : 1232 - 1242
  • [24] Context-aware negotiation for reconfigurable resources with handheld devices
    O'Sullivan, T
    Studdert, R
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2005: OTM 2005 WORKSHOPS, PROCEEDINGS, 2005, 3762 : 186 - 195
  • [25] Context-aware RNNLM Rescoring for Conversational Speech Recognition
    Wei, Kun
    Guo, Pengcheng
    Lv, Hang
    Tu, Zhen
    Xie, Lei
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [26] Enabling Intelligent Data Exchange in the Brazilian Energy Sector: A Context-Aware Ontological Approach
    Jenevain, Matheus B.
    Pinto, Milena F.
    Dantas, Mario A. R.
    Villela, Regina M. M. B.
    David, Jose M. N.
    Menezes, Victor S. A.
    ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOL 1, AINA 2024, 2024, 199 : 372 - 381
  • [27] Context Data Preprocessing for Context-Aware Smartphone Authentication
    Nam, Sangjin
    Kim, Suntae
    Shin, Jung-Hoon
    Kim, Jeong Ah
    Park, Sooyong
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2019, PT V: 19TH INTERNATIONAL CONFERENCE, SAINT PETERSBURG, RUSSIA, JULY 14, 2019, PROCEEDINGS, PART V, 2019, 11623 : 190 - 202
  • [28] Context Schema Evolution in Context-Aware Data Management
    Quintarelli, Elisa
    Rabosio, Emanuele
    Tanca, Letizia
    CONCEPTUAL MODELING - ER 2011, 2011, 6998 : 290 - 303
  • [29] Diversity in Data for Speech Processing in Brazilian Portuguese
    Craveiro, Giovana Meloni
    Galdino, Julio Cesar
    INTELLIGENT SYSTEMS, BRACIS 2024, PT IV, 2025, 15415 : 122 - 136
  • [30] Context-aware data quality assessment for big data
    Ardagna, Danilo
    Cappiello, Cinzia
    Sama, Walter
    Vitali, Monica
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 89 : 548 - 562