Token replacement-based data augmentation methods for hate speech detection

Authors
Kosisochukwu Judith Madukwe
Xiaoying Gao
Bing Xue
Affiliation
[1] Victoria University of Wellington, School of Engineering and Computer Science
Source
World Wide Web | 2022 / Vol. 25
Keywords
Hate speech data; Data augmentation; Token substitution; Word replacement; Data generation; Text data;
DOI
Not available
Abstract
Hate speech detection relies mostly on text data. This data, usually sourced from various social media platforms, is known to be plagued by numerous issues that reduce its quality and, in turn, the quality of the trained models. Among these issues are a lack of diversity and a diminutive class of interest in the dataset, which result in overfitted models that do not generalize well to other or newly collected data. Ways of handling these issues include augmenting the data with diverse samples, engineering non-redundant features, or designing robust classification models. This study focuses on data augmentation, a popular method for improving the quality of existing datasets by generating synthetic samples that mimic the distribution of the original samples. There is a lack of extensive studies on how hate speech texts respond to different textual data augmentation techniques. Specifically, we provide further insight into the token replacement method of textual data augmentation through empirical studies that investigate which embedding method(s) provide a robust source of synonyms for the replacement process, which method(s) are effective for selecting the words to be replaced, and how to confirm that the label within each class is preserved. Our proposed methods, validated on two commonly used hate speech datasets known to suffer from a lack of diversity and a diminutive class of interest, significantly improve classification performance and provide insights into token replacement methods.
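The token replacement idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the embedding table, the stop-word rule for choosing which words to replace, the similarity threshold, and the function names `nearest_neighbour` and `augment` are all hypothetical, chosen only to show the general shape of embedding-based synonym substitution. The sketch also does not check label preservation, which the abstract treats as a separate question.

```python
import math

# Hypothetical toy embedding table (assumption); a real setup would load
# pretrained word vectors instead of hand-written ones.
TOY_EMBEDDINGS = {
    "awful": [0.9, 0.1, 0.0],
    "terrible": [0.85, 0.15, 0.05],
    "people": [0.1, 0.9, 0.1],
    "folks": [0.15, 0.85, 0.1],
    "great": [-0.8, 0.1, 0.3],
}

# Assumed selection rule: stop words are never replaced.
STOP_WORDS = {"the", "are", "these", "a", "is"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbour(word, embeddings, min_similarity=0.9):
    """Return the most similar other word, or None if nothing clears the threshold."""
    if word not in embeddings:
        return None
    best_word, best_sim = None, min_similarity
    for candidate, vector in embeddings.items():
        if candidate == word:
            continue
        sim = cosine(embeddings[word], vector)
        if sim >= best_sim:
            best_word, best_sim = candidate, sim
    return best_word


def augment(tokens, embeddings=TOY_EMBEDDINGS, stop_words=STOP_WORDS):
    """Replace each non-stop-word token with its nearest embedding neighbour, if any."""
    augmented = []
    for token in tokens:
        replacement = None
        if token.lower() not in stop_words:
            replacement = nearest_neighbour(token.lower(), embeddings)
        augmented.append(replacement or token)
    return augmented


print(augment(["these", "people", "are", "awful"]))
```

Tokens with no sufficiently close neighbour, or absent from the embedding table, are left unchanged, so the synthetic sample keeps the length and word order of the original.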
Pages: 1129 - 1150
Page count: 21