COVID-HateBERT: a Pre-trained Language Model for COVID-19 related Hate Speech Detection

Cited by: 9
Authors
Li, Mingqi [1]
Liao, Song [1]
Okpala, Ebuka [1]
Tong, Max [1,4]
Costello, Matthew [2]
Cheng, Long [1]
Hu, Hongxin [3]
Luo, Feng [1]
Affiliations
[1] Clemson Univ, Sch Comp, Clemson, SC 29631 USA
[2] Clemson Univ, Dept Sociol, Clemson, SC 29631 USA
[3] Univ Buffalo, Dept Comp Sci & Engn, Buffalo, NY USA
[4] Christ Church Episcopal Sch, Greenville, SC USA
Keywords
hate speech detection; language model; COVID-19; BERT
DOI
10.1109/ICMLA52953.2021.00043
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the dramatic growth of hate speech on social media during the COVID-19 pandemic, there is an urgent need to detect various forms of hate speech effectively. Existing methods achieve high performance only when the training and testing data come from the same distribution. Models trained on traditional hate speech datasets do not fit COVID-19 related datasets well. Meanwhile, manually annotating hate speech datasets for supervised learning is time-consuming. To address this problem, we propose COVID-HateBERT, a pre-trained language model for detecting hate speech in English tweets. We collect 200M English tweets based on COVID-19 related hateful keywords and hashtags. Then, we use a classifier to extract 1.27M potentially hateful tweets, which we use to re-train BERT-base. We evaluate COVID-HateBERT on four benchmark datasets. COVID-HateBERT achieves a 14.8%-23.8% higher macro-average F1 score on traditional hate speech detection compared to baseline methods and a 2.6%-6.73% higher macro-average F1 score on COVID-19 related hate speech detection compared to classifiers using BERT and BERTweet, which shows that COVID-HateBERT generalizes well across different datasets.
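The abstract reports results as macro-average F1 scores. As a minimal sketch of how that metric is commonly computed (this is not the authors' evaluation code; the function name `macro_f1` and the toy labels are assumptions made for illustration), per-class F1 is calculated for each label and then averaged with equal weight, so a rare "hateful" class counts as much as the majority class:

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (made-up labels): 1 = hateful, 0 = not hateful.
score = macro_f1([1, 1, 0, 0], [1, 0, 0, 0])
print(round(score, 4))
```

Because every class contributes equally to the mean, this metric penalizes a classifier that ignores the minority class, which is one reason it is often preferred for imbalanced hate speech datasets.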
Pages: 233-238
Number of pages: 6
Related papers
50 in total
  • [31] The Impact of COVID-19 on Voice, Speech, and Language: An Interdisciplinary Study of COVID-19 Survivors
    Kuc, Joanna
    Michta, Tomasz
    GEMA ONLINE JOURNAL OF LANGUAGE STUDIES, 2023, 23 (03): : 42 - 57
  • [32] Adapter Learning from Pre-trained Model for Robust Spoof Speech Detection
    Wu, Haochen
    Guo, Wu
    Peng, Shengyu
    Li, Zhuhai
    Zhang, Jie
    INTERSPEECH 2024, 2024, : 2095 - 2099
  • [33] GENERATING HUMAN READABLE TRANSCRIPT FOR AUTOMATIC SPEECH RECOGNITION WITH PRE-TRAINED LANGUAGE MODEL
    Liao, Junwei
    Shi, Yu
    Gong, Ming
    Shou, Linjun
    Eskimez, Sefik
    Lu, Liyang
    Qu, Hong
    Zeng, Michael
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7578 - 7582
  • [34] From hate to harmony: Leveraging large language models for safer speech in times of COVID-19 crisis
    Chao, August F. Y.
    Wang, Chen-Shu
    Li, Bo-Yi
    Chen, Hong-Yan
    HELIYON, 2024, 10 (16)
  • [35] Pre-trained quantum convolutional neural network for COVID-19 disease classification using computed tomography images
    Asadoorian, Nazeh
    Yaraghi, Shokufeh
    Tahmasian, Araeek
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [36] COVID-19 diagnosis: A comprehensive review of pre-trained deep learning models based on feature extraction algorithm
    Poola, Rahul Gowtham
    Pl, Lahari
    Sankar, Y. Siva
    RESULTS IN ENGINEERING, 2023, 18
  • [37] Concatenation of Pre-Trained Convolutional Neural Networks for Enhanced COVID-19 Screening Using Transfer Learning Technique
    El Gannour, Oussama
    Hamida, Soufiane
    Cherradi, Bouchaib
    Al-Sarem, Mohammed
    Raihani, Abdelhadi
    Saeed, Faisal
    Hadwan, Mohammed
    ELECTRONICS, 2022, 11 (01)
  • [38] Combating hate speech using an adaptive ensemble learning model with a case study on COVID-19
    Agarwal, Shivang
    Chowdary, C. Ravindranath
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 185
  • [39] Introduction: Language and Communication Related to COVID-19
    Jucks, Regina
    Hendriks, Friederike
    JOURNAL OF LANGUAGE AND SOCIAL PSYCHOLOGY, 2021, 40 (5-6) : 540 - 545
  • [40] Chinese cyber-violent Speech Detection and Analysis Based on Pre-trained Model
    Zhou, Sunrui
    2024 5TH INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKS AND INTERNET OF THINGS, CNIOT 2024, 2024, : 443 - 447