COVID-HateBERT: a Pre-trained Language Model for COVID-19 related Hate Speech Detection

Cited by: 9
Authors
Li, Mingqi [1]
Liao, Song [1]
Okpala, Ebuka [1]
Tong, Max [1, 4]
Costello, Matthew [2]
Cheng, Long [1]
Hu, Hongxin [3]
Luo, Feng [1]
Affiliations
[1] Clemson Univ, Sch Comp, Clemson, SC 29631 USA
[2] Clemson Univ, Dept Sociol, Clemson, SC 29631 USA
[3] Univ Buffalo, Dept Comp Sci & Engn, Buffalo, NY USA
[4] Christ Church Episcopal Sch, Greenville, SC USA
Keywords
hate speech detection; language model; COVID-19; BERT
DOI
10.1109/ICMLA52953.2021.00043
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the dramatic growth of hate speech on social media during the COVID-19 pandemic, there is an urgent need to detect various forms of hate speech effectively. Existing methods achieve high performance only when the training and test data come from the same distribution. Models trained on traditional hate speech datasets do not fit COVID-19 related datasets well. Meanwhile, manually annotating hate speech datasets for supervised learning is time-consuming. To address this problem, we propose COVID-HateBERT, a pre-trained language model for detecting hate speech in English tweets. We collect 200M English tweets based on COVID-19 related hateful keywords and hashtags. We then use a classifier to extract 1.27M potentially hateful tweets, which we use to re-train BERT-base. We evaluate COVID-HateBERT on four benchmark datasets. COVID-HateBERT achieves a 14.8%-23.8% higher macro average F1 score on traditional hate speech detection compared to baseline methods and a 2.6%-6.73% higher macro average F1 score on COVID-19 related hate speech detection compared to classifiers using BERT and BERTweet, which shows that COVID-HateBERT generalizes well across different datasets.
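The abstract reports its results as macro average F1. As a reading aid, here is a minimal sketch of how that metric is usually computed: F1 is calculated separately for each class and then averaged without weighting, so a rare "hateful" class counts as much as the majority class. The function name `macro_f1` and the toy labels are assumptions made for illustration; this is not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted wrongly
            fn[t] += 1      # class t was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, made up for illustration: 1 = hateful, 0 = not hateful.
y_true = [1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

In this toy case accuracy is 5/6, but the missed hateful tweet pulls the hateful-class F1 down to 2/3, which the macro average reflects directly. That sensitivity to the minority class is a common reason for preferring macro F1 on imbalanced hate speech data.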
Pages: 233-238
Number of pages: 6