COVID-HateBERT: a Pre-trained Language Model for COVID-19 related Hate Speech Detection

Cited by: 9
Authors
Li, Mingqi [1]
Liao, Song [1]
Okpala, Ebuka [1]
Tong, Max [1, 4]
Costello, Matthew [2]
Cheng, Long [1]
Hu, Hongxin [3]
Luo, Feng [1]
Affiliations
[1] Clemson Univ, Sch Comp, Clemson, SC 29631 USA
[2] Clemson Univ, Dept Sociol, Clemson, SC 29631 USA
[3] Univ Buffalo, Dept Comp Sci & Engn, Buffalo, NY USA
[4] Christ Church Episcopal Sch, Greenville, SC USA
Keywords
hate speech detection; language model; COVID-19; BERT
DOI
10.1109/ICMLA52953.2021.00043
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With the dramatic growth of hate speech on social media during the COVID-19 pandemic, there is an urgent need to detect various kinds of hate speech effectively. Existing methods achieve high performance only when the training and testing data come from the same distribution. Models trained on traditional hate speech datasets do not fit COVID-19-related datasets well. Meanwhile, manually annotating hate speech datasets for supervised learning is time-consuming. To address this problem, we propose COVID-HateBERT, a pre-trained language model for detecting hate speech in English tweets. We collect 200M English tweets based on COVID-19-related hateful keywords and hashtags. Then, we use a classifier to extract 1.27M potentially hateful tweets, which we use to re-train BERT-base. We evaluate COVID-HateBERT on four benchmark datasets. COVID-HateBERT achieves a 14.8%-23.8% higher macro-average F1 score on traditional hate speech detection compared to baseline methods and a 2.6%-6.73% higher macro-average F1 score on COVID-19-related hate speech detection compared to classifiers using BERT and BERTweet, which shows that COVID-HateBERT generalizes well across different datasets.
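The abstract reports its gains as macro-average F1, which is the unweighted mean of the per-class F1 scores, so the minority (hateful) class counts as much as the majority class. The following is a minimal sketch of that metric only; it is not the authors' evaluation code, and the function name `macro_f1` and the toy labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted class p, but it was wrong
            fn[t] += 1          # true class t was missed

    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 1 = hateful, 0 = not hateful
y_true = [1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

On this toy data the non-hateful class scores 0.75 and the hateful class 0.5, so the macro average is 0.625 even though four of six predictions are correct.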
Pages: 233-238
Page count: 6