Semi-meta-supervised hate speech detection

Times cited: 3
Authors
Putra, Cendra Devayana [1]
Wang, Hei-Chia [1, 2]
Affiliations
[1] Natl Cheng Kung Univ, Inst Informat Management, Tainan 701, Taiwan
[2] Natl Cheng Kung Univ, Ctr Innovat FinTech Business Models, Tainan 701, Taiwan
Keywords
Semisupervised learning; Single-task learning; Hate speech; Shared knowledge
DOI
10.1016/j.knosys.2024.111386
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
On social media, hate speech is a daily occurrence, and it has physical and psychological consequences. Using deep learning to combat hate speech is one way to prevent it. However, deep learning techniques may require massive datasets to produce accurate models, whereas hate speech samples (such as misogyny and cyber samples) are frequently scarce and diverse. We offer methods for leveraging these diverse datasets and enhancing deep learning models through knowledge sharing. First, we analyzed the existing Bidirectional Encoder Representations from Transformers (BERT) technique and built a BERT-3CNN method to generate a single-task classifier that optimally absorbs the features of the target dataset. Second, we proposed a shared BERT layer to gain a general understanding of hate speech. Third, we proposed a method for adapting another dataset to the target dataset. We conducted several quantitative experiments on five datasets, namely Hatebase, Supremacist, Cybertroll, TRAC, and TRAC 2020, and assessed performance using accuracy and the F1 score. The first experiment demonstrated that our BERT-3CNN model improved the average accuracy by 5% and the F1 score by 18%. The second experiment demonstrated that BERT-SP improved the average accuracy by 0.2% and the F1 score by 2%. Finally, on TRAC, Supremacist, Hatebase, and Cybertroll, Semi BERT-SP improved accuracy by 6% and the F1 score by 5%, while on TRAC 2020 it improved them by 10% and 9%, respectively.
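The abstract reports every result in terms of accuracy and the F1 score. As a minimal sketch of how these two metrics are commonly computed for one classifier on one dataset, the Python below implements both from scratch. The record does not say which F1 variant the authors used, so macro-averaged F1 (the unweighted mean of per-class F1) is an assumption here, and the labels in the usage example are hypothetical toy data, not samples from Hatebase, TRAC, or any other dataset named above.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred.

    Assumption: macro averaging; the source does not specify the F1 variant.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted p, but p was wrong
            fn[t] += 1  # missed the true class t
    labels = set(y_true) | set(y_pred)
    per_class = []
    for c in labels:
        # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN)
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Hypothetical toy labels for illustration only
    gold = ["hate", "hate", "none", "none", "none"]
    pred = ["hate", "none", "none", "none", "hate"]
    print(f"accuracy={accuracy(gold, pred):.3f} macro_f1={macro_f1(gold, pred):.3f}")
    # prints "accuracy=0.600 macro_f1=0.583"
```

The toy example shows why the abstract reports both metrics: accuracy counts every sample equally, while macro F1 weights each class equally, so on imbalanced hate speech data the two can move by quite different amounts, as with the 5% accuracy versus 18% F1 gain reported for BERT-3CNN.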
Pages: 16