HATE SPEECH DETECTION IN LOW-RESOURCE BODO AND ASSAMESE TEXTS WITH ML-DL AND BERT MODELS

Cited by: 6
Authors
Ghosh, Koyel [1 ]
Senapati, Apurbalal [1 ]
Narzary, Mwnthai [1 ]
Brahma, Maharaj [2 ]
Affiliations
[1] Cent Inst Technol, Dept Comp Sci & Engn, Kokrajhar, Assam, India
[2] IIT Hyderabad, Dept Comp Sci & Engn, Hyderabad, India
Keywords
Hate Speech Detection; Assamese; Bodo; Natural Language Processing; NLP; Machine Learning; Deep Learning; Word2Vec; NB; SVM; LSTM; BiLSTM; CNN; BERT;
DOI
10.12694/scpe.v24i4.2469
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Hate speech detection is currently an active research topic in natural language processing (NLP). Unrestricted use of social media platforms encourages people to voice opinions excessively, to the point where comments and posts become toxic. A toxic outlook increases violence towards the neighbour, state, country, and continent. Several countries have introduced laws to address this urgent problem, and media platforms have begun working to restrict hateful posts and comments. When framed as a supervised task, hate speech detection is generally a text classification problem. Processing text computationally is challenging because of its semantics and complex grammar. Resource-rich languages benefit from their abundant data, whereas resource-scarce languages suffer significantly from a lack of datasets. This paper makes a multifaceted contribution encompassing resource generation, experimentation with Machine Learning (ML), Deep Learning (DL), and state-of-the-art transformer-based models, and a comprehensive evaluation of model performance, including thorough error analysis. In terms of resource generation, it adds to the North-East Indian Hate Speech tagged dataset (NEIHS version 1), which covers two languages: Assamese and Bodo.
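To make the supervised text classification framing concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier (NB appears in the keyword list) written with the Python standard library only. It is not the authors' implementation. The class name `NaiveBayes`, the whitespace tokenizer, the label names `hate` and `not_hate`, and the English toy sentences are all illustrative assumptions; they are not taken from the NEIHS dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real
    # Assamese/Bodo preprocessing, which the abstract does not describe.
    return text.lower().split()


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior of the class.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][token]
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data (illustrative placeholders only).
train_texts = [
    "you are awful and stupid",
    "i hate those people",
    "have a nice day",
    "thank you for the help",
]
train_labels = ["hate", "hate", "not_hate", "not_hate"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("those people are stupid")
```

With this toy data, `model.predict` assigns a new comment to whichever class gives its known tokens the higher smoothed likelihood. A real system would need language-appropriate tokenization and far more labeled data.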
Pages: 941-955
Number of pages: 15