HATE SPEECH DETECTION IN LOW-RESOURCE BODO AND ASSAMESE TEXTS WITH ML-DL AND BERT MODELS

Times cited: 6

Authors:

Ghosh, Koyel [1]

Senapati, Apurbalal [1]

Narzary, Mwnthai [1]

Brahma, Maharaj [2]

Affiliations:

[1] Cent Inst Technol, Dept Comp Sci & Engn, Kokrajhar, Assam, India

[2] IIT Hyderabad, Dept Comp Sci & Engn, Hyderabad, India

Source:

SCALABLE COMPUTING-PRACTICE AND EXPERIENCE | 2023 / Vol. 24 / No. 04

Keywords:

Hate Speech Detection; Assamese; Bodo; Natural Language Processing; NLP; Machine Learning; Deep Learning; Word2Vec; NB; SVM; LSTM; BiLSTM; CNN; BERT;

DOI:

10.12694/scpe.v24i4.2469

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

Hate speech detection is a topic of rapidly growing interest in natural language processing (NLP). Unrestrained use of social media platforms makes people over-opinionated, and their comments and posts often cross the line into toxicity. A toxic outlook fuels violence toward neighbours, states, countries, and continents. Several countries have introduced laws to address this urgent problem, and media platforms have now begun restricting hateful posts and comments. When treated as a supervised task, hate speech detection is generally a text classification problem. Processing text computationally is challenging because of its semantic and grammatical complexity. Resource-rich languages benefit from their abundant resources, whereas resource-scarce languages suffer significantly from a lack of datasets. This paper makes a multifaceted contribution encompassing resource generation; experimentation with Machine Learning (ML), Deep Learning (DL), and state-of-the-art transformer-based models; and a comprehensive evaluation of model performance, including a thorough error analysis. For resource generation, it contributes the North-East Indian Hate Speech tagged dataset (NEIHS version 1), which covers two languages: Assamese and Bodo.
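The abstract frames hate speech detection as supervised text classification. As a rough illustration only, here is a minimal sketch of one of the classical ML baselines named in the keywords, a multinomial Naive Bayes (NB) classifier with add-one (Laplace) smoothing. The function names `train_nb` and `predict_nb`, the whitespace tokenizer, and the four English toy sentences and their labels are all hypothetical and invented for illustration; they are not the authors' pipeline and not drawn from the NEIHS dataset. Real Assamese or Bodo text would need a proper tokenizer.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical whitespace tokenizer; an assumption, not the paper's preprocessing.
    return text.lower().split()


def train_nb(docs, labels):
    """Train a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> token frequency counts
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    # Log prior P(c) estimated from class frequencies.
    priors = {c: math.log(k / len(docs)) for c, k in class_counts.items()}

    # Log likelihood P(w | c) with add-one smoothing over the vocabulary.
    likelihoods = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + len(vocab)
        likelihoods[c] = {w: math.log((word_counts[c][w] + 1) / denom) for w in vocab}
    return priors, likelihoods, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior; out-of-vocabulary tokens are ignored."""
    priors, likelihoods, vocab = model
    scores = {
        c: prior + sum(likelihoods[c][w] for w in tokenize(text) if w in vocab)
        for c, prior in priors.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy training data (invented for illustration, not from NEIHS).
docs = [
    "you people are idiots get out",
    "idiots like you should leave",
    "happy festival to everyone",
    "great match everyone enjoyed it",
]
labels = ["hate", "hate", "non-hate", "non-hate"]

model = train_nb(docs, labels)
print(predict_nb(model, "get out idiots"))           # hate
print(predict_nb(model, "happy festival everyone"))  # non-hate
```

Working in log space avoids numerical underflow when many token probabilities are multiplied; the DL and BERT models mentioned in the abstract replace these bag-of-words counts with learned representations.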


Pages: 941-955

Page count: 15
