HATE SPEECH DETECTION IN LOW-RESOURCE BODO AND ASSAMESE TEXTS WITH ML-DL AND BERT MODELS

Cited by: 6

Authors:

Ghosh, Koyel ^{[1]}

Senapati, Apurbalal ^{[1]}

Narzary, Mwnthai ^{[1]}

Brahma, Maharaj ^{[2]}

Affiliations:

[1] Cent Inst Technol, Dept Comp Sci & Engn, Kokrajhar, Assam, India

[2] IIT Hyderabad, Dept Comp Sci & Engn, Hyderabad, India

Source:

SCALABLE COMPUTING-PRACTICE AND EXPERIENCE | 2023 / Vol. 24 / Issue 04

Keywords:

Hate Speech Detection; Assamese; Bodo; Natural Language Processing; NLP; Machine Learning; Deep Learning; Word2Vec; NB; SVM; LSTM; BiLSTM; CNN; BERT;

DOI:

10.12694/scpe.v24i4.2469

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Hate speech detection has recently become a prominent research topic in natural language processing (NLP). Unrestrained use of social media platforms encourages people to be overly opinionated, to the point where comments and posts turn toxic. A toxic outlook increases violence towards the neighbour, state, country, and continent. Several countries have introduced laws to address this urgent problem, and all media platforms have now started working to restrict hateful posts and comments. When treated as a supervised task, hate speech detection is generally a text classification problem. Processing text computationally is challenging because of its semantic and grammatical complexity. Resource-rich languages benefit from abundant data, whereas resource-scarce languages suffer significantly from a lack of datasets. This paper makes a multifaceted contribution encompassing resource generation, experimentation with Machine Learning (ML), Deep Learning (DL) and state-of-the-art transformer-based models, and a comprehensive evaluation of model performance, including thorough error analysis. In terms of resource generation, it adds to the North-East Indian Hate Speech tagged dataset (NEIHS version 1), which covers two languages: Assamese and Bodo.


Pages: 941-955

Page count: 15

