HATE SPEECH DETECTION IN LOW-RESOURCE BODO AND ASSAMESE TEXTS WITH ML-DL AND BERT MODELS

Cited by: 6
Authors
Ghosh, Koyel [1]
Senapati, Apurbalal [1]
Narzary, Mwnthai [1]
Brahma, Maharaj [2]
Affiliations
[1] Cent Inst Technol, Dept Comp Sci & Engn, Kokrajhar, Assam, India
[2] IIT Hyderabad, Dept Comp Sci & Engn, Hyderabad, India
Source
Keywords
Hate Speech Detection; Assamese; Bodo; Natural Language Processing; NLP; Machine Learning; Deep Learning; Word2Vec; NB; SVM; LSTM; BiLSTM; CNN; BERT
DOI
10.12694/scpe.v24i4.2469
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Hate speech detection has recently become an active research topic in natural language processing (NLP). Unrestrained use of social media platforms makes people over-opinionated, to the point where comments and posts turn toxic. A toxic outlook increases violence towards the neighbour, state, country, and continent. Several countries have introduced laws to address this urgent problem, and media platforms have started working on restricting hateful posts and comments. When treated as a supervised learning task, hate speech detection is generally a text classification problem. Processing text computationally is challenging because of its semantic and grammatical complexity. Resource-rich languages benefit from abundant data, whereas resource-scarce languages suffer significantly from a lack of datasets. This paper makes a multifaceted contribution encompassing resource generation, experimentation with Machine Learning (ML), Deep Learning (DL), and state-of-the-art transformer-based models, and a comprehensive evaluation of model performance, including thorough error analysis. In terms of resource generation, it adds to the North-East Indian Hate Speech tagged dataset (NEIHS version 1), which covers two languages: Assamese and Bodo.
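As a rough illustration of the supervised text classification framing mentioned in the abstract, the sketch below trains a small multinomial Naive Bayes classifier (NB appears in the keywords) in plain Python. It is not the authors' implementation: the class name, the whitespace tokenizer, the label names `hate` / `not_hate`, and the English toy sentences are all assumptions made for illustration, and no NEIHS data is used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Plain whitespace tokenization (an assumption); real Assamese or Bodo
    # text would need script-aware normalization that is not shown here.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training examples (placeholders, not taken from NEIHS).
TRAIN_TEXTS = [
    "you people are worthless and should leave",
    "i hate everyone from that group",
    "those people are disgusting",
    "have a wonderful day my friend",
    "the festival in town was lovely",
    "thank you for the kind help",
]
TRAIN_LABELS = ["hate", "hate", "hate", "not_hate", "not_hate", "not_hate"]

clf = NaiveBayesTextClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(clf.predict("those people are worthless"))
print(clf.predict("thank you my friend"))
```

With only six training sentences the predictions merely show the mechanics: each class receives a log prior plus smoothed token log likelihoods, and the highest-scoring label is returned.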
Pages: 941-955
Number of pages: 15