Deep Ensemble Network for Sentiment Analysis in Bi-lingual Low-resource Languages

Cited by: 7
Author
Roy, Pradeep Kumar [1]
Affiliation
[1] Indian Inst Informat Technol, Dept Comp Sci & Engn, Surat 394190, Gujarat, India
Keywords
Sentiment analysis; code-mixed; transformer; BERT; Kannada; Malayalam; ensemble learning; deep learning; machine learning
DOI
10.1145/3600229
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sentiment analysis (SA) is the systematic identification, extraction, quantification, and study of affective states and subjective information using natural language processing. It is widely used to analyze user feedback such as reviews and social media posts. SA has recently become one of the most active research areas in NLP because of its wide range of applications, including e-commerce, healthcare, and the hotel business. Many machine learning and deep learning models exist for predicting the sentiment of a user's post. However, sentiment analysis in low-resource languages such as Kannada, Malayalam, Telugu, and Tamil has received less attention because of language complexity and the limited availability of the required resources. This research addresses that gap by proposing an ensemble model for predicting the sentiment of code-mixed Kannada and Malayalam text. The ensemble of transformer-based models achieved a promising weighted F1-score of 0.66 on the Kannada code-mixed dataset, whereas the ensemble of deep learning models performed best on the Malayalam dataset, achieving a weighted F1-score of 0.72 and outperforming existing research.
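The abstract reports weighted F1-scores for ensembles but does not say how the member models are combined. As a minimal sketch only, the Python below assumes soft voting (averaging per-class probabilities across models) and computes the weighted F1-score as the support-weighted mean of per-class F1. The function names `soft_vote` and `weighted_f1` and the toy inputs are hypothetical and are not taken from the paper.

```python
from collections import Counter


def soft_vote(prob_lists):
    """Average per-class probabilities over models and return argmax labels.

    prob_lists[m][i][c] is model m's probability of class c for sample i.
    Soft voting is an assumption here; the paper's combination rule is not
    stated in the abstract.
    """
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    preds = []
    for i in range(n_samples):
        n_classes = len(prob_lists[0][i])
        avg = [
            sum(model[i][c] for model in prob_lists) / n_models
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy inputs (hypothetical): two models, two samples, two classes.
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.6, 0.4], [0.7, 0.3]]
ensemble_preds = soft_vote([model_a, model_b])
```

Weighting by support means that frequent sentiment classes dominate the score, which matters when class sizes in a dataset are uneven.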
Pages: 16
Related papers
50 in total
  • [41] A Deep Learning model for Question Analysis in Low-resource Languages: A Dataset and Case Study for Persian
    Khaksefidi, Fatemeh Ebrahimi
    Fatemi, Afsaneh
    Nematbakhsh, Mohammad Ali
    Kia, Mahsa Abazari
    2024 14TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION SYSTEMS, ICPRS, 2024
  • [42] BOOSTING PERFORMANCE ON LOW-RESOURCE LANGUAGES BY STANDARD CORPORA: AN ANALYSIS
    Grezl, Frantisek
    Karafiat, Martin
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 629 - 636
  • [43] A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages
    Vania, Clara
    Kementchedjhieva, Yova
    Sogaard, Anders
    Lopez, Adam
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1105 - 1116
  • [44] Preservation of sentiment in machine translation of low-resource languages: a case study on Slovak movie subtitles
    Reichel, Jaroslav
    Benko, Lubomir
    LANGUAGE RESOURCES AND EVALUATION, 2024
  • [45] CROSS-LINGUAL DEEP NEURAL NETWORK BASED SUBMODULAR UNBIASED DATA SELECTION FOR LOW-RESOURCE KEYWORD SEARCH
    Ni, Chongjia
    Leung, Cheung-Chi
    Wang, Lei
    Liu, Haibo
    Rao, Feng
    Lu, Li
    Chen, Nancy F.
    Ma, Bin
    Li, Haizhou
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 6015 - 6019
  • [46] The Impact of Translating Resource-Rich Datasets to Low-Resource Languages Through Multi-Lingual Text Processing
    Ghafoor, Abdul
    Imran, Ali Shariq
    Daudpota, Sher Muhammad
    Kastrati, Zenun
    Abdullah
    Batra, Rakhi
    Wani, Mudasir Ahmad
    IEEE ACCESS, 2021, 9 : 124478 - 124490
  • [47] Trinity at SemEval-2023 Task 12: Sentiment Analysis for Low-resource African Languages using Twitter Dataset
    Rathi, Shashank
    Pande, Siddhesh
    Atkare, Harshwardhan
    Tangsali, Rahul
    Vyawahare, Aditya
    Kadam, Dipali
    17TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2023, 2023, : 1161 - 1165
  • [48] A Comparison of Lexicon-based and Transformer-based Sentiment Analysis on Code-mixed of Low-Resource Languages
    Tho, Cuk
    Heryadi, Yaya
    Kartowisastro, Iman Herwidiana
    Budiharto, Widodo
    Proceedings of 2021 1st International Conference on Computer Science and Artificial Intelligence, ICCSAI 2021, 2021, : 81 - 85
  • [49] Continual Attention Modeling for Successive Sentiment Analysis in Low-resource Scenarios
    Zhang, Han
    Wang, Jing-Jing
    Luo, Jia-Min
    Zhou, Guo-Dong
    Ruan Jian Xue Bao/Journal of Software, 2024, 35 (12) : 5470 - 5486
  • [50] HindiMD: A Multi-domain Corpora for Low-resource Sentiment Analysis
    Mamta
    Ekbal, Asif
    Bhattacharyya, Pushpak
    Saha, Tista
    Kumar, Alka
    Srivastava, Shikha
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 7061 - 7070