Sentiment Analysis of Low-Resource Language Literature Using Data Processing and Deep Learning

Cited by: 2
Authors
Ali, Aizaz [1]
Khan, Maqbool [1,2]
Khan, Khalil [3]
Khan, Rehan Ullah [4]
Aloraini, Abdulrahman [4]
Affiliations
[1] Pak Austria Fachhochschule Inst Appl Sci & Technol, Dept IT & Comp Sci, Haripur 22620, Pakistan
[2] Software Competence Ctr Hagenberg, Softwarepark 32a, A-4232 Hagenberg, Austria
[3] Nazarbayev Univ, Sch Engn & Digital Sci, Dept Comp Sci, Astana 010000, Kazakhstan
[4] Qassim Univ, Coll Comp, Dept Informat Technol, POB 1162, Buraydah, Saudi Arabia
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 79 / Issue 01
Keywords
Urdu sentiment analysis; convolutional neural networks; recurrent neural network; deep learning; natural language processing; neural networks; ROMAN URDU; REVIEWS;
DOI
10.32604/cmc.2024.048712
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Sentiment analysis, the task of discerning emotional tone in text, plays a pivotal role in understanding public opinion and user sentiment across diverse languages. While numerous scholars have conducted sentiment analysis in widely spoken languages such as English, Chinese, Arabic, and Roman Arabic, resource-poor languages such as Urdu remain a challenge. Urdu is a distinctive language, characterized by a script that amalgamates elements from diverse languages, including Arabic, Parsi, Pashtu, Turkish, Punjabi, and Saraiki. Urdu literature, with its distinct character sets and linguistic features, presents an additional hurdle because accessible datasets are lacking, which makes sentiment analysis a formidable undertaking. The limited availability of resources has fueled increased interest among researchers, prompting a deeper exploration into Urdu sentiment analysis. This research is dedicated to Urdu language sentiment analysis, applying deep learning models to an extensive dataset categorized into five labels: Positive, Negative, Neutral, Mixed, and Ambiguous. The primary objective is to discern sentiments and emotions within the Urdu language despite the absence of well-curated datasets. To tackle this challenge, the first step is the creation of a comprehensive Urdu dataset by aggregating data from various sources such as newspapers, articles, and social media comments. After data collection, a thorough process of cleaning and preprocessing is applied to ensure the quality of the data. The study uses two well-known deep learning models, Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), for training and for evaluating sentiment analysis performance. Additionally, the study explores hyperparameter tuning to optimize the models' efficacy.
Evaluation metrics such as precision, recall, and the F1-score are employed to assess the effectiveness of the models. The findings show that the RNN surpasses the CNN in Urdu sentiment analysis, achieving a significantly higher accuracy of 91%. This result underscores the strong performance of the RNN and makes it a compelling option for sentiment analysis tasks in the Urdu language.
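The abstract names precision, recall, and the F1-score over five labels but does not show how they are computed. The following is a minimal sketch in Python, assuming single-label predictions and macro averaging; the function names (`per_class_metrics`, `macro_f1`, `accuracy`) and the toy labels are hypothetical illustrations, not the authors' evaluation code or data.

```python
from collections import Counter

# The five sentiment labels listed in the abstract.
LABELS = ["Positive", "Negative", "Neutral", "Mixed", "Ambiguous"]


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return precision, recall and F1 for each label (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Define a metric as 0.0 when its denominator is zero.
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_f1(metrics):
    """Unweighted mean of the per-class F1 scores."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, invented purely for illustration.
y_true = ["Positive", "Positive", "Negative", "Neutral", "Mixed", "Ambiguous"]
y_pred = ["Positive", "Negative", "Negative", "Neutral", "Mixed", "Mixed"]

scores = per_class_metrics(y_true, y_pred)
for label in LABELS:
    print(label, scores[label])
print("macro F1:", macro_f1(scores))
print("accuracy:", accuracy(y_true, y_pred))
```

Macro averaging weights each label equally, which may matter when rare labels such as Mixed or Ambiguous are under-represented; whether the paper reports macro, micro, or weighted averages is not stated in the abstract.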
Pages: 713-733
Page count: 21
Related papers
50 in total
  • [21] Query Strategies, Assemble! Active Learning with Expert Advice for Low-resource Natural Language Processing
    Mendonca, Vania
    Sardinha, Alberto
    Coheur, Luisa
    Santos, Ana Lucia
    2020 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2020,
  • [22] Low resource language specific pre-processing and features for sentiment analysis task
    Loitongbam Sanayai Meetei
    Thoudam Doren Singh
    Samir Kumar Borgohain
    Sivaji Bandyopadhyay
    Language Resources and Evaluation, 2021, 55 : 947 - 969
  • [23] Low resource language specific pre-processing and features for sentiment analysis task
    Meetei, Loitongbam Sanayai
    Singh, Thoudam Doren
    Borgohain, Samir Kumar
    Bandyopadhyay, Sivaji
    LANGUAGE RESOURCES AND EVALUATION, 2021, 55 (04) : 947 - 969
  • [24] Meta Auxiliary Learning for Low-resource Spoken Language Understanding
    Gao, Yingying
    Feng, Junlan
    Deng, Chao
    Zhang, Shilei
    INTERSPEECH 2022, 2022, : 2703 - 2707
  • [25] Continual Attention Modeling for Successive Sentiment Analysis in Low-resource Scenarios
    Zhang, Han
    Wang, Jing-Jing
    Luo, Jia-Min
    Zhou, Guo-Dong
    Ruan Jian Xue Bao/Journal of Software, 2024, 35 (12): 5470 - 5486
  • [26] HindiMD: A Multi-domain Corpora for Low-resource Sentiment Analysis
    Mamta
    Ekbal, Asif
    Bhattacharyya, Pushpak
    Saha, Tista
    Kumar, Alka
    Srivastava, Shikha
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 7061 - 7070
  • [27] The Application of Natural Language Processing Technology Based on Deep Learning in Japanese Sentiment Analysis
    Zhang, Xuanxuan
    IEEE 1st International Conference on Ambient Intelligence, Knowledge Informatics and Industrial Electronics, AIKIIE 2023, 2023,
  • [28] Contrastive Learning for Morphological Disambiguation Using Large Language Models in Low-Resource Settings
    Tolegen, Gulmira
    Toleu, Alymzhan
    Mussabayev, Rustam
    APPLIED SCIENCES-BASEL, 2024, 14 (21):
  • [29] Data Selection using Spoken Language Identification for Low-Resource and Zero-Resource Speech Recognition
    Chen, Jianan
    Chu, Chenhui
    Li, Sheng
    Kawahara, Tatsuya
    APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024, 2024,
  • [30] A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios
    Hedderich, Michael A.
    Lange, Lukas
    Adel, Heike
    Strotgen, Jannik
    Klakow, Dietrich
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 2545 - 2568