Sentiment Analysis of Low-Resource Language Literature Using Data Processing and Deep Learning

Cited by: 2
Authors
Ali, Aizaz [1]
Khan, Maqbool [1, 2]
Khan, Khalil [3]
Khan, Rehan Ullah [4]
Aloraini, Abdulrahman [4]
Affiliations
[1] Pak Austria Fachhochschule Inst Appl Sci & Technol, Dept IT & Comp Sci, Haripur 22620, Pakistan
[2] Software Competence Ctr Hagenberg, Softwarepark 32a, A-4232 Hagenberg, Austria
[3] Nazarbayev Univ, Sch Engn & Digital Sci, Dept Comp Sci, Astana 010000, Kazakhstan
[4] Qassim Univ, Coll Comp, Dept Informat Technol, POB 1162, Buraydah, Saudi Arabia
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Volume 79 / Issue 01
Keywords
Urdu sentiment analysis; convolutional neural networks; recurrent neural network; deep learning; natural language processing; neural networks; ROMAN URDU; REVIEWS;
DOI
10.32604/cmc.2024.048712
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Sentiment analysis, a crucial task in discerning emotional tones within text, plays a pivotal role in understanding public opinion and user sentiment across diverse languages. While numerous scholars conduct sentiment analysis in widely spoken languages such as English, Chinese, Arabic, and Roman Arabic, among others, resource-poor languages such as Urdu remain a challenge. Urdu is a uniquely crafted language, characterized by a script that amalgamates elements from diverse languages, including Arabic, Parsi, Pashtu, Turkish, Punjabi, and Saraiki, among others. Urdu literature, with its distinct character sets and linguistic features, presents an additional hurdle due to the lack of accessible datasets, rendering sentiment analysis a formidable undertaking. This limited availability of resources has fueled increased interest among researchers, prompting deeper exploration of Urdu sentiment analysis. This research is dedicated to Urdu language sentiment analysis, employing deep learning models on an extensive dataset categorized into five labels: Positive, Negative, Neutral, Mixed, and Ambiguous. The primary objective is to discern sentiments and emotions within the Urdu language despite the absence of well-curated datasets. To tackle this challenge, the initial step is the creation of a comprehensive Urdu dataset by aggregating data from various sources such as newspapers, articles, and social media comments. After data collection, a thorough cleaning and preprocessing step is applied to ensure the quality of the data. The study uses two well-known deep learning models, Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), for both training and evaluating sentiment analysis performance. Additionally, the study explores hyperparameter tuning to optimize the models' efficacy.
Evaluation metrics such as precision, recall, and the F1-score are employed to assess the effectiveness of the models. The research findings reveal that RNN surpasses CNN in Urdu sentiment analysis, achieving a significantly higher accuracy of 91%. This result underscores the strong performance of RNN and makes it a compelling option for sentiment analysis tasks in the Urdu language.
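The metrics named in the abstract (precision, recall, F1-score, and accuracy) can be illustrated for the five-label setting with a minimal sketch. This is not the authors' evaluation code: the function names `per_class_scores`, `macro_f1`, and `accuracy`, the macro-averaging choice, and the toy labels in the demo are all assumptions made for illustration, and the numbers it prints have no relation to the paper's reported results.

```python
from collections import Counter

# The five labels listed in the abstract.
LABELS = ["Positive", "Negative", "Neutral", "Mixed", "Ambiguous"]


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return precision, recall, and F1 for each label (one-vs-rest)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    return sum(s["f1"] for s in scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


if __name__ == "__main__":
    # Made-up toy labels, purely illustrative.
    y_true = ["Positive", "Positive", "Negative", "Neutral", "Mixed", "Ambiguous"]
    y_pred = ["Positive", "Negative", "Negative", "Neutral", "Mixed", "Mixed"]
    for label, s in per_class_scores(y_true, y_pred).items():
        print(label, s)
    print("macro F1:", macro_f1(per_class_scores(y_true, y_pred)))
    print("accuracy:", accuracy(y_true, y_pred))
```

With rare classes such as Mixed or Ambiguous, per-class scores can differ sharply from overall accuracy, which is one reason to report both.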
Pages: 713 - 733
Number of pages: 21
Related papers
50 in total
  • [31] Low-resource Deep Entity Resolution with Transfer and Active Learning
    Kasai, Jungo
    Qian, Kun
    Gurajada, Sairam
    Li, Yunyao
    Popa, Lucian
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 5851 - 5861
  • [32] Exploring Multi-lingual, Multi-task, and Adversarial Learning for Low-resource Sentiment Analysis
    Mamta
    Ekbal, Asif
    Bhattacharyya, Pushpak
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (05)
  • [33] ACTIVE LEARNING FOR LOW-RESOURCE SPEECH RECOGNITION: IMPACT OF SELECTION SIZE AND LANGUAGE MODELING DATA
    Syed, Ali Raza
    Rosenberg, Andrew
    Mandel, Michael
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5315 - 5319
  • [34] Enhancing Sentiment Analysis in Amharic: Leveraging Transformer-Based Language Model for Low-Resource African Languages
    Raychawdhary, Nilanjana
    Das, Amit
    Bhattacharya, Sutanu
    Dozier, Gerry
    Seals, Cheryl D.
    SOUTHEASTCON 2024, 2024, : 50 - 55
  • [35] Autoregressive Feature Extraction with Topic Modeling for Aspect-based Sentiment Analysis of Arabic as a Low-resource Language
    Sweidan, Asmaa Hashem
    El-Bendary, Nashwa
    Elhariri, Esraa
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (02)
  • [36] Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data
    Chowdhury, Koel Dutta
    Hasanuzzaman, Mohammed
    Liu, Qun
    DEEP LEARNING APPROACHES FOR LOW-RESOURCE NATURAL LANGUAGE PROCESSING (DEEPLO), 2018, : 33 - 42
  • [37] Improving Sentiment Classification in Low-Resource Bengali Language Utilizing Cross-Lingual Self-supervised Learning
    Sazzed, Salim
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2021), 2021, 12801 : 218 - 230
  • [38] Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation
    Zolzaya Byambadorj
    Ryota Nishimura
    Altangerel Ayush
    Kengo Ohta
    Norihide Kitaoka
    EURASIP Journal on Audio, Speech, and Music Processing, 2021
  • [39] Leveraging Multilingual Transformer for Multiclass Sentiment Analysis in Code-Mixed Data of Low-Resource Languages
    Nazir, Muhammad Kashif
    Faisal, Cm Nadeem
    Habib, Muhammad Asif
    Ahmad, Haseeb
    IEEE ACCESS, 2025, 13 : 7538 - 7554
  • [40] Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation
    Byambadorj, Zolzaya
    Nishimura, Ryota
    Ayush, Altangerel
    Ohta, Kengo
    Kitaoka, Norihide
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)