Sentiment Analysis of Low-Resource Language Literature Using Data Processing and Deep Learning

Cited by: 2

Authors:

Ali, Aizaz ^{[1]}

Khan, Maqbool ^{[1,2]}

Khan, Khalil ^{[3]}

Khan, Rehan Ullah ^{[4]}

Aloraini, Abdulrahman ^{[4]}

Affiliations:

[1] Pak Austria Fachhochschule Inst Appl Sci & Technol, Dept IT & Comp Sci, Haripur 22620, Pakistan

[2] Software Competence Ctr Hagenberg, Softwarepark 32a, A-4232 Hagenberg, Austria

[3] Nazarbayev Univ, Sch Engn & Digital Sci, Dept Comp Sci, Astana 010000, Kazakhstan

[4] Qassim Univ, Coll Comp, Dept Informat Technol, POB 1162, Buraydah, Saudi Arabia

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2024 / Vol. 79 / Issue 01

Keywords:

Urdu sentiment analysis; convolutional neural networks; recurrent neural network; deep learning; natural language processing; neural networks; ROMAN URDU; REVIEWS;

DOI:

10.32604/cmc.2024.048712

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Sentiment analysis, the task of discerning emotional tone in text, plays a pivotal role in understanding public opinion and user sentiment across diverse languages. While numerous scholars conduct sentiment analysis in widely spoken languages such as English, Chinese, Arabic, and Roman Arabic, resource-poor languages such as Urdu remain a challenge. Urdu is a uniquely crafted language, characterized by a script that amalgamates elements from diverse languages, including Arabic, Parsi, Pashtu, Turkish, Punjabi, and Saraiki. Urdu literature, with its distinct character sets and linguistic features, presents an additional hurdle because accessible datasets are lacking, which makes sentiment analysis a formidable undertaking. The limited availability of resources has fueled increased interest among researchers, prompting a deeper exploration into Urdu sentiment analysis. This research is dedicated to Urdu language sentiment analysis, employing deep learning models on an extensive dataset categorized into five labels: Positive, Negative, Neutral, Mixed, and Ambiguous. The primary objective is to discern sentiments and emotions within the Urdu language, despite the absence of well-curated datasets. To tackle this challenge, the initial step involves the creation of a comprehensive Urdu dataset by aggregating data from various sources such as newspapers, articles, and social media comments. After data collection, a thorough process of cleaning and preprocessing is applied to ensure data quality. The study uses two well-known deep learning models, Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), for training and evaluating sentiment analysis performance. Additionally, the study explores hyperparameter tuning to optimize the models' efficacy.
Evaluation metrics such as precision, recall, and the F1-score are employed to assess the effectiveness of the models. The research findings reveal that RNN surpasses CNN in Urdu sentiment analysis, achieving a significantly higher accuracy of 91%. This result underscores the strong performance of RNN and makes it a compelling option for sentiment analysis tasks in the Urdu language.
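
The precision, recall, and F1-score named in the abstract can be illustrated for a five-label task with a minimal sketch. The label names come from the abstract; the function names `per_class_scores` and `macro_f1` and the toy predictions are hypothetical and are not the authors' evaluation code or data. Macro averaging is an assumption, since the abstract does not say how per-class scores were combined.

```python
from collections import Counter

# The five sentiment labels listed in the abstract.
LABELS = ["Positive", "Negative", "Neutral", "Mixed", "Ambiguous"]


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} for a multi-class task."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Define a score as 0.0 when its denominator is zero.
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 values."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical toy predictions, for illustration only.
y_true = ["Positive", "Positive", "Negative", "Neutral", "Mixed", "Ambiguous"]
y_pred = ["Positive", "Negative", "Negative", "Neutral", "Mixed", "Mixed"]

scores = per_class_scores(y_true, y_pred)
for label in LABELS:
    p, r, f = scores[label]
    print(f"{label:10s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"macro F1 = {macro_f1(scores):.2f}")
```

With unbalanced labels such as Mixed and Ambiguous, per-class scores can expose weaknesses that a single accuracy figure hides, which is one reason to report them alongside accuracy.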


Pages: 713-733

Page count: 21
