Sentiment Analysis of Multilingual Dataset of Bahraini Dialects, Arabic, and English

Cited by: 1
Authors
Omran, Thuraya [1]
Sharef, Baraa [2]
Grosan, Crina [3]
Li, Yongmin [1]
Affiliations
[1] Brunel Univ London, Dept Comp Sci, Uxbridge UB8 3PH, England
[2] Ahlia Univ, Coll Informat Technol, Dept Informat Technol, POB 10878, Manama, Bahrain
[3] Kings Coll London, Div Appl Technol Clin Care, London WC2R 2LS, England
Keywords
Bahraini dialects resources; Bahraini resources scarcity; deep learning; products reviews
DOI
10.3390/data8040068
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Sentiment analysis is an application of natural language processing (NLP) that requires a machine learning algorithm and a dataset. For some languages such datasets are scarce, particularly for Arabic dialects and specifically the Bahraini ones, which necessitates an approach such as translation, where a resource-rich source language is exploited to create the target-language dataset. In this study, a dataset of Amazon product reviews in Bahraini dialects is presented. The dataset was generated in two cascading stages of translation: machine translation followed by manual translation. In the first stage, Google Translate was used to translate English Amazon product reviews into Standard Arabic. In the second stage, qualified native speakers manually translated the resulting Arabic reviews into Bahraini dialects using customized forms constructed for the task. The resulting parallel dataset of English, Standard Arabic, and Bahraini dialects is called English_Modern Standard Arabic_Bahraini Dialects product reviews for sentiment analysis, "E_MSA_BDs-PR-SA". The dataset is balanced, comprising 2500 positive and 2500 negative reviews. Sentiment analysis was implemented using a stacked LSTM deep learning model. The Bahraini dialect product dataset can be used in transfer learning to analyze the sentiment of other datasets in Bahraini dialects.
Dataset: https://doi.org/10.17632/5rhw2srzjj.1
Dataset License: CC-BY-NC
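The two-stage construction and the balance property described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `ParallelReview` record, its field names, and the `build_parallel` and `is_balanced` helpers are hypothetical, and the two translator arguments are stand-ins for the machine translation step and the manual translation by native speakers.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelReview:
    """One review in three parallel versions (field names are hypothetical)."""
    english: str
    msa: str        # Modern Standard Arabic version
    bahraini: str   # Bahraini dialect version
    label: str      # "positive" or "negative"


def build_parallel(english_reviews, machine_translate, manual_translate):
    """Apply the two cascading stages: English -> MSA -> Bahraini dialect.

    Both translators are caller-supplied callables; they are placeholders
    for the real machine and human translation steps.
    """
    dataset = []
    for text, label in english_reviews:
        msa = machine_translate(text)      # stage 1: machine translation
        bahraini = manual_translate(msa)   # stage 2: manual translation
        dataset.append(ParallelReview(text, msa, bahraini, label))
    return dataset


def is_balanced(reviews):
    """Return True when positive and negative reviews occur equally often."""
    counts = Counter(review.label for review in reviews)
    return counts["positive"] == counts["negative"]
```

With the published dataset, `is_balanced` would be expected to hold, since the abstract reports 2500 positive and 2500 negative reviews.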
Pages: 13
Related papers
  • [1] Transfer learning and sentiment analysis of Bahraini dialects sequential text data using multilingual deep learning approach
    Omran, Thuraya M.
    Sharef, Baraa T.
    Grosan, Crina
    Li, Yongmin
    DATA & KNOWLEDGE ENGINEERING, 2023, 143
  • [2] Social Data Sentiment Analysis of a Multilingual Dataset: A Case Study with Malayalam and English
    Mathews, Deepa Mary
    Abraham, Sajimon
    ADVANCED INFORMATICS FOR COMPUTING RESEARCH, PT I, 2019, 1075 : 70 - 78
  • [3] Mapping the Multilingual Margins: Intersectional Biases of Sentiment Analysis Systems in English, Spanish, and Arabic
    Camara, Antonio
    Taneja, Nina
    Azad, Tamjeed
    Allaway, Emily
    Zemel, Richard
    PROCEEDINGS OF THE SECOND WORKSHOP ON LANGUAGE TECHNOLOGY FOR EQUALITY, DIVERSITY AND INCLUSION (LTEDI 2022), 2022 : 90 - 106
  • [4] #JamalKhashoggi: Unraveling multilingual Twitter sentiment dynamics in a longitudinal comparative analysis of tweets in Arabic and English
    Zeid, Nour
    Frissen, Thomas
    Scherr, Sebastian
    NEW MEDIA & SOCIETY, 2024, 26 (09) : 5529 - 5553
  • [5] Ensemble Stacking Model for Sentiment Analysis of Emirati and Arabic Dialects
    Al Shamsi, Arwa A.
    Abdallah, Sherief
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2023, 35 (08)
  • [6] Comparative Evaluation of Sentiment Analysis Methods Across Arabic Dialects
    Baly, Ramy
    El-Khoury, Georges
    Moukalled, Rawan
    Aoun, Rita
    Hajj, Hazem
    Shaban, Khaled Bashir
    El-Hajj, Wassim
    ARABIC COMPUTATIONAL LINGUISTICS (ACLING 2017), 2017, 117 : 266 - 273
  • [7] The Corpus Based Approach to Sentiment Analysis in Modern Standard Arabic and Arabic Dialects: A Literature Review
    Alnawas, Anwar
    Arici, Nursal
    JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI, 2018, 21 (02) : 461 - 470
  • [8] Using Tweets and Emojis to Build TEAD: an Arabic Dataset for Sentiment Analysis
    Abdellaoui, Houssem
    Zrigui, Mounir
    COMPUTACION Y SISTEMAS, 2018, 22 (03) : 777 - 786
  • [9] Sentiment Analysis in Maghrebi Arabic Dialects with Enhanced BERT Models and Big Data Processing
    Taha, Marbouh
    Halima, Outada
    Abdelaziz, Chetouani
    Omayma, Mahmoudi
    Naoufal, El Allali
    DIGITAL TECHNOLOGIES AND APPLICATIONS, ICDTA 2024, VOL 4, 2024, 1101 : 13 - 22
  • [10] Syntax-Ignorant N-gram Embeddings for Sentiment Analysis of Arabic Dialects
    Mulki, Hala
    Haddad, Hatem
    Gridach, Mourad
    Babaoglu, Ismail
    FOURTH ARABIC NATURAL LANGUAGE PROCESSING WORKSHOP (WANLP 2019), 2019 : 30 - 39