Sentiment Analysis of Multilingual Dataset of Bahraini Dialects, Arabic, and English

Cited by: 1
Authors
Omran, Thuraya [1]
Sharef, Baraa [2]
Grosan, Crina [3]
Li, Yongmin [1]
Affiliations
[1] Brunel Univ London, Dept Comp Sci, Uxbridge UB8 3PH, England
[2] Ahlia Univ, Coll Informat Technol, Dept Informat Technol, POB 10878, Manama, Bahrain
[3] Kings Coll London, Div Appl Technol Clin Care, London WC2R 2LS, England
Keywords
Bahraini dialects resources; Bahraini resources scarcity; deep learning; products reviews
DOI
10.3390/data8040068
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Sentiment analysis is an application of natural language processing (NLP) that requires a machine learning algorithm and a dataset. For some languages datasets are scarce, particularly for Arabic dialects and specifically the Bahraini ones. This scarcity motivates approaches such as translation, in which a resource-rich source language is used to create a dataset in the target language. This study presents a dataset of Amazon product reviews in Bahraini dialects. The dataset was generated in two cascading stages of translation: machine translation followed by manual translation. In the first stage, Google Translate was used to translate English Amazon product reviews into Standard Arabic. In the second stage, qualified native speakers translated the resulting Arabic reviews into Bahraini dialects using purpose-built forms. The resulting parallel dataset of English, Standard Arabic, and Bahraini dialects is called English_Modern Standard Arabic_Bahraini Dialects product reviews for sentiment analysis, "E_MSA_BDs-PR-SA". The dataset is balanced, with 2500 positive and 2500 negative reviews. Sentiment analysis was implemented using a stacked LSTM deep learning model. The Bahraini dialect product dataset can also be used for transfer learning when performing sentiment analysis on other datasets in Bahraini dialects.
Dataset: https://doi.org/10.17632/5rhw2srzjj.1
Dataset License: CC-BY-NC
Pages: 13