AraXLNet: pre-trained language model for sentiment analysis of Arabic

Cited by: 0
Authors
Alhanouf Alduailej
Abdulrahman Alothaim
Affiliations
[1] King Saud University, Department of Information Systems, College of Computer and Information Sciences
Keywords
Sentiment analysis; Language models; NLP; XLNet; AraXLNet; Text mining
DOI: not available
Abstract
Arabic is a complex, low-resource language, and these limitations make it challenging to produce accurate text classification systems for tasks such as sentiment analysis. The main goal of sentiment analysis is to determine the overall orientation of a given text: positive, negative, or neutral. Recently, language models have greatly improved the accuracy of text classification in English. Such models are pre-trained on a large dataset and then fine-tuned on downstream tasks. In particular, XLNet has achieved state-of-the-art results on diverse natural language processing (NLP) tasks in English. In this paper, we hypothesize that parallel success can be achieved in Arabic. The paper supports this hypothesis by producing the first XLNet-based language model for Arabic, called AraXLNet, and demonstrating its use in Arabic sentiment analysis to improve the prediction accuracy of such tasks. The results showed that the proposed model, AraXLNet, with the Farasa segmenter achieved accuracies of 94.78%, 93.01%, and 85.77% on the Arabic sentiment analysis task using multiple benchmark datasets. This outperformed AraBERT, which obtained 84.65%, 92.13%, and 85.05% on the same datasets, respectively. The improved accuracy of the proposed model was evident across multiple benchmark datasets, offering a promising advancement in Arabic text classification tasks.
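The accuracies reported above are three-class classification accuracies over positive/negative/neutral labels. As a minimal illustrative sketch (not code from the paper; the label values and example data are hypothetical), the metric can be computed as:

```python
# Three-class sentiment accuracy: the fraction of predictions that
# exactly match the gold labels ("positive", "negative", or "neutral").
# The example data below is illustrative, not from the paper's datasets.

def accuracy(gold, predicted):
    """Return the fraction of predictions that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

gold = ["positive", "negative", "neutral", "positive", "negative"]
pred = ["positive", "negative", "positive", "positive", "neutral"]
print(f"accuracy = {accuracy(gold, pred):.2%}")  # 3 of 5 correct -> 60.00%
```

In the paper's setting, the gold labels would come from a benchmark dataset and the predictions from the fine-tuned AraXLNet (or AraBERT) classifier; the metric itself is model-agnostic.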
Related papers
50 results
  • [41] Chinese Fine-Grained Sentiment Classification Based on Pre-trained Language Model and Attention Mechanism
    Zhou, Faguo
    Zhang, Jing
    Song, Yanan
    SMART COMPUTING AND COMMUNICATION, 2022, 13202 : 37 - 47
  • [42] Context Analysis for Pre-trained Masked Language Models
    Lai, Yi-An
    Lalwani, Garima
    Zhang, Yi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 3789 - 3804
  • [43] Enhancing Language Generation with Effective Checkpoints of Pre-trained Language Model
    Park, Jeonghyeok
    Zhao, Hai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2686 - 2694
  • [44] BERT for Sentiment Analysis: Pre-trained and Fine-Tuned Alternatives
    Souza, Frederico Dias
    de Oliveira e Souza Filho, Joao Baptista
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208 : 209 - 218
  • [45] Roman Urdu Sentiment Analysis Using Pre-trained DistilBERT and XLNet
    Azhar, Nikhar
    Latif, Seemab
    2022 FIFTH INTERNATIONAL CONFERENCE OF WOMEN IN DATA SCIENCE AT PRINCE SULTAN UNIVERSITY (WIDS-PSU 2022), 2022, : 75 - 78
  • [46] LETS: A Label-Efficient Training Scheme for Aspect-Based Sentiment Analysis by Using a Pre-Trained Language Model
    Shim, Heereen
    Lowet, Dietwig
    Luca, Stijn
    Vanrumste, Bart
    IEEE ACCESS, 2021, 9 : 115563 - 115578
  • [47] An Enhanced Sentiment Analysis Framework Based on Pre-Trained Word Embedding
    Mohamed, Ensaf Hussein
    Moussa, Mohammed ElSaid
    Haggag, Mohamed Hassan
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2020, 19 (04)
  • [48] Aspect Based Sentiment Analysis using French Pre-Trained Models
    Essebbar, Abderrahman
    Kane, Bamba
    Guinaudeau, Ophelie
    Chiesa, Valeria
    Quenel, Ilhem
    Chau, Stephane
    ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 1, 2021, : 519 - 525
  • [49] SsciBERT: a pre-trained language model for social science texts
    Si Shen
    Jiangfeng Liu
    Litao Lin
    Ying Huang
    Lin Zhang
    Chang Liu
    Yutong Feng
    Dongbo Wang
    Scientometrics, 2023, 128 : 1241 - 1263
  • [50] A Pre-trained Clinical Language Model for Acute Kidney Injury
    Mao, Chengsheng
    Yao, Liang
    Luo, Yuan
    2020 8TH IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2020), 2020, : 531 - 532