Arabic ChatGPT Tweets Classification Using RoBERTa and BERT Ensemble Model

Cited by: 14

Authors:

Mujahid, Muhammad [1]

Kanwal, Khadija [2]

Rustam, Furqan [3]

Aljadani, Wajdi [4]

Ashraf, Imran [5]

Affiliations:

[1] Khwaja Fareed Univ Engn & Information Technol, Dept Comp Sci, Rahim Yar Khan, Pakistan

[2] Women Univ Multan, Inst CS & IT, Multan 6600, Pakistan

[3] Univ Coll Dublin, Sch Comp Sci, Dublin D04 V1W8, Ireland

[4] Univ North Texas, Dept Comp Sci & Engn, Denton, TX USA

[5] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea

Source:

ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING | 2023, Vol. 22, Issue 08

Keywords:

Arabic tweets; low-resource language; ChatGPT; OpenAI; transformer models; BERT; sentiment analysis; impact

DOI:

10.1145/3605889

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

ChatGPT, a large language model chatbot from OpenAI, has gained a lot of attention due to its popularity and impressive performance in many natural language processing tasks. ChatGPT produces superior answers to a wide range of real-world human questions and generates human-like text. At this early stage, the new technology may have both strengths and weaknesses. Users have reported early opinions about ChatGPT's features, and their feedback is essential for recognizing and fixing its shortcomings. This study uses an Arabic dataset of ChatGPT tweets to automatically find user opinions and sentiments about the technology. The dataset is preprocessed and labeled into positive, negative, and neutral tweets using the TextBlob Arabic Python library. Despite extensive work on English, languages like Arabic are less studied with regard to tweet analysis. Existing literature on Arabic tweet sentiment analysis has mainly focused on machine learning and deep learning models. We collected a total of 27,780 unstructured tweets from Twitter using the Tweepy and SNscrape Python libraries with various hashtags such as #Chat-GPT, #OpenAI, #Chatbot, #Chat-GPT3, and so on. To enhance the model's performance and reduce computational complexity, the unstructured tweets are converted into a structured and normalized form. Tweets contain missing values, URLs and HTML tags, stop words, punctuation, diacritics, elongations, and numeric values that do not contribute to model performance and only increase the computational cost. These elements are therefore removed with the help of Python preprocessing libraries to improve text quality and consistency. This study adopts Transformer-based models such as RoBERTa, XLNet, and DistilBERT to classify the tweets automatically. Additionally, a hybrid transformer-based model is proposed to obtain better results.
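The cleaning steps named in the abstract could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function name `clean_tweet`, the regular expressions, the order of the steps, and the tiny stop-word set are all assumptions made for the example.

```python
import re

# All patterns below are illustrative assumptions, not the paper's actual rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
# Arabic diacritics (tashkeel): U+064B..U+0652, plus superscript alef U+0670.
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # the kashida character used to stretch words
PUNCT_RE = re.compile(r"[^\w\s]|_")  # \w is Unicode-aware, so Arabic letters survive
DIGITS_RE = re.compile(r"\d+")  # matches ASCII and Arabic-Indic digits
# A character repeated three or more times is treated as elongation.
ELONGATION_RE = re.compile(r"(\w)\1{2,}")

# Hypothetical, deliberately tiny stop-word set; a real list would be much longer.
DEMO_STOP_WORDS = frozenset({"في", "من", "على", "عن"})


def clean_tweet(text, stop_words=DEMO_STOP_WORDS):
    """Return a normalized tweet, or an empty string for a missing value."""
    if not text:
        return ""
    text = URL_RE.sub(" ", text)            # drop URLs
    text = HTML_TAG_RE.sub(" ", text)       # drop HTML tags
    text = DIACRITICS_RE.sub("", text)      # strip diacritics
    text = text.replace(TATWEEL, "")        # strip tatweel
    text = PUNCT_RE.sub(" ", text)          # drop punctuation, including '#'
    text = DIGITS_RE.sub(" ", text)         # drop numeric values
    text = ELONGATION_RE.sub(r"\1", text)   # collapse elongated characters
    tokens = [tok for tok in text.split() if tok not in stop_words]
    return " ".join(tokens)


if __name__ == "__main__":
    raw = "<b>رااااائع</b> #ChatGPT في 2023 https://t.co/abc !!!"
    print(clean_tweet(raw))
```

URLs and tags are removed before punctuation so that their fragments do not survive as stray tokens. Collapsing every run of three or more identical characters is a blunt heuristic and may occasionally alter a legitimate word.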
The proposed hybrid model combines the hidden outputs of the RoBERTa and BERT models using a concatenation layer, followed by dense layers: a hidden layer with ReLU activation to introduce non-linearity and an output layer with a softmax activation function for multiclass classification. It differs from existing state-of-the-art models in drawing on the capabilities of both models for text classification. Hybrid models combine different models to make accurate predictions, reduce bias, and enhance the overall results, whereas individual state-of-the-art models make less accurate predictions. Experiments show that the proposed hybrid model achieves 96.02% accuracy, 100% precision on negative tweets, and 99% recall for neutral tweets. The proposed model performs far better than existing state-of-the-art models.
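The fusion step described above (concatenate two encoder outputs, apply a ReLU hidden layer, then a softmax output over three classes) can be illustrated with a framework-free forward pass. This is only a sketch under stated assumptions: random vectors stand in for the RoBERTa and BERT hidden outputs, the weights are untrained, and the class name `FusionHead`, the 768-dimensional inputs, the hidden width of 64, and the label order are hypothetical choices not taken from the paper.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def init_dense(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class FusionHead:
    """Hypothetical head: concatenation -> dense + ReLU -> dense + softmax."""

    def __init__(self, dim_a, dim_b, hidden=64, n_classes=3, seed=0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_dense(dim_a + dim_b, hidden, rng)
        self.w2, self.b2 = init_dense(hidden, n_classes, rng)

    def forward(self, vec_a, vec_b):
        fused = list(vec_a) + list(vec_b)               # concatenation layer
        hidden = relu(dense(fused, self.w1, self.b1))   # non-linear hidden layer
        return softmax(dense(hidden, self.w2, self.b2)) # class probabilities


LABELS = ("negative", "neutral", "positive")  # assumed label order

# Random stand-ins for the two encoders' sentence-level hidden outputs.
rng = random.Random(1)
roberta_vec = [rng.gauss(0.0, 1.0) for _ in range(768)]
bert_vec = [rng.gauss(0.0, 1.0) for _ in range(768)]

head = FusionHead(dim_a=768, dim_b=768)
probs = head.forward(roberta_vec, bert_vec)
prediction = LABELS[probs.index(max(probs))]
```

In a real setup the two input vectors would come from the fine-tuned encoders and the head would be trained jointly with them in a deep learning framework; the plain-Python version only makes the shapes and the order of operations explicit.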


Pages: 23

50 items in total

[1] An ensemble model for classifying idioms and literal texts using BERT and RoBERTa
Briskilal, J.
Subalalitha, C. N.
INFORMATION PROCESSING & MANAGEMENT, 2022, 59 (01)
[2] EMOTION DETECTION FROM TWEETS USING A BERT AND SVM ENSEMBLE MODEL
ALBU, Ionut-Alexandru
SPINU, Stelian
UNIVERSITY POLITEHNICA OF BUCHAREST SCIENTIFIC BULLETIN SERIES C-ELECTRICAL ENGINEERING AND COMPUTER SCIENCE, 2022, 84 (01): 63-74
[3] Personality Identification from Social Media Using Ensemble BERT and RoBERTa
Tsani E.F.
Suhartono D.
Informatica (Slovenia), 2023, 47 (04): 537-544
[4] Arabic spam tweets classification using deep learning
Sanaa Kaddoura
Suja A. Alex
Maher Itani
Safaa Henno
Asma AlNashash
D. Jude Hemanth
Neural Computing and Applications, 2023, 35: 17233-17246
[5] Classification of Arabic Tweets: A Review
Alruily, Meshrif
ELECTRONICS, 2021, 10 (10)
[6] Arabic spam tweets classification using deep learning
Kaddoura, Sanaa
Alex, Suja A.
Itani, Maher
Henno, Safaa
AlNashash, Asma
Hemanth, D. Jude
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (23): 17233-17246
[7] Arabic aspect sentiment polarity classification using BERT
Mohammed M. Abdelgwad
Taysir Hassan A. Soliman
Ahmed I. Taloba
Journal of Big Data, 9
[8] Arabic aspect sentiment polarity classification using BERT
Abdelgwad, Mohammed M.
Soliman, Taysir Hassan A.
Taloba, Ahmed I.
JOURNAL OF BIG DATA, 2022, 9 (01)
[9] Arabic Sentiment Analysis Using BERT Model
Chouikhi, Hasna
Chniter, Hamza
Jarray, Fethi
ADVANCES IN COMPUTATIONAL COLLECTIVE INTELLIGENCE (ICCCI 2021), 2021, 1463: 621-632
[10] Virality Prediction for News Tweets Using RoBERTa
Maldonado-Sifuentes, Christian E.
Angel, Jason
Sidorov, Grigori
Kolesnikova, Olga
Gelbukh, Alexander
ADVANCES IN SOFT COMPUTING (MICAI 2021), PT II, 2021, 13068: 81-95
