Arabic ChatGPT Tweets Classification Using RoBERTa and BERT Ensemble Model

Cited by: 14
Authors
Mujahid, Muhammad [1]
Kanwal, Khadija [2]
Rustam, Furqan [3]
Aljadani, Wajdi [4]
Ashraf, Imran [5]
Affiliations
[1] Khwaja Fareed Univ Engn & Information Technol, Dept Comp Sci, Rahim Yar Khan, Pakistan
[2] Women Univ Multan, Inst CS & IT, Multan 6600, Pakistan
[3] Univ Coll Dublin, Sch Comp Sci, Dublin D04 V1W8, Ireland
[4] Univ North Texas, Dept Comp Sci & Engn, Denton, TX USA
[5] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, South Korea
Keywords
Arabic tweets; low-resource language; ChatGPT; OpenAI; transformer models; BERT; sentiment analysis; impact
DOI
10.1145/3605889
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
OpenAI's ChatGPT, a chatbot built on a large language model, has attracted wide attention for its popularity and strong performance on many natural language processing tasks. It generates human-like text and gives strong answers to a wide range of real-world questions. At this early stage, the technology has both strengths and weaknesses. Users have reported early opinions about ChatGPT's features, and their feedback is essential for recognizing and fixing its shortcomings. This study uses an Arabic dataset of ChatGPT tweets to automatically identify user opinions and sentiments about the technology. The dataset is preprocessed and labeled as positive, negative, or neutral using the TextBlob Arabic Python library. Despite extensive work on English, languages such as Arabic remain less studied in tweet analysis, and the existing literature on Arabic tweet sentiment analysis has mainly focused on machine learning and deep learning models. We collected a total of 27,780 unstructured tweets from Twitter using the Tweepy and SNscrape Python libraries with hashtags such as #Chat-GPT, #OpenAI, #Chatbot, and #Chat-GPT3. To improve model performance and reduce computational complexity, the unstructured tweets are converted into a structured, normalized form. Tweets contain missing values, URLs, HTML tags, stop words, punctuation, diacritics, elongations, and numeric values, which do not help the model and only increase the computational cost. These elements are therefore removed with Python preprocessing libraries to improve text quality and consistency. This study adopts Transformer-based models such as RoBERTa, XLNet, and DistilBERT to classify the tweets automatically. Additionally, a hybrid transformer-based model is proposed to obtain better results.
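The cleaning and labeling steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the regular expressions, the tiny stop-word list, the function names `clean_tweet` and `label_from_polarity`, and the polarity threshold `eps` are all assumptions made for illustration. The polarity score is assumed to come from a sentiment library and to lie in [-1, 1].

```python
import re

# Arabic short-vowel marks (tashkeel), U+064B..U+0652.
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Tatweel (kashida), the stretching character used for elongation.
TATWEEL = re.compile(r"\u0640")
URL = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG = re.compile(r"<[^>]+>")
# ASCII digits and Arabic-Indic digits.
DIGITS = re.compile(r"[0-9\u0660-\u0669]+")
# Any character repeated three or more times in a row.
REPEATS = re.compile(r"(.)\1{2,}")
# Anything that is neither a word character nor whitespace.
PUNCT = re.compile(r"[^\w\s]")

# Tiny illustrative stop-word list (assumption, not the list used in the study).
STOP_WORDS = {"في", "من", "على", "عن", "هذا"}


def clean_tweet(text):
    """Normalize one raw tweet into a cleaned, space-separated string."""
    text = URL.sub(" ", text)
    text = HTML_TAG.sub(" ", text)
    # Diacritics must go before PUNCT, which would otherwise split words on them.
    text = DIACRITICS.sub("", text)
    text = TATWEEL.sub("", text)
    text = REPEATS.sub(r"\1", text)  # collapse elongated runs to one character
    text = DIGITS.sub(" ", text)
    text = PUNCT.sub(" ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


def label_from_polarity(polarity, eps=0.0):
    """Map a polarity score to a class label; eps is a hypothetical threshold."""
    if polarity > eps:
        return "positive"
    if polarity < -eps:
        return "negative"
    return "neutral"
```

Collapsing every run of three or more identical characters is a blunt rule that can also shorten legitimate spellings, so a real pipeline may prefer a more conservative elongation filter.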
The proposed hybrid model combines the hidden outputs of the RoBERTa and BERT models through a concatenation layer, followed by dense layers: a hidden layer with ReLU activation to introduce non-linearity and an output layer with softmax activation for multiclass classification. It differs from existing state-of-the-art models in drawing on the capabilities of both models for text classification. Hybrid models combine different models to make more accurate predictions, reduce bias, and improve overall results, whereas existing state-of-the-art models predict less accurately. Experiments show that the proposed hybrid model achieves 96.02% accuracy, 100% precision on negative tweets, and 99% recall on neutral tweets. The proposed model performs far better than existing state-of-the-art models.
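The fusion head described above (concatenation, a ReLU dense layer, then a softmax output) can be sketched as a plain-Python forward pass. This is an illustration only, not the authors' implementation: the two input vectors are random stand-ins for the RoBERTa and BERT hidden outputs, the layer sizes and weights are arbitrary, and names such as `hybrid_head` and `init_layer` are hypothetical. A real model would take the vectors from the two encoders and train the weights in a deep learning framework.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Random untrained weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def hybrid_head(roberta_vec, bert_vec, hidden, output):
    merged = roberta_vec + bert_vec       # concatenation layer
    h = relu(dense(merged, *hidden))      # hidden dense layer with ReLU
    return softmax(dense(h, *output))     # class probabilities


LABELS = ["negative", "neutral", "positive"]
rng = random.Random(0)
# Stand-ins for the two encoders' hidden outputs (8 dimensions each, assumed).
roberta_vec = [rng.gauss(0, 1) for _ in range(8)]
bert_vec = [rng.gauss(0, 1) for _ in range(8)]
hidden = init_layer(rng, 16, 16)   # input size 16 = 8 + 8 after concatenation
output = init_layer(rng, 3, 16)    # three sentiment classes
probs = hybrid_head(roberta_vec, bert_vec, hidden, output)
predicted = LABELS[probs.index(max(probs))]
```

Because the weights are untrained, the predicted label here carries no meaning; the sketch only shows how the shapes flow from two encoder vectors to three class probabilities.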
Pages: 23