A performance analysis of transformer-based deep learning models for Arabic image captioning

Cited by: 1
Authors
Alsayed, Ashwaq [1]
Qadah, Thamir M. [1]
Arif, Muhammad [1]
Affiliation
[1] Umm Al Qura Univ, Coll Comp & Informat Syst, Comp Sci Dept, Mecca, Saudi Arabia
Keywords
Image captioning; Arabic image captioning; Transformer model; Performance analysis and evaluation; Deep learning; Machine learning; Arabic technologies
DOI
10.1016/j.jksuci.2023.101750
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Image captioning has become a fundamental operation that allows the automatic generation of text descriptions of images. However, most existing work has focused on image captioning in English, and only a few proposals address the task in Arabic. This paper focuses on understanding the factors that affect the performance of machine learning models performing Arabic image captioning (AIC). In particular, we focus on transformer-based models for AIC and study the impact of three text-preprocessing methods: CAMeL Tools, ArabertPreprocessor, and Stanza. Our study shows that using CAMeL Tools to preprocess text labels improves AIC performance by up to 34-92% in the BLEU-4 score. In addition, we study the impact of image recognition models. Our results show that ResNet152 is better than EfficientNet-B0 and can improve BLEU scores by 9-11%. Furthermore, we investigate the impact of different datasets on overall AIC performance and build an extended version of the Arabic Flickr8k dataset. Using the extended version improves the BLEU-4 score of the AIC model by up to 148%. Finally, utilizing our results, we build a model that significantly outperforms state-of-the-art proposals in AIC by up to 196-379% in the BLEU-4 score. © 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
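The abstract reports its gains as relative improvements in BLEU-4. As a rough illustration of what those figures measure, the following is a minimal Python sketch of an unsmoothed, single-reference, sentence-level BLEU-4 and of a relative-improvement percentage. It is not the authors' evaluation code: the function names `bleu4` and `relative_improvement` and the sample scores are assumptions made here for illustration, and published results are normally computed at corpus level with established tooling.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Unsmoothed sentence-level BLEU-4 against a single reference.

    Illustrative only: returns 0.0 whenever any n-gram order (1 to 4)
    has no match, which is why real toolkits apply smoothing.
    """
    precisions = []
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        precisions.append(clipped / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the four modified precisions.
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)


def relative_improvement(baseline, improved):
    """Percentage gain of `improved` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical scores, chosen only to show the arithmetic.
gain = relative_improvement(0.10, 0.248)
```

Under this definition, a reported improvement of 148% means the new score is about 2.48 times the baseline, not 1.48 times.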
Pages: 16