A performance analysis of transformer-based deep learning models for Arabic image captioning

Cited by: 1
Authors
Alsayed, Ashwaq [1]
Qadah, Thamir M. [1]
Arif, Muhammad [1]
Affiliations
[1] Umm Al Qura Univ, Coll Comp & Informat Syst, Comp Sci Dept, Mecca, Saudi Arabia
Keywords
Image captioning; Arabic image captioning; Transformer model; Performance analysis and evaluation; Deep learning; Machine learning; Arabic technologies
DOI
10.1016/j.jksuci.2023.101750
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Image captioning has become a fundamental operation that allows the automatic generation of text descriptions of images. However, most existing work has focused on performing the image captioning task in English, and only a few proposals address the task in Arabic. This paper focuses on understanding the factors that affect the performance of machine learning models performing Arabic image captioning (AIC). In particular, we focus on transformer-based models for AIC and study the impact of various text-preprocessing methods: CAMeL Tools, ArabertPreprocessor, and Stanza. Our study shows that using CAMeL Tools to preprocess text labels improves the AIC performance by up to 34-92% in the BLEU-4 score. In addition, we study the impact of image recognition models. Our results show that ResNet152 is better than EfficientNet-B0 and can improve BLEU scores by 9-11%. Furthermore, we investigate the impact of different datasets on the overall AIC performance and build an extended version of the Arabic Flickr8k dataset. Using the extended version improves the BLEU-4 score of the AIC model by up to 148%. Finally, utilizing our results, we build a model that significantly outperforms the state-of-the-art proposals in AIC by up to 196-379% in the BLEU-4 score. © 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
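The abstract reports its results as relative improvements in the BLEU-4 score. As a rough illustration of what such figures measure, the following is a minimal sketch of sentence-level BLEU-4 (uniform weights, no smoothing) and of a relative-improvement calculation. It is not the authors' evaluation code: the function names bleu4 and relative_improvement, the toy English tokens, and the scores 0.10 and 0.248 are all assumptions made for illustration, and the paper may use a different BLEU variant (for example, corpus-level or smoothed).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4 with uniform weights and no smoothing.

    candidate: list of tokens; references: list of token lists.
    Returns 0.0 if any of the four n-gram precisions is zero.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate shorter than n tokens
        # For each n-gram, the maximum count seen in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip candidate counts so repeated n-grams are not over-rewarded.
        clipped = sum(min(count, max_ref[gram])
                      for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += 0.25 * math.log(clipped / total)
    # Brevity penalty against the reference closest in length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


def relative_improvement(baseline, improved):
    """Percentage gain of `improved` over `baseline`."""
    return 100.0 * (improved - baseline) / baseline


if __name__ == "__main__":
    reference = "a dog runs on the grass".split()
    candidate = "a dog runs on the green grass".split()
    print(bleu4(candidate, [reference]))
    # Hypothetical scores, not taken from the paper.
    print(relative_improvement(0.10, 0.248))
```

Under this definition, a stated gain of 148% means the improved score is about 2.48 times the baseline, so large relative gains can correspond to small absolute BLEU-4 differences when the baseline is low.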
Pages: 16