Improved Image Captioning Using GAN and ViT

Cited by: 0
Authors
Rao, Vrushank D. [1]
Shashank, B. N. [1]
Bhattu, S. Nagesh [1]
Affiliations
[1] Natl Inst Technol Andhra Pradesh, Dept Comp Sci & Engn, Tadepalligudem, India
Keywords
Vision Transformers; Data2Vec; Image Captioning
DOI
10.1007/978-3-031-58535-7_31
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Encoder-decoder architectures are widely used for image captioning, most prominently with convolutional encoders and recurrent decoders. Recent transformer-based designs have achieved state-of-the-art (SOTA) performance on a range of language and vision tasks. This work investigates whether a transformer-based encoder and decoder can form an effective image captioning pipeline. An adversarial objective, implemented with a Generative Adversarial Network (GAN), is used to improve the diversity of the generated captions. The generator uses a ViT encoder and a transformer decoder to produce semantically meaningful captions for a given image. To improve the quality and authenticity of the generated captions, we introduce a discriminator built on a transformer decoder. The discriminator evaluates each caption by considering both the image and the caption produced by the generator. By training this architecture, we aim to ensure that the generator produces captions that are indistinguishable from real captions, increasing the overall quality of the generated outputs. Through extensive experimentation, we demonstrate the effectiveness of our approach in generating diverse and contextually appropriate captions for various images. We evaluate our model on benchmark datasets and compare its performance against existing state-of-the-art image captioning methods. The proposed approach achieves superior results compared to previous methods, as shown by improved caption accuracy metrics such as BLEU-3 and BLEU-4, among other measures.
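The abstract does not state the exact loss, so the following is only a minimal sketch of a standard GAN objective, shown to illustrate how a discriminator that scores (image, caption) pairs could drive the generator. The function names, the sample scores, and the choice of the non-saturating generator loss are all assumptions for illustration, not the authors' implementation. How gradients would reach the generator through discrete caption tokens is also not described in the abstract and is left out here.

```python
import math

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy over discriminator probabilities.

    d_real: scores D(image, real caption), each in (0, 1)
    d_fake: scores D(image, generated caption), each in (0, 1)
    The loss is low when real pairs score near 1 and generated pairs near 0.
    """
    real_term = -sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss (an assumed choice).

    The loss is low when the discriminator scores generated captions as real.
    """
    return -sum(math.log(p + eps) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs for a batch of two images.
d_real = [0.9, 0.8]  # image paired with a human-written caption
d_fake = [0.2, 0.1]  # image paired with a generated caption

print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:", generator_loss(d_fake))
```

Under this sketch, the generator loss falls as the discriminator assigns higher scores to generated captions, which matches the stated goal of making them indistinguishable from real ones.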
Pages: 375-385
Page count: 11
Related papers
50 in total
  • [1] Improved GAN for image resolution enhancement using ViT for breast cancer detection
    Rautela, Kamakshi
    Kumar, Dinesh
    Kumar, Vijay
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2024, 34 (02)
  • [2] ViT - Inception - GAN for Image Colourisation
    Bana, Tejas
    Loya, Jatan
    Kulkarni, Siddhant
    MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE (LOD 2021), PT I, 2022, 13163 : 105 - 118
  • [3] Text to Image Synthesis for Improved Image Captioning
    Hossain, Md. Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    Bennamoun, Mohammed
    IEEE ACCESS, 2021, 9 : 64918 - 64928
  • [4] Image captioning improved visual question answering
    Himanshu Sharma
    Anand Singh Jalal
    Multimedia Tools and Applications, 2022, 81 : 34775 - 34796
  • [5] CgT-GAN: CLIP-guided Text GAN for Image Captioning
    Yu, Jiarui
    Li, Haoran
    Hao, Yanbin
    Zhu, Bin
    Xu, Tong
    He, Xiangnan
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 2252 - 2263
  • [6] Improved Transformer with Parallel Encoders for Image Captioning
    Lou, Liangshan
    Lu, Ke
    Xue, Jian
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022 : 4072 - 4078
  • [7] Image captioning improved visual question answering
    Sharma, Himanshu
    Jalal, Anand Singh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (24) : 34775 - 34796
  • [8] Caps Captioning: A Modern Image Captioning Approach Based on Improved Capsule Network
    Javanmardi, Shima
    Latif, Ali Mohammad
    Sadeghi, Mohammad Taghi
    Jahanbanifard, Mehrdad
    Bonsangue, Marcello
    Verbeek, Fons J.
    SENSORS, 2022, 22 (21)
  • [9] Memorial GAN With Joint Semantic Optimization for Unpaired Image Captioning
    Song, Peipei
    Guo, Dan
    Zhou, Jinxing
    Xu, Mingliang
    Wang, Meng
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (07) : 4388 - 4399
  • [10] Improving image captioning with Pyramid Attention and SC-GAN
    Chen, Tianyu
    Li, Zhixin
    Wu, Jingli
    Ma, Huifang
    Su, Bianping
    IMAGE AND VISION COMPUTING, 2022, 117