Improving image captioning with Pyramid Attention and SC-GAN

Cited by: 23
Authors
Chen, Tianyu [1 ]
Li, Zhixin [1 ]
Wu, Jingli [1 ]
Ma, Huifang [2 ]
Su, Bianping [3 ]
Affiliations
[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China
[2] Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou 730070, Peoples R China
[3] Xian Univ Architecture & Technol, Coll Sci, Xian 710055, Peoples R China
Funding
National Natural Science Foundation of China;
关键词
Image captioning; Pyramid Attention network; Self-critical training; Reinforcement learning; Generative adversarial network; Sequence-level learning;
DOI
10.1016/j.imavis.2021.104340
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most existing image captioning models mainly use global attention, which represents whole-image features; local attention, which represents object features; or a combination of the two. Few models integrate the relationship information between the various object regions of an image, yet this information is highly instructive for caption generation: for example, if a football appears, there is a high probability that the image also contains people near it. In this article, the relationship feature is embedded into the global-local attention to construct a new Pyramid Attention mechanism, which can explore the internal visual and semantic relationships between different object regions. Besides, to alleviate the exposure bias problem and make the training process more efficient, we propose a new method to apply the Generative Adversarial Network to sequence generation. The greedy decoding method is used to generate an efficient baseline reward for self-critical training. Finally, experiments on the MSCOCO dataset show that the model can generate more accurate and vivid captions and outperforms many recent advanced models on various prevailing evaluation metrics on both local and online test sets. (c) 2021 Elsevier B.V. All rights reserved.
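The self-critical training mentioned in the abstract follows the usual SCST pattern: the reward of a greedily decoded caption serves as the baseline for a sampled caption, so no separate baseline network is needed. A minimal sketch of the resulting policy-gradient loss (the function name and the scalar rewards are illustrative assumptions, not taken from the paper):

```python
def self_critical_loss(sample_logprob, sample_reward, greedy_reward):
    """Policy-gradient loss with a greedy-decoding baseline (SCST-style).

    sample_logprob: summed log-probability of the sampled caption
    sample_reward:  sequence-level score (e.g. CIDEr) of the sampled caption
    greedy_reward:  score of the greedily decoded caption, used as baseline
    """
    advantage = sample_reward - greedy_reward
    # Minimizing this raises the probability of samples that beat
    # the greedy baseline and lowers it for samples that fall short.
    return -advantage * sample_logprob

# Toy usage: the sampled caption scores 0.9, the greedy one 0.7,
# so the sample is reinforced (negative loss gradient on its log-prob).
loss = self_critical_loss(sample_logprob=-3.2, sample_reward=0.9, greedy_reward=0.7)
```

In practice the reward would be a captioning metric such as CIDEr computed against reference captions, and the loss would be averaged over a batch before backpropagation.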
Pages: 12