Fine-grained attention for image caption generation

Author
Yan-Shuo Chang
Affiliations
[1] China (Xi’an) Institute for Silk Road Research, School of Information
[2] Xi’an University of Finance and Economics
Keywords
Fine-grained attention; Image caption generation; Attention generation
DOI
Not available
Abstract
Despite recent progress, generating natural language descriptions for images remains a challenging task. Most state-of-the-art methods apply existing deep convolutional neural network (CNN) models to extract a visual representation of the entire image, and then use recurrent neural networks to exploit the parallel structures between images and sentences. However, these models have an inherent drawback: they may attend to a partial view of a visual element or to a conglomeration of several concepts. In this paper, we present a fine-grained attention model built on a deep recurrent architecture that combines recent advances in computer vision and machine translation. The model contains three sub-networks: a deep recurrent neural network for sentences, a deep convolutional network for images, and a region proposal network for nearly cost-free region proposals. The model automatically learns to fix its gaze on salient region proposals, so that the process of generating the next word, given the previously generated ones, is aligned with this visual perception experience. We validate the proposed model on three benchmark datasets (Flickr 8K, Flickr 30K, and MS COCO), and the experimental results confirm its effectiveness.
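The idea of "fixing the gaze" on salient region proposals can be illustrated with a minimal soft-attention sketch. This is not the paper's implementation: the abstract does not specify the scoring function, so dot-product scoring is assumed here, and the names `attend`, `regions`, and `hidden` are hypothetical. In the setting the abstract describes, the region features would come from the region proposal and convolutional networks, and the hidden state from the recurrent sentence network.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, hidden_state):
    """Soft attention over region features (assumed dot-product scoring)."""
    # One relevance score per region proposal.
    scores = [sum(f * h for f, h in zip(feat, hidden_state))
              for feat in region_features]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of region features.
    dim = len(region_features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, region_features))
               for d in range(dim)]
    return weights, context

# Toy example: three region proposals with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]  # hypothetical decoder state before the next word
weights, context = attend(regions, hidden)
```

In a captioning decoder of this kind, the context vector would typically be fed to the recurrent network together with the previously generated word, so that each new word is conditioned on the regions currently receiving the most weight.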
Pages: 2959-2971
Page count: 12