Fine-grained attention for image caption generation

Author
Yan-Shuo Chang
Affiliations
[1] China (Xi’an) Institute for Silk Road Research, School of Information
[2] Xi’an University of Finance and Economics
Keywords
Fine-grained attention; Image caption generation; Attention generation
DOI
Not available
Abstract
Despite recent progress, generating natural language descriptions for images remains a challenging task. Most state-of-the-art methods apply existing deep convolutional neural network (CNN) models to extract a visual representation of the entire image, and then use recurrent neural networks to exploit the parallel structure between images and sentences. An inherent drawback of these models is that they may attend to only a partial view of a visual element, or to a conglomeration of several concepts. In this paper, we present a fine-grained attention model built on a deep recurrent architecture that combines recent advances in computer vision and machine translation. The model contains three sub-networks: a deep recurrent neural network for sentences, a deep convolutional network for images, and a region proposal network that provides nearly cost-free region proposals. The model automatically learns to fix its gaze on salient region proposals. The process of generating the next word, given the previously generated ones, is aligned with this visual perception process. We validate the proposed model on three benchmark datasets (Flickr 8K, Flickr 30K, and MS COCO), and the experimental results confirm its effectiveness.
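The idea of fixing the gaze on salient region proposals while generating each word can be illustrated with a minimal soft-attention sketch. This is not the paper's implementation: the abstract does not state the scoring function, so the dot-product score, the function names `softmax` and `attend`, and the toy feature values below are all assumptions made for illustration. The sketch also assumes that region features and the decoder hidden state share the same dimension.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """One decoding step of soft attention over region proposals.

    region_features: list of feature vectors, one per region proposal.
    hidden_state: current decoder (RNN) hidden state, same dimension.
    Assumption: relevance is scored with a plain dot product.
    """
    scores = [
        sum(r * h for r, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(hidden_state)
    # The context vector is the attention-weighted average of region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three hypothetical region proposals with 2-D features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]
weights, context = attend(regions, hidden)
```

In this toy setup the first region receives the largest weight because it is the one most aligned with the hidden state. In a full captioning model, the context vector would be fed to the recurrent network together with the previously generated word to predict the next one.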
Pages: 2959-2971
Number of pages: 12