Object-aware semantics of attention for image captioning

Cited by: 11
Authors
Wang, Shiwei [1, 3]
Lan, Long [2, 3]
Zhang, Xiang [2, 3]
Dong, Guohua [1, 3]
Luo, Zhigang [1, 3]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc, Changsha 410073, Peoples R China
[2] Natl Univ Def Technol, Inst Quantum Informat, State Key Lab High Performance Comp, Changsha 410073, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
High-level semantic concepts; Semantic attention; Image captioning
DOI
10.1007/s11042-019-08209-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In image captioning, exploring high-level semantic concepts is very important for boosting captioning performance. Although much progress has been made in this regard, most existing image captioning models neglect the interrelationships between objects in an image, which are a key factor in accurately understanding image content. In this paper, we propose a captioning model based on object-aware semantic attention (OSA) to address this issue. Specifically, our attention model explicitly associates objects with one another by coupling the attention mechanism with three types of semantic concepts: the category information, the relative sizes of the objects, and the relative distances between objects. In practice, these concepts are easy to construct and can be seamlessly coupled with the well-known encoder-decoder captioning framework. In our empirical analysis, these semantic concepts capture different aspects of the image content, such as the number of objects belonging to each category, the main focus of an image, and the closeness between objects. Importantly, they work together with visual features to help the attention model effectively highlight the image regions of interest, yielding significant performance gains. By leveraging the three types of semantic concepts, we derive four semantic attention models for image captioning. Extensive experiments on the MSCOCO dataset show that our attention models, within the encoder-decoder image captioning framework, perform favorably compared with representative captioning models.
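The abstract names three semantic concepts (category information, relative object size, relative distance between objects) but does not say how they are encoded. As a rough illustration only, the sketch below derives one plausible version of each from object detections: a per-category count vector, each box's area as a fraction of the image area, and pairwise box-center distances normalized by the image diagonal. The `Detection` type, the function names, the `(x1, y1, x2, y2)` box layout, and the normalization choices are all hypothetical assumptions made for this sketch, not the OSA model's actual feature definitions.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Sequence, Tuple


# Hypothetical detection record; boxes are (x1, y1, x2, y2) in pixels.
# This encoding is an assumption for illustration, not the OSA paper's format.
@dataclass
class Detection:
    label: str
    box: Tuple[float, float, float, float]


def category_counts(dets: Sequence[Detection], vocab: Sequence[str]) -> List[int]:
    """Category information: how many detected objects fall into each category."""
    index: Dict[str, int] = {name: i for i, name in enumerate(vocab)}
    counts = [0] * len(vocab)
    for d in dets:
        if d.label in index:  # labels outside the vocabulary are ignored
            counts[index[d.label]] += 1
    return counts


def relative_sizes(dets: Sequence[Detection], width: float, height: float) -> List[float]:
    """Relative size: each box's area as a fraction of the image area."""
    image_area = width * height
    return [(x2 - x1) * (y2 - y1) / image_area for x1, y1, x2, y2 in (d.box for d in dets)]


def relative_distances(dets: Sequence[Detection], width: float, height: float) -> List[List[float]]:
    """Relative distance: pairwise box-center distances divided by the image diagonal."""
    diagonal = hypot(width, height)
    centers = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in (d.box for d in dets)]
    return [[hypot(cx - ox, cy - oy) / diagonal for ox, oy in centers] for cx, cy in centers]


if __name__ == "__main__":
    dets = [
        Detection("dog", (0, 0, 320, 240)),
        Detection("dog", (320, 240, 640, 480)),
        Detection("frisbee", (300, 200, 340, 240)),
    ]
    print(category_counts(dets, ["person", "dog", "frisbee"]))  # [0, 2, 1]
    print(relative_sizes(dets, 640, 480)[:2])                   # [0.25, 0.25]
    print(relative_distances(dets, 640, 480)[0][1])             # 0.5
```

In this reading, the count vector reflects how many objects belong to each category, the largest relative size hints at the main focus of the image, and small normalized distances indicate objects that are close together. How such vectors would be embedded and combined with visual features inside the attention module is not specified in the abstract and is left open here.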
Pages: 2013-2030
Number of pages: 18
Related papers
50 in total
  • [1] Object-aware semantics of attention for image captioning
    Wang, Shiwei
    Lan, Long
    Zhang, Xiang
    Dong, Guohua
    Luo, Zhigang
    Multimedia Tools and Applications, 2020, 79 : 2013 - 2030
  • [2] Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning
    Zhang, Junchao
    Peng, Yuxin
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 8319 - 8328
  • [3] Video Captioning With Object-Aware Spatio-Temporal Correlation and Aggregation
    Zhang, Junchao
    Peng, Yuxin
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 (29) : 6209 - 6222
  • [4] Object-aware Image Compression with Adversarial Learning
    Du, Yunfei
    Zhao, Nan
    Duan, Yiping
    Han, Chaoyi
    2019 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN CHINA (ICCC), 2019,
  • [5] Object-Aware Attention in Few-Shot Learning
    Shen, Yeqing
    Mo, Lisha
    Ma, Huimin
    Hu, Tianyu
    Dong, Yuhan
    IMAGE AND GRAPHICS TECHNOLOGIES AND APPLICATIONS, IGTA 2021, 2021, 1480 : 95 - 108
  • [6] Object-aware Deep Network for Commodity Image Retrieval
    Fang, Zhiwei
    Liu, Jing
    Wang, Yuhang
    Li, Yong
    Song, Hang
    Tang, Jinhui
    Lu, Hanqing
    ICMR'16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2016, : 405 - 408
  • [7] Object Relation Attention for Image Paragraph Captioning
    Yang, Li-Chuan
    Yang, Chih-Yuan
    Hsu, Jane Yung-jen
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 3136 - 3144
  • [8] Object-Aware Tracking
    Bogun, Ivan
    Ribeiro, Eraldo
    2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2016, : 1695 - 1700
  • [9] Object-aware Identification of Microservices
    Amiri, Mohammad Javad
    2018 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (IEEE SCC 2018), 2018, : 253 - 256
  • [10] Image Inpainting with Cascaded Modulation GAN and Object-Aware Training
    Zheng, Haitian
    Lin, Zhe
    Lu, Jingwan
    Cohen, Scott
    Shechtman, Eli
    Barnes, Connelly
    Zhang, Jianming
    Xu, Ning
    Amirghodsi, Sohrab
    Luo, Jiebo
    COMPUTER VISION - ECCV 2022, PT XVI, 2022, 13676 : 277 - 296