Object-aware semantics of attention for image captioning

Cited by: 11
Authors
Wang, Shiwei [1, 3]
Lan, Long [2, 3]
Zhang, Xiang [2, 3]
Dong, Guohua [1, 3]
Luo, Zhigang [1, 3]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc, Changsha 410073, Peoples R China
[2] Natl Univ Def Technol, Inst Quantum Informat, State Key Lab High Performance Comp, Changsha 410073, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
High-level semantic concepts; Semantic attention; Image captioning
DOI
10.1007/s11042-019-08209-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In image captioning, exploring advanced semantic concepts is very important for boosting captioning performance. Although much progress has been made in this regard, most existing image captioning models neglect the interrelationships between objects in an image, which are a key factor in accurately understanding image content. In this paper, we propose an object-aware semantic attention (OSA) based captioning model to address this issue. Specifically, our attention model allows explicit associations between objects by coupling the attention mechanism with three types of semantic concepts, i.e., the category information, the relative sizes of the objects, and the relative distances between objects. In practice, they are easily built and seamlessly coupled with the well-known encoder-decoder captioning framework. In our empirical analysis, these semantic concepts favor different aspects of the image content, such as the number of objects belonging to each category, the main focus of an image, and the closeness between objects. Importantly, they cooperate with visual features to help the attention model effectively highlight the image regions of interest for significant performance gains. By leveraging the three types of semantic concepts, we derive four semantic attention models for image captioning. Extensive experiments on the MSCOCO dataset show that our attention models within the encoder-decoder image captioning framework perform favorably compared to representative captioning models.
Pages: 2013-2030
Page count: 18