Object-aware semantics of attention for image captioning

Cited by: 11

Authors:

Wang, Shiwei ^{[1,3]}

Lan, Long ^{[2,3]}

Zhang, Xiang ^{[2,3]}

Dong, Guohua ^{[1,3]}

Luo, Zhigang ^{[1,3]}

Affiliations:

[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc, Changsha 410073, Peoples R China

[2] Natl Univ Def Technol, Inst Quantum Informat, State Key Lab High Performance Comp, Changsha 410073, Peoples R China

[3] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2020 / Vol. 79 / Issue 3-4

Funding:

National Natural Science Foundation of China;

Keywords:

High-level semantic concepts; Semantic attention; Image captioning;

DOI:

10.1007/s11042-019-08209-5

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In image captioning, exploring advanced semantic concepts is very important for boosting captioning performance. Although much progress has been made in this regard, most existing image captioning models neglect the interrelationships between objects in an image, which are a key factor in accurately understanding image content. In this paper, we propose the object-aware semantic attention (OSA) based captioning model to address this issue. Specifically, our attention model allows explicit associations between objects by coupling the attention mechanism with three types of semantic concepts, i.e., the category information, the relative sizes of the objects, and the relative distances between objects. In practice, they are easily built and seamlessly coupled with the well-known encoder-decoder captioning framework. In our empirical analysis, these semantic concepts favor different aspects of the image content, such as the number of objects belonging to each category, the main focus of an image, and the closeness between objects. Importantly, they cooperate with visual features to help the attention model effectively highlight the image regions of interest for significant performance gains. By leveraging the three types of semantic concepts, we derive four semantic attention models for image captioning. Extensive experiments on the MSCOCO dataset show that our attention models within the encoder-decoder image captioning framework perform favorably compared to representative captioning models.
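As a rough illustration of the three semantic concepts named in the abstract, the sketch below derives per-category counts, relative object sizes, and pairwise relative distances from detected bounding boxes. The abstract does not give the paper's definitions, so everything here is an assumption made for illustration: the `semantic_concepts` function name, the `(category, x, y, w, h)` box format, size as box area over image area, and distance as center-to-center distance over the image diagonal. It is not the authors' implementation.

```python
import math
from collections import Counter


def semantic_concepts(boxes, image_w, image_h):
    """Compute three illustrative object-level cues from bounding boxes.

    boxes: list of (category, x, y, w, h) in pixels, with (x, y) the
    top-left corner. This input format is hypothetical.
    """
    image_area = image_w * image_h
    diagonal = math.hypot(image_w, image_h)

    # Category information: how many objects fall in each category.
    category_counts = Counter(cat for cat, *_ in boxes)

    # Relative size (assumed definition): box area / image area.
    relative_sizes = [(w * h) / image_area for _, _, _, w, h in boxes]

    # Relative distance (assumed definition): distance between box
    # centers, normalized by the image diagonal.
    centers = [(x + w / 2, y + h / 2) for _, x, y, w, h in boxes]
    relative_distances = [
        [math.hypot(ax - bx, ay - by) / diagonal for bx, by in centers]
        for ax, ay in centers
    ]
    return category_counts, relative_sizes, relative_distances


boxes = [
    ("person", 0, 0, 100, 200),
    ("dog", 300, 0, 100, 100),
    ("person", 0, 200, 100, 200),
]
counts, sizes, dists = semantic_concepts(boxes, 400, 400)
```

In a captioning model, cues of this kind would be combined with visual region features before attention weights are computed; how the paper does that coupling is not stated in the abstract.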

Pages: 2013-2030

Page count: 18
