Object-aware semantics of attention for image captioning

Cited by: 11

Authors:

Wang, Shiwei [1,3]

Lan, Long [2,3]

Zhang, Xiang [2,3]

Dong, Guohua [1,3]

Luo, Zhigang [1,3]

Affiliations:

[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc, Changsha 410073, Peoples R China

[2] Natl Univ Def Technol, Inst Quantum Informat, State Key Lab High Performance Comp, Changsha 410073, Peoples R China

[3] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2020 / Volume 79 / Issue 3-4

Funding:

National Natural Science Foundation of China;

Keywords:

High-level semantic concepts; Semantic attention; Image captioning;

DOI:

10.1007/s11042-019-08209-5

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In image captioning, exploring advanced semantic concepts is very important for boosting captioning performance. Although much progress has been made in this regard, most existing image captioning models neglect the interrelationships between objects in an image, which are a key factor in accurately understanding image content. In this paper, we propose the object-aware semantic attention (OSA) based captioning model to address this issue. Specifically, our attention model allows explicit associations between objects by coupling the attention mechanism with three types of semantic concepts, i.e., the category information, the relative sizes of the objects, and the relative distances between objects. In practice, they are easily built and seamlessly coupled with the well-known encoder-decoder captioning framework. In our empirical analysis, these semantic concepts favor different aspects of the image content, such as the number of objects belonging to each category, the main focus of an image, and the closeness between objects. Importantly, they cooperate with visual features to help the attention model effectively highlight the image regions of interest for significant performance gains. By leveraging the three types of semantic concepts, we derive four semantic attention models for image captioning. Extensive experiments on the MSCOCO dataset show that our attention models within the encoder-decoder image captioning framework perform favorably compared to representative captioning models.


Pages: 2013-2030

Page count: 18

