Object-aware semantics of attention for image captioning

Cited by: 11

Authors:

Wang, Shiwei [1,3]

Lan, Long [2,3]

Zhang, Xiang [2,3]

Dong, Guohua [1,3]

Luo, Zhigang [1,3]

Affiliations:

[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc, Changsha 410073, Peoples R China

[2] Natl Univ Def Technol, Inst Quantum Informat, State Key Lab High Performance Comp, Changsha 410073, Peoples R China

[3] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2020 / Vol. 79 / Issue 3-4

Funding:

National Natural Science Foundation of China;

Keywords:

High-level semantic concepts; Semantic attention; Image captioning;

DOI:

10.1007/s11042-019-08209-5

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In image captioning, exploring advanced semantic concepts is very important for boosting captioning performance. Although much progress has been made in this regard, most existing image captioning models neglect the interrelationships between objects in an image, which are a key factor in accurately understanding image content. In this paper, we propose an object-aware semantic attention (OSA) based captioning model to address this issue. Specifically, our attention model allows explicit associations between objects by coupling the attention mechanism with three types of semantic concepts, i.e., the category information, the relative sizes of the objects, and the relative distances between objects. In practice, these concepts are easy to build and couple seamlessly with the well-known encoder-decoder captioning framework. In our empirical analysis, these semantic concepts favor different aspects of the image content, such as the number of objects belonging to each category, the main focus of an image, and the closeness between objects. Importantly, they cooperate with visual features to help the attention model effectively highlight the image regions of interest, yielding significant performance gains. By leveraging the three types of semantic concepts, we derive four semantic attention models for image captioning. Extensive experiments on the MSCOCO dataset show that our attention models, within the encoder-decoder image captioning framework, perform favorably compared to representative captioning models.
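The three semantic concepts named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the formulas, so the definitions below (per-category object counts, box area divided by image area, and centre-to-centre distance divided by the image diagonal) are assumptions chosen only for illustration, and the function name `semantic_concepts` and its arguments are hypothetical.

```python
import math


def semantic_concepts(boxes, labels, image_size):
    """Illustrative (assumed) versions of the three semantic concepts.

    boxes:      list of (x1, y1, x2, y2) pixel coordinates, one per detected object
    labels:     list of category names, aligned with boxes
    image_size: (width, height) of the image in pixels
    """
    width, height = image_size
    image_area = float(width * height)
    diagonal = math.hypot(width, height)

    # Category information: how many objects fall into each category.
    category_counts = {}
    for label in labels:
        category_counts[label] = category_counts.get(label, 0) + 1

    # Relative size (assumed definition): box area as a fraction of image area.
    relative_sizes = [
        ((x2 - x1) * (y2 - y1)) / image_area for x1, y1, x2, y2 in boxes
    ]

    # Relative distance (assumed definition): distance between box centres,
    # normalised by the image diagonal so that values lie in [0, 1].
    centers = [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in boxes]
    relative_distances = [
        [math.hypot(cx - ox, cy - oy) / diagonal for ox, oy in centers]
        for cx, cy in centers
    ]

    return category_counts, relative_sizes, relative_distances


# Two side-by-side objects in a 100 x 100 image.
counts, sizes, distances = semantic_concepts(
    boxes=[(0, 0, 50, 50), (50, 0, 100, 50)],
    labels=["dog", "dog"],
    image_size=(100, 100),
)
```

In a captioning model, quantities like these would presumably be embedded and combined with visual region features before the attention weights are computed; how the paper does this is not stated in the abstract.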

Citation:

Pages: 2013-2030

Page count: 18

50 entries in total

[41] An Object-Aware Hardware Transactional Memory System
Khan, Behram
Horsnell, Matthew
Rogers, Ian
Lujan, Mikel
Dinn, Andrew
Watson, Ian
HPCC 2008: 10TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS, 2008, : 93 - 102
[42] Scalable Object-Aware Hardware Transactional Memory
Khan, Behram
Horsnell, Matthew
Lujan, Mikel
Watson, Ian
EURO-PAR 2010 PARALLEL PROCESSING, PT I, 2010, 6271 : 268 - 279
[43] OBJECT-AWARE SALIENCY DETECTION FOR CONSUMER IMAGES
Tang, Hao
2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 1097 - 1100
[44] Image Captioning With Positional and Geometrical Semantics
Ul Haque, Anwar
Ghani, Sayeed
Saeed, Muhammad
IEEE ACCESS, 2021, 9 : 160917 - 160925
[45] Comprehending and Ordering Semantics for Image Captioning
Li, Yehao
Pan, Yingwei
Yao, Ting
Mei, Tao
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17969 - 17978
[46] Object-Aware Image Augmentation for Audio-Visual Zero-Shot Learning
Dong, Yujie
Chen, Shiming
Duan, Bowen
Ding, Weiping
Wang, Yisong
You, Xinge
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024,
[47] Object-aware Policy Network in Deep Recommender Systems
Zhou, Guoqiang
Xu, Zhangxian
Lin, Jiayin
Bao, Shudi
Zhou, Liliang
Shen, Jun
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2023, 95 (2-3): : 271 - 280
[48] Image editing by object-aware optimal boundary searching and mixed-domain composition
Shiming Ge
Xin Jin
Qiting Ye
Zhao Luo
Qiang Li
Computational Visual Media, 2018, 4 (01) : 71 - 82
[49] Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Guo, Longteng
Liu, Jing
Zhu, Xinxin
Yao, Peng
Lu, Shichen
Lu, Hanqing
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 10324 - 10333
[50] Object Hallucination in Image Captioning
Rohrbach, Anna
Hendricks, Lisa Anne
Burns, Kaylee
Darrell, Trevor
Saenko, Kate
2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 4035 - 4045
