Object-aware semantics of attention for image captioning

Cited by: 11
Authors
Wang, Shiwei [1, 3]
Lan, Long [2, 3]
Zhang, Xiang [2, 3]
Dong, Guohua [1, 3]
Luo, Zhigang [1, 3]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc, Changsha 410073, Peoples R China
[2] Natl Univ Def Technol, Inst Quantum Informat, State Key Lab High Performance Comp, Changsha 410073, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
High-level semantic concepts; Semantic attention; Image captioning;
DOI
10.1007/s11042-019-08209-5
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In image captioning, exploring advanced semantic concepts is very important for boosting captioning performance. Although much progress has been made in this regard, most existing image captioning models neglect the interrelationships between objects in an image, which are a key factor in accurately understanding image content. In this paper, we propose an object-aware semantic attention (OSA) based captioning model to address this issue. Specifically, our attention model allows explicit associations between objects by coupling the attention mechanism with three types of semantic concepts, i.e., category information, relative sizes of the objects, and relative distances between objects. In practice, these concepts are easy to construct and couple seamlessly with the well-known encoder-decoder captioning framework. In our empirical analysis, these semantic concepts favor different aspects of the image content, such as the number of objects belonging to each category, the main focus of an image, and the closeness between objects. Importantly, they cooperate with visual features to help the attention model effectively highlight the image regions of interest for significant performance gains. By leveraging the three types of semantic concepts, we derive four semantic attention models for image captioning. Extensive experiments on the MSCOCO dataset show that our attention models, within the encoder-decoder image captioning framework, perform favorably compared to representative captioning models.
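As a rough illustration of the three semantic concepts named in the abstract, the sketch below derives them from object detections. The abstract does not give the paper's exact definitions, so everything here is an assumption: the function name `semantic_concepts`, the `(x1, y1, x2, y2)` box format, relative size as box area over image area, and relative distance as center-to-center distance divided by the image diagonal.

```python
import math
from collections import Counter


def semantic_concepts(boxes, labels, image_w, image_h):
    """Hypothetical helper: compute three object-level cues from detections.

    boxes  -- list of (x1, y1, x2, y2) pixel coordinates (assumed format)
    labels -- list of category names, one per box
    """
    image_area = float(image_w * image_h)
    diagonal = math.hypot(image_w, image_h)

    # Category information: how many objects fall into each category.
    category_counts = Counter(labels)

    # Relative size (assumed definition): box area as a fraction of image area.
    relative_sizes = [
        ((x2 - x1) * (y2 - y1)) / image_area for x1, y1, x2, y2 in boxes
    ]

    # Relative distance (assumed definition): distance between box centers,
    # normalized by the image diagonal so values lie in [0, 1].
    centers = [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in boxes]
    relative_distances = [
        [math.hypot(ax - bx, ay - by) / diagonal for bx, by in centers]
        for ax, ay in centers
    ]
    return category_counts, relative_sizes, relative_distances


# Toy input: three detections in a 100 x 100 image.
boxes = [(0, 0, 50, 50), (50, 0, 100, 50), (0, 50, 30, 90)]
labels = ["person", "person", "dog"]
counts, sizes, distances = semantic_concepts(boxes, labels, 100, 100)
```

In a captioning model, cues like these would typically be embedded and combined with region features before attention weights are computed; how the paper does this is not stated in the abstract.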
Pages: 2013-2030
Number of pages: 18