Object-aware semantics of attention for image captioning

Cited by: 11
Authors
Wang, Shiwei [1,3]
Lan, Long [2,3]
Zhang, Xiang [2,3]
Dong, Guohua [1,3]
Luo, Zhigang [1,3]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Parallel & Distributed Proc, Changsha 410073, Peoples R China
[2] Natl Univ Def Technol, Inst Quantum Informat, State Key Lab High Performance Comp, Changsha 410073, Peoples R China
[3] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
High-level semantic concepts; Semantic attention; Image captioning
DOI
10.1007/s11042-019-08209-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In image captioning, exploiting high-level semantic concepts is very important for boosting captioning performance. Although much progress has been made in this regard, most existing image captioning models neglect the interrelationships between the objects in an image, which are key to accurately understanding image content. In this paper, we propose a captioning model based on object-aware semantic attention (OSA) to address this issue. Specifically, our attention model explicitly associates the objects by coupling the attention mechanism with three types of semantic concepts: the category information, the relative sizes of the objects, and the relative distances between objects. In practice, these concepts are easy to construct and can be seamlessly coupled with the well-known encoder-decoder captioning framework. In our empirical analysis, these semantic concepts emphasize different aspects of the image content, such as the number of objects belonging to each category, the main focus of an image, and the closeness between objects. Importantly, they cooperate with visual features to help the attention model effectively highlight the image regions of interest, yielding significant performance gains. By leveraging the three types of semantic concepts, we derive four semantic attention models for image captioning. Extensive experiments on the MSCOCO dataset show that our attention models, within the encoder-decoder image captioning framework, perform favorably compared with representative captioning models.
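To make the three semantic concepts concrete, here is a minimal sketch, assuming the detector returns category-labelled bounding boxes. Everything in it is a hypothetical illustration rather than the OSA model from the paper: the `Detection` type and the functions `category_counts`, `relative_sizes`, `relative_distances`, `semantic_keys` and `soft_attention` are invented names; relative size is assumed to be box area over image area; relative distance is assumed to be centre-to-centre distance over the image diagonal; and appending these values to each region's visual feature before a plain dot-product soft attention is a generic stand-in for the paper's coupling of concepts with attention.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    """Hypothetical detector output: a category label and a pixel box (x1, y1, x2, y2)."""
    category: str
    box: Tuple[float, float, float, float]


def category_counts(dets: Sequence[Detection]) -> Counter:
    """Category information: number of detected objects per category."""
    return Counter(d.category for d in dets)


def relative_sizes(dets: Sequence[Detection], width: float, height: float) -> List[float]:
    """Relative size (assumed definition): box area divided by image area."""
    image_area = width * height
    return [max(0.0, x2 - x1) * max(0.0, y2 - y1) / image_area
            for x1, y1, x2, y2 in (d.box for d in dets)]


def relative_distances(dets: Sequence[Detection], width: float, height: float) -> List[List[float]]:
    """Relative distance (assumed definition): centre-to-centre distance over the image diagonal."""
    diagonal = math.hypot(width, height)
    centres = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in (d.box for d in dets)]
    return [[math.dist(a, b) / diagonal for b in centres] for a in centres]


def semantic_keys(visual: Sequence[Sequence[float]], dets: Sequence[Detection],
                  width: float, height: float) -> List[List[float]]:
    """Append three semantic features to each region's visual feature vector:
    relative size, count of the region's category, mean relative distance to other objects."""
    counts = category_counts(dets)
    sizes = relative_sizes(dets, width, height)
    dists = relative_distances(dets, width, height)
    n = len(dets)
    keys = []
    for i, det in enumerate(dets):
        # dists[i][i] is 0, so dividing by n - 1 averages over the other objects only
        mean_dist = sum(dists[i]) / (n - 1) if n > 1 else 0.0
        keys.append(list(visual[i]) + [sizes[i], float(counts[det.category]), mean_dist])
    return keys


def soft_attention(query: Sequence[float], keys: Sequence[Sequence[float]]) -> List[float]:
    """Dot-product soft attention: softmax of query-key scores, one weight per region."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the maximum score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    dets = [Detection("dog", (0, 0, 200, 150)),
            Detection("dog", (220, 10, 300, 90)),
            Detection("frisbee", (400, 300, 440, 340))]
    visual = [[0.2, 0.1], [0.1, 0.3], [0.5, 0.0]]  # toy 2-D region features
    keys = semantic_keys(visual, dets, width=640, height=480)
    query = [1.0, 1.0, 2.0, 0.5, -1.0]  # stand-in for a decoder hidden state
    print(soft_attention(query, keys))
```

In this toy setup the query gives a positive weight to relative size and category count and a negative weight to mean distance, so large, frequent, tightly grouped objects attract more attention; in a real captioner the query would come from the decoder state and the feature coupling would be learned.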
Pages: 2013 - 2030
Number of pages: 18
Related papers
50 in total
  • [31] Object-Aware Guidance for Autonomous Scene Reconstruction
    Liu, Ligang
    Xia, Xi
    Sun, Han
    Shen, Qi
    Xu, Juzhan
    Chen, Bin
    Huang, Hui
    Xu, Kai
    ACM TRANSACTIONS ON GRAPHICS, 2018, 37 (04):
  • [32] Image editing by object-aware optimal boundary searching and mixed-domain composition
    Ge S.
    Jin X.
    Ye Q.
    Luo Z.
    Li Q.
    Ge, Shiming (geshiming@iie.ac.cn), Tsinghua University Press, 2018, (04): 71 - 82
  • [33] Image captioning model using attention and object features to mimic human image understanding
    Al-Malla, Muhammad Abdelhadie
    Jafar, Assef
    Ghneim, Nada
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [35] SFNet: Learning Object-aware Semantic Correspondence
    Lee, Junghyup
    Kim, Dohyung
    Ponce, Jean
    Ham, Bumsub
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 2273 - 2282
  • [36] Object-Aware Dictionary Learning with Deep Features
    Xie, Yurui
    Porikli, Fatih
    He, Xuming
    COMPUTER VISION - ACCV 2016, PT II, 2017, 10112 : 237 - 253
  • [37] Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval
    Sogi, Naoya
    Shibata, Takashi
    Terao, Makoto
    COMPUTER VISION - ECCV 2024, PT LXXIX, 2025, 15137 : 447 - 464
  • [38] Object-Aware NIR-to-Visible Translation
    Gao, Yunyi
    Gu, Lin
    Liu, Qiankun
    Fu, Ying
    COMPUTER VISION - ECCV 2024, PT XXIII, 2025, 15081 : 93 - 109
  • [39] Object-Aware Instance Labeling for Weakly Supervised Object Detection
    Kosugi, Satoshi
    Yamasaki, Toshihiko
    Aizawa, Kiyoharu
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 6063 - 6071
  • [40] Token-word mixer meets object-aware transformer for referring image segmentation
    Zhang, Zhenliang
    Teng, Zhu
    Fan, Jack
    Zhang, Baopeng
    Fan, Jianping
    PATTERN RECOGNITION, 2024, 155