Dynamic-balanced double-attention fusion for image captioning

Authors
Wang, Changzhi [1]
Gu, Xiaodong [1]
Affiliations
[1] Fudan Univ, Dept Elect Engn, Shanghai 200433, Peoples R China
Keywords
Image captioning; Attention fusion; DSR; Attention variance
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Image captioning has received significant attention in the cross-modal field, where spatial and channel attention play a crucial role. However, such attention-based approaches ignore two issues: (1) errors or noise in the channel feature map are amplified in the spatial feature map, which lowers model reliability; (2) image spatial features and channel features contribute differently to the prediction of function words (e.g., "in", "out" and "on") and notional words (e.g., "girl", "teddy" and "bear"). To alleviate these issues, we propose Dynamic-Balanced Double-Attention Fusion (DBDAF) for the image captioning task, which exploits the attention variation in a novel way and enhances the overall performance of the model. Technically, DBDAF first integrates a parallel Double Attention Network (DAN) in which channel attention supplements region attention, enhancing model reliability. Then, an attention-variation-based Balancing Attention Fusion Mechanism (BAFM) module is devised. When predicting function words and notional words, BAFM dynamically balances channel attention and region attention based on the attention variation. Moreover, to obtain richer image descriptions, we further devise a Doubly Stochastic Regularization (DSR) penalty and integrate it into the model loss function. This DSR penalty makes the model focus equally on every pixel and every channel when generating the entire sentence. Extensive experiments on three typical datasets show that DBDAF clearly outperforms related leading end-to-end approaches. More remarkably, DBDAF achieves improvements of 1.04% and 1.75% in BLEU4 and CIDEr, respectively, on the MSCOCO dataset.
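The abstract does not give the exact form of the DSR penalty. As an illustration only, the sketch below uses the doubly stochastic attention regularizer known from earlier soft-attention captioning work (Xu et al., "Show, Attend and Tell"), which penalizes locations whose attention summed over all decoding steps drifts away from 1. Applying the same term to channel attention, and the names `dsr_penalty`, `combined_dsr` and `weight`, are assumptions made for this example and are not taken from the paper.

```python
def dsr_penalty(attn, weight=1.0):
    """Doubly stochastic regularization term (illustrative sketch).

    attn: list of T rows, one per decoding step; each row holds N
          attention weights (over pixels/regions or over channels).
    Returns weight * sum_i (1 - sum_t attn[t][i]) ** 2.
    """
    if not attn:
        return 0.0
    n = len(attn[0])
    # Total attention each location (or channel) received over the sentence.
    totals = [sum(row[i] for row in attn) for i in range(n)]
    # Penalize totals that deviate from 1, i.e. over- or under-attended items.
    return weight * sum((1.0 - t) ** 2 for t in totals)


def combined_dsr(spatial_attn, channel_attn, weight=1.0):
    """Assumed combination: one penalty term per attention branch."""
    return dsr_penalty(spatial_attn, weight) + dsr_penalty(channel_attn, weight)


# Attention spread evenly across two steps: every location totals 1.
even = [[1.0, 0.0], [0.0, 1.0]]
# Attention stuck on the first location at both steps.
stuck = [[1.0, 0.0], [1.0, 0.0]]
print(dsr_penalty(even), dsr_penalty(stuck))
```

In a training loop this term would be added to the captioning loss, so that attention maps which ignore parts of the image over the whole sentence are discouraged; how the paper weights or normalizes the term is not stated in the abstract.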
Pages: 15