Variational joint self-attention for image captioning

Cited by: 3
Authors
Shao, Xiangjun [1 ]
Xiang, Zhenglong [2 ,3 ]
Li, Yuanxiang [1 ]
Zhang, Mingjie [1 ]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, Nanjing, Peoples R China
[3] Minnan Normal Univ, Key Lab Intelligent Optimizat & Informat Proc, Zhangzhou, Peoples R China
Keywords
Semantics;
DOI
10.1049/ipr2.12470
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The image captioning task has attracted great attention from researchers, and significant progress has been made in the past few years, mainly through attention-based encoder-decoder architectures. These attention-based models, however, are limited in caption generation by potential errors arising from inaccurate object detection and incorrect attention over the detected objects. To alleviate this limitation, a Variational Joint Self-Attention model (VJSA) is proposed to learn a latent semantic alignment between a given image and its label description, which guides better image captioning. Unlike existing image captioning models, VJSA first uses a self-attention module to encode both intra-sequence and inter-sequence relationships. A variational neural inference module then learns a distribution over the latent semantic alignment between the image and its corresponding description. During decoding, the learned semantic alignment guides the decoder to generate higher-quality captions. Experimental results show that VJSA outperforms the compared models, and its performance across various metrics indicates that the proposed model is effective and feasible for image caption generation.
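The abstract names three components: self-attention encoding of intra- and inter-sequence relationships, variational inference of a latent semantic alignment, and alignment-guided decoding. The sketch below illustrates only the variational-alignment step under stated assumptions; the module names, dimensions, pooling, fusion scheme, and standard-normal prior are illustrative choices, not the authors' actual implementation.

# Minimal sketch, assuming PyTorch, of a variational latent alignment between
# self-attended image-region features and caption-token features.
# All names and sizes here are hypothetical.
import torch
import torch.nn as nn

class VariationalAlignment(nn.Module):
    def __init__(self, d_model=512, z_dim=128, nhead=8):
        super().__init__()
        # self-attention over image region features (intra-sequence relations)
        self.img_attn = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # self-attention over caption token embeddings
        self.txt_attn = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # inference network q(z | image, caption) parameterized as a Gaussian
        self.to_mu = nn.Linear(2 * d_model, z_dim)
        self.to_logvar = nn.Linear(2 * d_model, z_dim)

    def forward(self, regions, tokens):
        # regions: (B, R, d_model) detected-object features; tokens: (B, T, d_model)
        img = self.img_attn(regions).mean(dim=1)   # pooled image context
        txt = self.txt_attn(tokens).mean(dim=1)    # pooled caption context
        h = torch.cat([img, txt], dim=-1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # reparameterization trick: z = mu + sigma * eps
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # KL term against a standard normal prior, for the variational objective
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return z, kl

# Example usage with random features (36 regions, 20 tokens):
# regions = torch.randn(2, 36, 512); tokens = torch.randn(2, 20, 512)
# z, kl = VariationalAlignment()(regions, tokens)

In this sketch the latent vector z would condition the caption decoder, and kl would be added to the training loss; both of these wiring decisions are assumptions made for illustration.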
Pages: 2075-2086
Number of pages: 12
Related Papers
50 records in total
  • [1] Improve Image Captioning by Self-attention
    Li, Zhenru
    Li, Yaoyi
    Lu, Hongtao
    NEURAL INFORMATION PROCESSING, ICONIP 2019, PT V, 2019, 1143 : 91 - 98
  • [2] Relation constraint self-attention for image captioning
    Ji, Junzhong
    Wang, Mingzhan
    Zhang, Xiaodan
    Lei, Minglong
    Qu, Liangqiong
    NEUROCOMPUTING, 2022, 501 : 778 - 789
  • [3] A Dual Self-Attention based Network for Image Captioning
    Li, ZhiYong
    Yang, JinFu
    Li, YaPing
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 1590 - 1595
  • [4] Transformer with sparse self-attention mechanism for image captioning
    Wang, Duofeng
    Hu, Haifeng
    Chen, Dihu
    ELECTRONICS LETTERS, 2020, 56 (15) : 764 - +
  • [5] Dual-stream Self-attention Network for Image Captioning
    Wan, Boyang
    Jiang, Wenhui
    Fang, Yuming
    Wen, Wenying
    Liu, Hantao
    2022 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2022,
  • [6] Normalized and Geometry-Aware Self-Attention Network for Image Captioning
    Guo, Longteng
    Liu, Jing
    Zhu, Xinxin
    Yao, Peng
    Lu, Shichen
    Lu, Hanqing
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 10324 - 10333
  • [7] Joint Self-Attention for Remote Sensing Image Matching
    Li, Liangzhi
    Han, Ling
    Cao, Hongye
    Hu, Huijuan
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [8] Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning
    Ji, Jiayi
    Huang, Xiaoyang
    Sun, Xiaoshuai
    Zhou, Yiyi
    Luo, Gen
    Cao, Liujuan
    Liu, Jianzhuang
    Shao, Ling
    Ji, Rongrong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 3962 - 3974
  • [9] Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning
    Hossain, Md Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    Bennamoun, Mohammed
    2019 DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA), 2019, : 167 - 173
  • [10] Variational Self-attention Network for Sequential Recommendation
    Zhao, Jing
    Zhao, Pengpeng
    Zhao, Lei
    Liu, Yanchi
    Sheng, Victor S.
    Zhou, Xiaofang
    2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 1559 - 1570