Variational joint self-attention for image captioning

Times Cited: 3
Authors
Shao, Xiangjun [1 ]
Xiang, Zhenglong [2 ,3 ]
Li, Yuanxiang [1 ]
Zhang, Mingjie [1 ]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Comp & Software, Nanjing, Peoples R China
[3] Minnan Normal Univ, Key Lab Intelligent Optimizat & Informat Proc, Zhangzhou, Peoples R China
Keywords
Semantics;
DOI
10.1049/ipr2.12470
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The image captioning task has attracted great attention from many researchers, and significant progress has been made in the past few years. Existing image captioning models, which mainly adopt attention-based encoder-decoder architectures, have driven substantial progress in image captioning. These attention-based models, however, are limited in caption generation by potential errors arising from inaccurate object detection and incorrect attention to objects. To alleviate this limitation, a Variational Joint Self-Attention model (VJSA) is proposed to learn a latent semantic alignment between a given image and its label description for guiding better image captioning. Unlike existing image captioning models, VJSA first uses a self-attention module to encode intra-sequence and inter-sequence relationships. A variational neural inference module then learns a distribution over the latent semantic alignment between the image and its corresponding description. During decoding, the learned semantic alignment guides the decoder to generate higher-quality captions. Experimental results show that VJSA outperforms the compared models, and its performance on various metrics demonstrates that the proposed model is effective and feasible for image caption generation.
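The abstract describes the pipeline only at a high level (self-attention encoding, variational inference over a latent alignment, alignment-conditioned decoding). The sketch below is a loose, minimal PyTorch illustration of that kind of pipeline, not the authors' implementation: the module choices (multi-head self-attention over region features, mean pooling, a Gaussian latent variable with the reparameterization trick, an LSTM decoder), all names, and all dimensions are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class VJSASketch(nn.Module):
    """Hypothetical sketch of a variational self-attention captioner.
    Self-attention encodes region relations; a Gaussian latent variable
    models the image-description alignment and conditions the decoder."""

    def __init__(self, feat_dim=512, vocab_size=10000, latent_dim=256):
        super().__init__()
        # Self-attention over detected-region features (intra-sequence relations).
        self.self_attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.word_emb = nn.Embedding(vocab_size, feat_dim)
        # Variational inference: pooled image/text features -> Gaussian over z.
        self.to_mu = nn.Linear(2 * feat_dim, latent_dim)
        self.to_logvar = nn.Linear(2 * feat_dim, latent_dim)
        # Decoder conditioned on [word embedding; z] at every step.
        self.decoder = nn.LSTM(feat_dim + latent_dim, feat_dim, batch_first=True)
        self.out = nn.Linear(feat_dim, vocab_size)

    def forward(self, regions, captions):
        # regions:  (B, R, feat_dim) region features from an object detector
        # captions: (B, T) token ids of the ground-truth description
        attended, _ = self.self_attn(regions, regions, regions)
        img_ctx = attended.mean(dim=1)                 # (B, feat_dim)
        txt_ctx = self.word_emb(captions).mean(dim=1)  # (B, feat_dim)
        h = torch.cat([img_ctx, txt_ctx], dim=-1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample the latent alignment z.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        emb = self.word_emb(captions)
        z_seq = z.unsqueeze(1).expand(-1, emb.size(1), -1)
        dec_out, _ = self.decoder(torch.cat([emb, z_seq], dim=-1))
        logits = self.out(dec_out)                     # (B, T, vocab_size)
        # KL term of the variational objective against a standard normal prior.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return logits, kl

# Toy usage with random inputs (36 regions per image, 12-token captions).
logits, kl = VJSASketch()(torch.randn(2, 36, 512), torch.randint(0, 10000, (2, 12)))
```

In such a design the cross-entropy loss on `logits` plus the `kl` term would form the variational training objective, with the latent variable supplying the alignment signal that the abstract attributes to VJSA.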
Pages: 2075-2086
Page count: 12
Related Papers
50 records in total
  • [21] Joint Scence Network and Attention-Guided for Image Captioning
    Zhou, Dongming
    Yang, Jing
    Zhang, Canlong
    Tang, Yanping
    2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 1535 - 1540
  • [22] Sparse self-attention transformer for image inpainting
    Huang, Wenli
    Deng, Ye
    Hui, Siqi
    Wu, Yang
    Zhou, Sanping
    Wang, Jinjun
    PATTERN RECOGNITION, 2024, 145
  • [23] Context-Aware Group Captioning via Self-Attention and Contrastive Features
    Li, Zhuowan
    Tran, Quan
    Mai, Long
    Lin, Zhe
    Yuille, Alan L.
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 3437 - 3447
  • [24] HIGSA: Human image generation with self-attention
    Wu, Haoran
    He, Fazhi
    Si, Tongzhen
    Duan, Yansong
    Yan, Xiaohu
    ADVANCED ENGINEERING INFORMATICS, 2023, 55
  • [25] Improving Rumor Detection by Image Captioning and Multi-Cell Bi-RNN With Self-Attention in Social Networks
    Wang, Jenq-Haur
    Huang, Chin-Wei
    Norouzi, Mehdi
    INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2022, 18 (01) : 1 - 17
  • [26] Unsupervised Image-to-Image Translation with Self-Attention Networks
    Kang, Taewon
    Lee, Kwang Hee
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2020), 2020, : 102 - 108
  • [27] Joint self-attention for denoising Monte Carlo rendering
    Oh, Geunwoo
    Moon, Bochang
    VISUAL COMPUTER, 2024, 40 (07): : 4623 - 4634
  • [28] Spatial self-attention network with self-attention distillation for fine-grained image recognition
    Baffour, Adu Asare
    Qin, Zhen
    Wang, Yong
    Qin, Zhiguang
    Choo, Kim-Kwang Raymond
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 81
  • [29] Self-Attention Underwater Image Enhancement by Data Augmentation
    Gao, Yu
    Luo, Huifu
    Zhu, Wei
    Ma, Feng
    Zhao, Jiang
    Qin, Kailin
    PROCEEDINGS OF 2020 3RD INTERNATIONAL CONFERENCE ON UNMANNED SYSTEMS (ICUS), 2020, : 991 - 995
  • [30] NATURAL IMAGE MATTING WITH SHIFTED WINDOW SELF-ATTENTION
    Wang, Zhikun
    Liu, Yang
    Li, Zonglin
    Wang, Chenyang
    Zhang, Shengping
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2911 - 2915