Unbinding tensor product representations for image captioning with semantic alignment and complementation

Cited by: 0
Authors
Wu, Bicheng [1 ]
Wo, Yan [1 ]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510641, Peoples R China
Keywords
Image captioning; Tensor product representations; Semantic content; Intermediate representations;
DOI
10.1007/s00530-024-01309-9
CLC classification
TP [Automation Technology, Computer Technology];
Subject classification
0812 ;
Abstract
Image captioning, which describes an image in natural language, is an important but challenging multi-modal task. Many state-of-the-art methods adopt the encoder-decoder framework to convert information from the image modality to the text modality. However, most methods are limited to a local view during encoding and neglect word-organization logic during decoding, so they tend to generate captions that are patchworks of salient visual content and to rely on high-frequency expression templates induced by dataset bias. To alleviate this, we propose a novel encoder-decoder-based image captioning method: unbinding tensor product representations for image captioning with semantic alignment and complementation (uTPR-SAC). uTPR-SAC acquires semantic content reflecting a global cognition of the image through semantic alignment based on common-subspace projection. The structural information of the visual features is complemented under the guidance of this semantic content, which helps generate intermediate representations with a deep semantic understanding. To avoid dependence on high-frequency templates, the unbinding operation of TPR optimizes word prediction by reasoning over word structures with both an orthogonal structure matrix and the visual structure information of the intermediate representations. Comparison with other state-of-the-art methods on MSCOCO validates the competitiveness and effectiveness of uTPR-SAC, which reaches 81.0, 65.9, 51.7, and 39.8 on BLEU-1 through BLEU-4, respectively, and 59.4 on ROUGE-L. Extensive visualization experiments not only show the sensitivity of the semantic content to important visual content but also demonstrate the validity of the word structures obtained by unbinding, both of which contribute to the semantic accuracy of the generated captions.
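The binding and unbinding operations underlying TPR-based methods follow the classical tensor product representation scheme: filler (content) vectors are bound to role (structure) vectors by outer products and superposed, and a filler is recovered by projecting with its role vector, which is exact when the roles are orthonormal. The following is a minimal illustrative sketch of that generic scheme only, not the authors' uTPR-SAC implementation; all names and dimensions are hypothetical.

```python
import numpy as np

# Classical TPR bind/unbind sketch (illustration only, not uTPR-SAC itself).
# Binding: T = sum_i outer(f_i, r_i) superposes filler/role outer products.
# Unbinding: with orthonormal roles, T @ r_j = sum_i f_i (r_i . r_j) = f_j.
rng = np.random.default_rng(0)

d_filler, d_role, n = 4, 3, 3                       # hypothetical dimensions
fillers = rng.normal(size=(n, d_filler))            # content vectors (e.g. word features)
Q, _ = np.linalg.qr(rng.normal(size=(d_role, n)))   # orthonormal columns
roles = Q.T                                         # rows are orthonormal role vectors

# Binding: superpose the outer products into one representation matrix.
T = sum(np.outer(f, r) for f, r in zip(fillers, roles))

# Unbinding: project with a role vector to retrieve its bound filler exactly.
recovered = T @ roles[1]
assert np.allclose(recovered, fillers[1])
```

Orthonormality of the role vectors is what makes unbinding an exact inverse of binding; with merely linearly independent roles, unbinding would instead use the pseudoinverse of the role matrix.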
Pages: 18
Related Papers
(50 total)
  • [1] Semantic Tensor Product for Image Captioning
    Sur, Chiranjib
    Liu, Pei
    Zhou, Yingjie
    Wu, Dapeng
    5TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM 2019), 2019, : 33 - 37
  • [2] Cascade Semantic Prompt Alignment Network for Image Captioning
    Li, Jingyu
    Zhang, Lei
    Zhang, Kun
    Hu, Bo
    Xie, Hongtao
    Mao, Zhendong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 5266 - 5281
  • [3] Semantic Representations With Attention Networks for Boosting Image Captioning
    Hafeth, Deema Abdal
    Kollias, Stefanos
    Ghafoor, Mubeen
    IEEE ACCESS, 2023, 11 : 40230 - 40239
  • [4] Tensor factorization via transformed tensor-tensor product for image alignment
    Xia, Sijia
    Qiu, Duo
    Zhang, Xiongjun
    Numerical Algorithms, 2024, 95 : 1251 - 1289
  • [5] Tensor factorization via transformed tensor-tensor product for image alignment
    Xia, Sijia
    Qiu, Duo
    Zhang, Xiongjun
    NUMERICAL ALGORITHMS, 2024, 95 (03) : 1251 - 1289
  • [6] Image Captioning with Semantic Attention
    You, Quanzeng
    Jin, Hailin
    Wang, Zhaowen
    Fang, Chen
    Luo, Jiebo
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 4651 - 4659
  • [7] TENSOR PRODUCT REPRESENTATIONS
    ROBINSON, GD
    JOURNAL OF ALGEBRA, 1972, 20 (01) : 118 - &
  • [8] Protein image alignment via tensor product cubic splines
    Potra, F. A.
    Liu, X.
    OPTIMIZATION METHODS & SOFTWARE, 2007, 22 (01): : 155 - 168
  • [9] Tensor product representations
    Yousofzadeh, Malihe
    JOURNAL OF ALGEBRA, 2022, 606 : 19 - 29
  • [10] Object semantic analysis for image captioning
    Du, Sen
    Zhu, Hong
    Lin, Guangfeng
    Wang, Dong
    Shi, Jing
    Wang, Jing
    Multimedia Tools and Applications, 2023, 82 : 43179 - 43206