Unbinding tensor product representations for image captioning with semantic alignment and complementation

Cited by: 0
Authors
Wu, Bicheng [1]
Wo, Yan [1]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510641, Peoples R China
Keywords
Image captioning; Tensor product representations; Semantic content; Intermediate representations
DOI
10.1007/s00530-024-01309-9
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Image captioning, which describes an image in natural language, is an important but challenging multi-modal task. Many state-of-the-art methods adopt the encoder-decoder framework to convert information from the image modality to the text modality. However, most methods are limited by a local view during encoding and do not consider the logic of word organization during decoding, so they are prone to generating captions that are patchworks of the salient visual content and that rely on high-frequency expression templates induced by dataset bias. To alleviate this problem, we propose a novel encoder-decoder image captioning method, unbinding tensor product representations for image captioning with semantic alignment and complementation (uTPR-SAC). uTPR-SAC acquires semantic content that reflects the global cognition of the image through semantic alignment based on common subspace projection. The structural information of the visual features is then complemented under the guidance of this semantic content, which helps to generate intermediate representations with deep semantic understanding. To avoid dependence on high-frequency templates, the unbinding operation of TPR optimizes word prediction by reasoning about word structures with both an orthogonal structure matrix and the visual structure information of the intermediate representations. Comparison with other state-of-the-art methods on MSCOCO validates the competitiveness and effectiveness of uTPR-SAC, which reaches 81.0, 65.9, 51.7, 39.8, and 59.4 on BLEU-1, BLEU-2, BLEU-3, BLEU-4, and ROUGE-L, respectively. Extensive visualization experiments not only show the sensitivity of the semantic content to important visual content but also demonstrate the validity of the word structures obtained by unbinding, both of which contribute to the semantic accuracy of the generated captions.
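The abstract does not spell out the unbinding operation, so the following is only a minimal sketch of generic tensor product representation (TPR) binding and unbinding with orthonormal role vectors, not the uTPR-SAC architecture itself. All names and sizes here (`fillers`, `roles`, `bind`, `unbind`, the dimensions) are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper).
d_filler, d_role, n_words = 8, 4, 3

# Hypothetical "filler" vectors: one content vector per word slot.
fillers = rng.normal(size=(n_words, d_filler))

# Orthonormal "role" vectors: rows of an orthogonal matrix obtained via QR.
q, _ = np.linalg.qr(rng.normal(size=(d_role, d_role)))
roles = q[:n_words]  # shape (n_words, d_role); rows are mutually orthonormal


def bind(fillers: np.ndarray, roles: np.ndarray) -> np.ndarray:
    """Bind fillers to roles: T = sum_i outer(f_i, r_i), shape (d_filler, d_role)."""
    return fillers.T @ roles


def unbind(tpr: np.ndarray, roles: np.ndarray) -> np.ndarray:
    """Unbind each role: T @ r_i recovers f_i because r_j . r_i = 0 for j != i."""
    return (tpr @ roles.T).T


tpr = bind(fillers, roles)
recovered = unbind(tpr, roles)
print(np.allclose(recovered, fillers))
```

Because the role vectors are orthonormal, multiplying the bound tensor by one role vector cancels every other slot exactly. This is the usual motivation for an orthogonal structure matrix in TPR-style models; how uTPR-SAC learns and applies its matrix is described in the paper, not here.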
Pages: 18