Unbinding tensor product representations for image captioning with semantic alignment and complementation

Cited by: 0
Authors
Wu, Bicheng [1]
Wo, Yan [1]
Affiliation
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510641, Peoples R China
Keywords
Image captioning; Tensor product representations; Semantic content; Intermediate representations
DOI
10.1007/s00530-024-01309-9
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Image captioning, which describes an image in natural language, is an important but challenging multi-modal task. Many state-of-the-art methods adopt the encoder-decoder framework to convert information from the image modality to the text modality. However, most methods are limited by a local view during encoding and do not consider the logic of word organization during decoding. As a result, they tend to generate captions that patch together salient visual content and rely on high-frequency expression templates induced by dataset bias. To alleviate this problem, we propose a novel encoder-decoder image captioning method, unbinding tensor product representations for image captioning with semantic alignment and complementation (uTPR-SAC). uTPR-SAC acquires semantic content that reflects a global understanding of the image through semantic alignment based on common subspace projection. Guided by this semantic content, the structural information of the visual features is complemented, which helps generate intermediate representations with deep semantic understanding. To avoid dependence on high-frequency templates, the unbinding operation of tensor product representations (TPR) optimizes word prediction by reasoning about word structures using both an orthogonal structure matrix and the visual structure information of the intermediate representations. Comparisons with other state-of-the-art methods on MSCOCO validate the competitiveness and effectiveness of uTPR-SAC, which reaches 81.0, 65.9, 51.7, 39.8, and 59.4 on BLEU-1, BLEU-2, BLEU-3, BLEU-4, and ROUGE-L, respectively. Extensive visualization experiments show not only the sensitivity of the semantic content to important visual content but also the validity of the word structures obtained by unbinding, both of which contribute to the semantic accuracy of the generated captions.
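The unbinding idea the abstract relies on can be illustrated with classical (Smolensky-style) tensor product representations. The sketch below is an illustration under stated assumptions, not the uTPR-SAC implementation: it binds hypothetical filler vectors (standing in for word embeddings) to roles taken from an orthonormal role matrix via T = Σ_i f_i r_iᵀ, then recovers each filler by unbinding, f_j = T r_j. Recovery is exact only because the roles are orthonormal. The function names `gram_schmidt`, `bind`, and `unbind`, and all dimensions, are hypothetical.

```python
import math
import random

# Minimal TPR sketch. Assumption: classical binding with an orthonormal role
# matrix; this is NOT the uTPR-SAC model, only the underlying algebra.


def gram_schmidt(vectors):
    """Return orthonormal vectors spanning `vectors` (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w along each earlier basis vector.
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis


def bind(fillers, roles):
    """Binding: T = sum_i f_i r_i^T, a (d_f x d_r) matrix stored as nested lists."""
    d_f, d_r = len(fillers[0]), len(roles[0])
    T = [[0.0] * d_r for _ in range(d_f)]
    for f, r in zip(fillers, roles):
        for a in range(d_f):
            for b in range(d_r):
                T[a][b] += f[a] * r[b]
    return T


def unbind(T, role):
    """Unbinding: f_j = T r_j, exact when the roles are orthonormal."""
    return [sum(t * x for t, x in zip(row, role)) for row in T]


if __name__ == "__main__":
    rng = random.Random(0)
    d_f, d_r, n = 4, 5, 3  # filler dim, role dim, number of bound words (hypothetical)
    fillers = [[rng.gauss(0, 1) for _ in range(d_f)] for _ in range(n)]
    roles = gram_schmidt([[rng.gauss(0, 1) for _ in range(d_r)] for _ in range(n)])
    T = bind(fillers, roles)
    for j in range(n):
        err = max(abs(x - y) for x, y in zip(unbind(T, roles[j]), fillers[j]))
        print(f"word {j}: max reconstruction error = {err:.2e}")
```

Running it should print reconstruction errors on the order of floating-point round-off. If the roles were merely random rather than orthonormalized, unbinding one role would mix in contributions from the other fillers, which is why an orthogonal structure matrix matters for this kind of decoding.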
Pages: 18
Related papers (50 in total)
  • [31] Hadamard Product Perceptron Attention for Image Captioning
    Jiang, Weitao
    Hu, Haifeng
    NEURAL PROCESSING LETTERS, 2023, 55 (03) : 2707 - 2724
  • [32] Contrastive semantic similarity learning for image captioning evaluation
    Zeng, Chao
    Kwong, Sam
    Zhao, Tiesong
    Wang, Hanli
    INFORMATION SCIENCES, 2022, 609 : 913 - 930
  • [33] Image Captioning With Visual-Semantic Double Attention
    He, Chen
    Hu, Haifeng
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (01)
  • [34] Semantic-Conditional Diffusion Networks for Image Captioning
    Luo, Jianjie
    Li, Yehao
    Pan, Yingwei
    Yao, Ting
    Feng, Jianlin
    Chao, Hongyang
    Mei, Tao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023 : 23359 - 23368
  • [35] Semantic-Guided Selective Representation for Image Captioning
    Li, Yinan
    Ma, Yiwei
    Zhou, Yiyi
    Yu, Xiao
    IEEE ACCESS, 2023, 11 : 14500 - 14510
  • [36] SPATIAL-SEMANTIC ATTENTION FOR GROUNDED IMAGE CAPTIONING
    Hu, Wenzhe
    Wang, Lanxiao
    Xu, Linfeng
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022 : 61 - 65
  • [37] Variational Structured Semantic Inference for Diverse Image Captioning
    Chen, Fuhai
    Ji, Rongrong
    Ji, Jiayi
    Sun, Xiaoshuai
    Zhang, Baochang
    Ge, Xuri
    Wu, Yongjian
    Huang, Feiyue
    Wang, Yan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [38] Improved Image Captioning via Semantic Feature Update
    Tian, Peng
    Mo, Hongwei
    Jiang, Laihao
    2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021 : 7938 - 7943
  • [39] Structural Semantic Adversarial Active Learning for Image Captioning
    Zhang, Beichen
    Li, Liang
    Su, Li
    Wang, Shuhui
    Deng, Jincan
    Zha, Zheng-Jun
    Huang, Qingming
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020 : 1112 - 1121
  • [40] Adaptive Semantic-Enhanced Transformer for Image Captioning
    Zhang, Jing
    Fang, Zhongjun
    Sun, Han
    Wang, Zhe
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (02) : 1785 - 1796