Unbinding tensor product representations for image captioning with semantic alignment and complementation

Citations: 0
Authors
Wu, Bicheng [1]
Wo, Yan [1]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510641, Peoples R China
Keywords
Image captioning; Tensor product representations; Semantic content; Intermediate representations;
DOI
10.1007/s00530-024-01309-9
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Image captioning, which describes an image in natural language, is an important but challenging multi-modal task. Many state-of-the-art methods adopt the encoder-decoder framework to convert information from the image modality to the text modality. However, most methods are limited by a local view during encoding and do not consider the logic of word organization during decoding. As a result, they are prone to generating captions that are patchworks of the salient visual content and to relying on high-frequency expression templates that reflect dataset bias. To alleviate this problem, we propose a novel encoder-decoder-based image captioning method, unbinding tensor product representations for image captioning with semantic alignment and complementation (uTPR-SAC). uTPR-SAC acquires semantic content reflecting global cognition of the image through semantic alignment based on common subspace projection. The structural information of the visual features is complemented under the guidance of the semantic content, which helps to generate intermediate representations with deep semantic understanding. To avoid dependence on high-frequency templates, the unbinding operation of TPR optimizes word prediction by reasoning about word structures using both an orthogonal structure matrix and the visual structure information of the intermediate representations. Comparison with other state-of-the-art methods on MSCOCO validates the competitiveness and effectiveness of uTPR-SAC, which reaches 81.0, 65.9, 51.7, 39.8, and 59.4 on BLEU-1, BLEU-2, BLEU-3, BLEU-4, and ROUGE-L, respectively. Extensive visualization experiments not only show the sensitivity of the semantic content to important visual content but also demonstrate the validity of the word structures obtained by unbinding, both of which contribute to the semantic accuracy of the generated captions.
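The unbinding operation mentioned in the abstract can be illustrated with a generic tensor product representation. The sketch below is not the uTPR-SAC model. It only shows the textbook identity that, when role vectors are orthonormal, multiplying the bound tensor by a role vector recovers the filler bound to it. All names (`ROLES`, `FILLERS`, `bind`, `unbind`) and all numeric values are illustrative assumptions.

```python
# Generic TPR binding/unbinding sketch (illustrative only; not the paper's model).

# Assumed orthonormal role vectors: rows of a 4x4 Hadamard matrix scaled by 1/2.
ROLES = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]

# Hypothetical filler vectors standing in for word embeddings.
FILLERS = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, -1.0],
    [4.0, 1.0, 0.0],
    [-2.0, 0.5, 1.5],
]


def bind(fillers, roles):
    """Return T = sum_i outer(f_i, r_i) as a d_filler x d_role nested list."""
    d_f, d_r = len(fillers[0]), len(roles[0])
    tpr = [[0.0] * d_r for _ in range(d_f)]
    for f, r in zip(fillers, roles):
        for a in range(d_f):
            for b in range(d_r):
                tpr[a][b] += f[a] * r[b]
    return tpr


def unbind(tpr, role):
    """Return T @ role; with orthonormal roles this is the filler bound to `role`."""
    return [sum(t * r for t, r in zip(row, role)) for row in tpr]


T = bind(FILLERS, ROLES)
RECOVERED = [unbind(T, r) for r in ROLES]
```

Because the rows of `ROLES` are mutually orthogonal with unit norm, the cross terms vanish during unbinding and each entry of `RECOVERED` matches the corresponding entry of `FILLERS`. With non-orthogonal roles the recovery would only be approximate.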
Pages: 18
Related papers
50 in total
  • [21] StructCap: Structured Semantic Embedding for Image Captioning
    Chen, Fuhai
    Ji, Rongrong
    Su, Jinsong
    Wu, Yongjian
    Wu, Yunsheng
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017 : 46 - 54
  • [22] Image Captioning Based on Visual and Semantic Attention
    Wei, Haiyang
    Li, Zhixin
    Zhang, Canlong
    MULTIMEDIA MODELING (MMM 2020), PT I, 2020, 11961 : 151 - 162
  • [23] Integrating Scene Semantic Knowledge into Image Captioning
    Wei, Haiyang
    Li, Zhixin
    Huang, Feicheng
    Zhang, Canlong
    Ma, Huifang
    Shi, Zhongzhi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (02)
  • [24] A Context Semantic Auxiliary Network for Image Captioning
    Li, Jianying
    Shao, Xiangjun
    INFORMATION, 2023, 14 (07)
  • [25] Semantic interdisciplinary evaluation of image captioning models
    Sirisha, Uddagiri
    Chandana, Bolem Sai
    COGENT ENGINEERING, 2022, 9 (01)
  • [26] A novel image captioning model with visual-semantic similarities and visual representations re-weighting
    Thobhani, Alaa
    Zou, Beiji
    Kui, Xiaoyan
    Al-Shargabi, Asma A.
    Derea, Zaid
    Abdussalam, Amr
    Asham, Mohammed A.
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2024, 36 (07)
  • [27] Graph Alignment Transformer for More Grounded Image Captioning
    Tian, Canwei
    Hu, Haiyang
    Li, Zhongjin
    2022 INTERNATIONAL CONFERENCE ON INDUSTRIAL IOT, BIG DATA AND SUPPLY CHAIN, IIOTBDSC, 2022 : 95 - 102
  • [28] A tensor product of representations of Cuntz algebras
    Kawamura, Katsunori
    LETTERS IN MATHEMATICAL PHYSICS, 2007, 82 (01) : 91 - 104
  • [29] Hadamard Product Perceptron Attention for Image Captioning
    Weitao Jiang
    Haifeng Hu
    Neural Processing Letters, 2023, 55 : 2707 - 2724
  • [30] A Tensor Product of Representations of Cuntz Algebras
    Katsunori Kawamura
    Letters in Mathematical Physics, 2007, 82 : 91 - 104