The Style Transformer with Common Knowledge Optimization for Image-Text Retrieval

Cited by: 2
Authors
Li W. [1 ]
Ma Z. [2 ]
Shi J. [3 ]
Fan X. [1 ]
Affiliations
[1] Harbin Institute of Technology, Department of Computer Science and Technology, Harbin
[2] Peng Cheng Laboratory, Shenzhen
[3] Beijing University of Posts and Telecommunications, School of Cyberspace Security, Beijing
Keywords
Image-text retrieval; transformer
DOI
10.1109/LSP.2023.3310870
Abstract
Image-text retrieval, which associates different modalities, has drawn broad attention due to its excellent research value and wide real-world applicability. However, most existing methods have not taken the high-level semantic relationships ('style embedding') and common knowledge across modalities fully into consideration. To this end, we introduce a novel style transformer network with common knowledge optimization (CKSTN) for image-text retrieval. The main module is the common knowledge adaptor (CKA), which contains both a style embedding extractor (SEE) and a common knowledge optimization (CKO) module. Specifically, the SEE uses a sequential update strategy to effectively connect the features of different stages in SEE. The CKO module is introduced to dynamically capture the latent concepts of common knowledge from different modalities. Besides, to obtain generalized temporal common knowledge, we propose a sequential update strategy that effectively integrates the features of different layers in SEE with previous common feature units. CKSTN outperforms state-of-the-art methods in image-text retrieval on the MSCOCO and Flickr30K datasets. Moreover, CKSTN is built on a lightweight transformer, which is more convenient and practical for real-world applications thanks to its better performance and fewer parameters. © 1994-2012 IEEE.
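The abstract describes the sequential update strategy only at a high level. A minimal, hypothetical sketch of one way per-layer SEE features could be fused into a running common feature unit via a gated update (the function name, the single gate matrix `W`, and the gating form are all assumptions for illustration, not the authors' implementation):

```python
import numpy as np

def sequential_update(layer_feats, init_unit, W):
    """Hypothetical sketch of a sequential update strategy: each layer's
    features are folded into a running common feature unit through a
    sigmoid gate (a single weight matrix W stands in for a learned gate)."""
    unit = init_unit  # previous common feature unit, shape (batch, dim)
    for feat in layer_feats:  # features from successive layers, (batch, dim)
        z = np.concatenate([feat, unit], axis=-1) @ W  # gate pre-activation
        g = 1.0 / (1.0 + np.exp(-z))                   # sigmoid gate in (0, 1)
        unit = g * feat + (1.0 - g) * unit             # gated running update
    return unit
```

Under this reading, the gate decides per dimension how much of each new layer's features to absorb versus how much of the accumulated common knowledge to keep.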
Pages: 1197 - 1201
Number of pages: 4
Related Papers
50 records in total
  • [21] Asymmetric bi-encoder for image-text retrieval
    Xiong, Wei
    Liu, Haoliang
    Mi, Siya
    Zhang, Yu
    MULTIMEDIA SYSTEMS, 2023, 29 (06) : 3805 - 3818
  • [22] Multiview adaptive attention pooling for image-text retrieval
    Ding, Yunlai
    Yu, Jiaao
    Lv, Qingxuan
    Zhao, Haoran
    Dong, Junyu
    Li, Yuezun
    KNOWLEDGE-BASED SYSTEMS, 2024, 291
  • [23] Relation-Guided Network for Image-Text Retrieval
    Yang, Yulou
    Shen, Hao
    Yang, Ming
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1856 - 1860
  • [24] Joint Image-text Representation Learning for Fashion Retrieval
    Yan, Cairong
    Li, Yu
    Wan, Yongquan
    Zhang, Zhaohui
    ICMLC 2020: 2020 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2020, : 412 - 417
  • [25] Multi-Scale Interactive Transformer for Remote Sensing Cross-Modal Image-Text Retrieval
    Wang, Yijing
    Ma, Jingjing
    Li, Mingteng
    Tang, Xu
    Han, Xiao
    Jiao, Licheng
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 839 - 842
  • [26] COREN: Multi-Modal Co-Occurrence Transformer Reasoning Network for Image-Text Retrieval
    Wang, Yaodong
    Ji, Zhong
    Chen, Kexin
    Pang, Yanwei
    Zhang, Zhongfei
    NEURAL PROCESSING LETTERS, 2023, 55 (05) : 5959 - 5978
  • [28] Hierarchical Feature Aggregation Based on Transformer for Image-Text Matching
    Dong, Xinfeng
    Zhang, Huaxiang
    Zhu, Lei
    Nie, Liqiang
    Liu, Li
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6437 - 6447
  • [29] Image-text Retrieval via Preserving Main Semantics of Vision
    Zhang, Xu
    Niu, Xinzheng
    Fournier-Viger, Philippe
    Dai, Xudong
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1967 - 1972
  • [30] TFUN: Trilinear Fusion Network for Ternary Image-Text Retrieval
    Xu, Xing
    Sun, Jialiang
    Cao, Zuo
    Zhang, Yin
    Zhu, Xiaofeng
    Shen, Heng Tao
    INFORMATION FUSION, 2023, 91 : 327 - 337